[File] [PATCH] of Magdir/fonts,msdos,archive,windows,images for DOS code pages; *.cpx *.cpi

Christos Zoulas christos at zoulas.com
Fri Jul 17 19:26:43 UTC 2020


Committed, thanks!

christos

> On Jul 17, 2020, at 9:54 AM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> 
> Hello,
> a few days ago I worked with some DOS code pages that use the file
> name extensions cpi and cpx. When I run file version 5.39 on such
> examples, on other files with the cpi extension, and on related
> files, I get output like:
> 
> 12520850.CPX:    ASCII text, with CRLF line terminators
> DEVLOAD.COM:     FREE-DOS executable (COM), UPX compressed
> EGA.CP_:         Personal NetWare Packed File, was "EGA.CPI"
> ega.cpi:         data
> ega10.cpi:       DOS code page font data collection
> ega10.cpx:       FREE-DOS executable (COM), UPX compressed
> ega18.cpx:       FREE-DOS executable (COM), UPX compressed
> FaxTest.cpi:     Cartesian Perceptual Compression image
> GEM.CPI:         data
> Gilman2.cpc:     Cartesian Perceptual Compression image
> Packed File.txt: Personal NetWare Packed File, was "by Novell. C"
> TICKLE.COM:      FREE-DOS executable (COM), UPX compressed
> 
> With the --extension option, in most cases only ??? is displayed, and
> for UPX-compressed FreeDOS code pages the wrong extension com is shown
> instead of the correct cpx.
> Furthermore, with the -i option many samples are reported only as the
> generic application/octet-stream.
> 
> For comparison I ran the file format identification utility TrID
> (see https://mark0.net/soft-trid-e.html). It lists the file name
> extensions in use and, with the -v option, often also a URL
> pointing to information about the file format.
> 
> Examples like ega10.cpi are recognized by this line inside Magdir/fonts:
> 0 belong	0xff464f4e	DOS code page font data collection
> Helpfully, TrID displays the file name extension cpi and a related URL.
> The URL is now recorded in an additional comment line:
> # URL:		http://fileformats.archiveteam.org/wiki/CPI
> More information about the DOS code page file format can be found in
> Ralf Brown's Interrupt List (#01758). This is now recorded in an
> additional comment line:
> # Ref.: http://www.delorie.com/djgpp/doc/rbinter/it/58/17.html
> The file name extension and a user-defined MIME type are now shown by
> additional lines:
> !:mime	font/x-dos-cpi
> !:ext	cpi
> The described format is used in Microsoft DOS and in older versions
> of FreeDOS (cpidos package).
> Helpfully, that web site also mentions the DR-DOS variant. Samples
> like EGA.CPI or GEM.CPI are therefore identified by additional lines:
> 0 string \x7fDRFONT	DR-DOS code page font data collection
> !:mime	font/x-drdos-cpi
> !:ext	cpi
> 
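The two font signatures quoted above can be sketched outside of magic syntax as follows. This is only an illustration in Python, not part of the patch: the function name classify_cpi is made up here, and the byte strings are copied from the magic lines in the message (0xff464f4e read big-endian is the byte 0xFF followed by "FON").

```python
def classify_cpi(data: bytes):
    """Return (description, mime) for a code page font file, or None.

    Hypothetical helper; the signatures mirror the two magic lines
    quoted in the message, nothing more is checked.
    """
    # "0 belong 0xff464f4e": bytes FF 46 4F 4E at offset 0
    if data[:4] == b"\xffFON":
        return ("DOS code page font data collection", "font/x-dos-cpi")
    # "0 string \x7fDRFONT": DR-DOS variant
    if data[:7] == b"\x7fDRFONT":
        return ("DR-DOS code page font data collection", "font/x-drdos-cpi")
    return None
```

Anything that matches neither signature yields None, which corresponds to the plain "data" result shown for the unpatched rules.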
> In newer FreeDOS versions the code pages are compressed with UPX.
> Such samples are described by line fragments inside Magdir/msdos like:
> 34	string	UPX!	FREE-DOS executable (COM), UPX compressed
> 35	string	UPX!	FREE-DOS executable (COM), UPX compressed
> 
> Some information about UPX can be found on Wikipedia. This is now
> recorded in a comment line:
> # URL:		https://en.wikipedia.org/wiki/UPX
> When upx is run with its list option, it shows the format "dos/com"
> and the file sizes. The "dos/com" format can be understood by reading
> the assembler source of the corresponding UPX stub. This is recorded
> in lines like:
> # Reference:	github.com/upx/upx/archive/v3.96.zip/upx-3.96/
> #		src/stub/src/i086-dos16.com.S
> The first assembler instruction is "cmp sp, offset sp_limit". That is
> expressed by the magic line
> 0	string/b	\x81\xfc
> The next assembler instructions are "jump above +2; int 0x20; mov cx,
> offset bytes_to_copy". They are expressed by the second test line:
>> 4	string	\x77\x02\xcd\x20\xb9
> The third test line was
>>> 36	string	UPX!	FREE-DOS executable (COM), UPX compressed
> I changed this line. As the third test I now look for the assembler
> instructions "push di; jump decomp_start_n2b". This is expressed by
> the line
>> 0x1e	search/3	\x57\xe9
> This sequence occurs at slightly different offsets, because sometimes
> additional instructions such as a second "push di" appear. Afterwards
> I look for UPX_MAGIC_LE32, as defined in the included file header.S,
> with the line
>>> &2	string		UPX!	FREE-DOS executable (COM), UPX
> The size of the uncompressed file is stored as well, so it is also
> shown by an additional line:
>>>> &12	uleshort	x		\b, uncompressed %u bytes
> So now all UPX variants are matched. TrID is able to distinguish
> between UPX executables and UPX-compressed DOS code pages. In the
> TrID definition file cpx-fdos.trid.xml I see the characteristic
> string FONT, so I now distinguish the two cases with additional
> lines like:
>>>> &21	string		=FONT		compressed DOS code page font
> !:ext	cpx
>>>> &21	string		!FONT		compressed
> !:ext	com
> 
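A rough Python rendering of the test chain described above may help when checking the offsets by hand. It is a sketch under stated assumptions, not the patch itself: the name describe_upx_com is invented, "search/3" is read as three candidate start offsets (0x1e, 0x1f, 0x20), and each "&N" offset is taken relative to the end of the parent match. Under that reading the "UPX!" string lands at offset 34, 35 or 36, which agrees with the fixed offsets used by the older lines.

```python
import struct


def describe_upx_com(data: bytes):
    """Return a dict describing a UPX-compressed DOS COM file, or None.

    Hypothetical helper mirroring the magic lines quoted in the message.
    """
    if data[0:2] != b"\x81\xfc":              # cmp sp, offset sp_limit
        return None
    if data[4:9] != b"\x77\x02\xcd\x20\xb9":  # ja +2; int 0x20; mov cx, ...
        return None
    # search/3 at 0x1e for "push di; jmp" (assumed: 3 start offsets)
    for pos in range(0x1E, 0x1E + 3):
        if data[pos:pos + 2] == b"\x57\xe9":
            break
    else:
        return None
    magic_at = pos + 2 + 2                    # "&2" after the 2-byte match
    if data[magic_at:magic_at + 4] != b"UPX!":
        return None
    base = magic_at + 4                       # end of the "UPX!" match
    if len(data) < base + 25:
        return None
    (u_len,) = struct.unpack_from("<H", data, base + 12)  # "&12 uleshort"
    is_font = data[base + 21:base + 25] == b"FONT"        # "&21 string FONT"
    return {
        "font": is_font,
        "ext": "cpx" if is_font else "com",
        "uncompressed": u_len,
    }
```

The result is returned as a dict rather than a formatted string, because the exact output wording belongs to the magic file and is not reproduced here.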
> On installation discs the DR-DOS files are packed, and the last
> character of the file name extension is replaced by an underscore.
> Samples like EGA.CP_ are identified by this line inside Magdir/archive:
> 0	string	Packed\ File\ 	Personal NetWare Packed File
> Information about this file format can be found in Matthias Paul's
> tips about Novell DOS 7. This is now recorded in an additional comment
> line:
> # Ref. www.antonis.de/dos/dos-tuts/mpdostip/html/nwdostip.htm
> The original file name is stored after the leading magic string. That
> was expressed by the line
>> 12	string	x		\b, was "%.12s"
> According to the documentation these names are terminated by a
> Control-Z character. I now use this as an additional test to skip
> misidentified ASCII texts starting with the phrase
> Packed\040File\040. The magic lines now become:
> 0	string	Packed\ File\
>> 0x18	ubyte	0x1a		Personal NetWare Packed File
> A user-defined MIME type and file name extension are now shown by
> additional lines:
> !:mime	application/x-novell-compress
> !:ext	??_
> The size of the uncompressed file is stored a few bytes later, so it
> is also shown by an additional line:
>>> 0x1b	ulelong	x		with %u bytes
> 
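The tightened packed-file test can be illustrated with a short Python sketch. The offsets (name at 12, Control-Z at 0x18, little-endian 32-bit size at 0x1b) come from the magic lines in the message; the function name parse_packed_file is invented, and treating the name field as NUL-padded is an assumption made here for illustration.

```python
import struct


def parse_packed_file(data: bytes):
    """Return (original_name, uncompressed_size) or None.

    Hypothetical helper; offsets follow the magic lines quoted above.
    """
    if not data.startswith(b"Packed File "):
        return None
    # Control-Z terminator at 0x18 separates real archives from ASCII
    # texts that merely begin with the same phrase.
    if len(data) < 0x1B + 4 or data[0x18] != 0x1A:
        return None
    raw_name = data[12:24]
    # Assumption: unused bytes of the 12-byte name field are NUL.
    name = raw_name.split(b"\0", 1)[0].decode("ascii", errors="replace")
    (size,) = struct.unpack_from("<I", data, 0x1B)  # "0x1b ulelong"
    return name, size
```

A text file starting with "Packed File by Novell. C..." has an ordinary letter at offset 0x18, so it is rejected and falls through to the generic text rules.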
> Examples like 12520850.CPX are found in the SysWOW64 subdirectory on
> Windows systems. TrID describes such CPX files generically as INI
> configuration and more specifically as Windows code page translator,
> because they start with a line like
> [Windows Latin 1(1252)/850 (Multilingual-Latin 1)]
> So, following Gary Kessler's file signatures, I added to the
> subroutine ini-file inside Magdir/windows an additional identifying
> line like
>>> &0	regex/c	\^(Windows\ Latin)	Windows codepage translator
> Afterwards the MIME type and file name extension are shown by additional lines
> !:mime	text/plain
> !:ext	cpx
> 
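As a loose illustration of the section-header test, the following Python sketch checks whether a text begins with a bracketed "Windows Latin" section name, ignoring case as regex/c does. It is not a translation of the ini-file subroutine: how that subroutine handles the opening bracket is not shown in the message, so matching the bracket here is an assumption, and the function name is made up.

```python
import re

# Case-insensitive, anchored at the start of the first line.
_HEADER = re.compile(r"\[Windows Latin", re.IGNORECASE)


def is_codepage_translator(text: str) -> bool:
    """Hypothetical check for a Windows code page translator (*.cpx)."""
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    return _HEADER.match(first_line) is not None
```

Other INI files, whose first section has a different name, are left to the generic INI rules.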
> Examples like FaxTest.cpi are recognized as Cartesian Perceptual
> Compression image by Magdir/images via a magic line like
> 0 string CPC\262 Cartesian Perceptual Compression  image
> To display the file name extensions I added afterwards, following the
> File Formats Archive Team web site, a line like
> !:ext	cpi/cpc
> 
> After applying the above modifications with the patches
> file-5.39-fonts-cpi.diff, file-5.39-msdos-cpx.diff,
> file-5.39-archive-novell.diff, file-5.39-windows-cpx.diff and
> file-5.39-images-cpi.diff, I get more precise output like:
> 
> 12520850.CPX:    Windows codepage translator
> DEVLOAD.COM:     FREE-DOS executable (COM), UPX compressed
> 		 , uncompressed 5514 bytes
> EGA.CP_:         Personal NetWare Packed File, was "EGA.CPI"
> 		 with 24888 bytes
> ega.cpi:         DR-DOS code page font data collection
> ega10.cpi:       DOS code page font data collection
> ega10.cpx:       FREE-DOS executable (COM), UPX compressed
> 		 DOS code page font, uncompressed 58880 bytes
> ega18.cpx:       FREE-DOS executable (COM), UPX compressed
> 		 DOS code page font, uncompressed 29540 bytes
> FaxTest.cpi:     Cartesian Perceptual Compression image
> GEM.CPI:         DR-DOS code page font data collection
> Gilman2.cpc:     Cartesian Perceptual Compression image
> Packed File.txt: ASCII text, with CRLF line terminators
> TICKLE.COM:      FREE-DOS executable (COM), UPX compressed
> 		 , uncompressed 2658 bytes
> 
> I hope my diff files can be applied in a future version of the
> file utility.
> 
> With best wishes
> Jörg Jenderek
> - --
> Jörg Jenderek
> 
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
> 
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCXxGtigAKCRCv8rHJQhrU
> 1pOsAJwMNw28J5fkI1T3jFA3gcHldzWWAwCfTcWyrHvyRsmK/65P7fxpWgG6hZU=
> =rhF8
> -----END PGP SIGNATURE-----
