[File] [PATCH] of Magdir/fonts,msdos,archive,windows,images for DOS code pages; *.cpx *.cpi
Christos Zoulas
christos at zoulas.com
Fri Jul 17 19:26:43 UTC 2020
Committed, thanks!
christos
> On Jul 17, 2020, at 9:54 AM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> Hello,
> A few days ago I worked with some DOS code pages with the file name
> extensions cpi and cpx. When running the file command version 5.39 on
> such examples, on other files with the cpi extension, and on related
> files, I get output like:
>
> 12520850.CPX: ASCII text, with CRLF line terminators
> DEVLOAD.COM: FREE-DOS executable (COM), UPX compressed
> EGA.CP_: Personal NetWare Packed File, was "EGA.CPI"
> ega.cpi: data
> ega10.cpi: DOS code page font data collection
> ega10.cpx: FREE-DOS executable (COM), UPX compressed
> ega18.cpx: FREE-DOS executable (COM), UPX compressed
> FaxTest.cpi: Cartesian Perceptual Compression image
> GEM.CPI: data
> Gilman2.cpc: Cartesian Perceptual Compression image
> Packed File.txt: Personal NetWare Packed File, was "by Novell. C"
> TICKLE.COM: FREE-DOS executable (COM), UPX compressed
>
> With the --extension option, in most cases only ??? is displayed, and
> for FREE-DOS UPX compressed code pages the wrong extension com is
> displayed instead of the correct cpx.
> Furthermore, with the -i option only the generic
> application/octet-stream is shown for many samples.
>
> For comparison I ran the file format identification utility TrID
> (see https://mark0.net/soft-trid-e.html). It lists the file name
> extension used and, often with the -v option, the URL pointing to the
> relevant file format information.
>
> Examples like ega10.cpi are recognized by this line in Magdir/fonts:
> 0 belong 0xff464f4e DOS code page font data collection
> Luckily, the TrID tool displays the file name extension cpi and the
> related URL. This is now expressed by an additional comment line:
> # URL: http://fileformats.archiveteam.org/wiki/CPI
> More information about the DOS code page file format can be found in
> Ralf Brown's Interrupt List, table #01758. This is now expressed by an
> additional comment line:
> # Ref.: http://www.delorie.com/djgpp/doc/rbinter/it/58/17.html
> The file name extension and a user-defined MIME type are now shown by
> the additional lines:
> !:mime font/x-dos-cpi
> !:ext cpi
> The described format is used in Microsoft DOS and in older versions
> of FreeDOS (cpidos package).
> Luckily, that web site also mentions the DR-DOS variant, so samples
> such as EGA.CPI or GEM.CPI are now identified by the additional lines:
> 0 string \x7fDRFONT DR-DOS code page font data collection
> !:mime font/x-drdos-cpi
> !:ext cpi
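As a rough illustration only (this is not libmagic code), the two level-0 tests above can be mimicked in Python. The helper name `classify_cpi` is hypothetical. `belong` is read as a big-endian 32-bit integer, so 0xff464f4e corresponds to the bytes FF 'F' 'O' 'N' at the start of the file.

```python
import struct
from typing import Optional

def classify_cpi(data: bytes) -> Optional[str]:
    """Hypothetical emulation of the two level-0 CPI tests (not libmagic)."""
    # "0 belong 0xff464f4e": big-endian 32-bit value, i.e. bytes FF 'F' 'O' 'N'
    if len(data) >= 4 and struct.unpack(">I", data[:4])[0] == 0xFF464F4E:
        return "DOS code page font data collection"
    # "0 string \x7fDRFONT": the DR-DOS variant of the header
    if data.startswith(b"\x7fDRFONT"):
        return "DR-DOS code page font data collection"
    return None
```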
>
> In newer FreeDOS versions the code pages are compressed with UPX, so
> such samples are matched by line fragments in Magdir/msdos such as:
> 34 string UPX! FREE-DOS executable (COM), UPX compressed
> 35 string UPX! FREE-DOS executable (COM), UPX compressed
>
> Some information about UPX can be found on Wikipedia. This is now
> expressed by the comment line:
> # URL: https://en.wikipedia.org/wiki/UPX
> When running upx with the list option, the format "dos/com" and the
> file sizes are shown. This "dos/com" file format can be understood by
> looking at the assembler source of the corresponding UPX module. This
> is expressed by the lines:
> # Reference: github.com/upx/upx/archive/v3.96.zip/upx-3.96/
> # src/stub/src/i086-dos16.com.S
> The first assembler instruction is "cmp sp, offset sp_limit". That is
> expressed by the magic line
> 0 string/b \x81\xfc
> The next assembler instructions are "jump above +2; int 0x20; mov cx,
> offset bytes_to_copy". That is expressed by the second test line:
> >4 string \x77\x02\xcd\x20\xb9
> The third test line was
> >>36 string UPX! FREE-DOS executable (COM), UPX compressed
> I modified this line. As the third test I now look for the assembler
> instructions "push di; jump decomp_start_n2b". This is now expressed
> by the line
> >0x1e search/3 \x57\xe9
> This sequence occurs at slightly different offsets, because sometimes
> additional instructions like a second "push di" appear. Afterwards I
> look for UPX_MAGIC_LE32, according to the included file header.S,
> with the line
> >>&2 string UPX! FREE-DOS executable (COM), UPX
> The size of the uncompressed file is also stored, so this information
> is now shown by an additional line:
> >>>&12 uleshort x \b, uncompressed %u bytes
> So now all UPX variants are matched. TrID is able to distinguish
> between UPX-compressed executables and UPX-compressed DOS code pages.
> Looking in the TrID definition file cpx-fdos.trid.xml, I see the
> characteristic phrase FONT. So I now distinguish these two cases with
> the additional lines:
> >>>&21 string =FONT compressed DOS code page font
> !:ext cpx
> >>>&21 string !FONT compressed
> !:ext com
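The UPX test chain above can be sketched in Python as follows. `describe_upx_com` is a hypothetical helper and not part of file. It assumes that `search/3` tries the three start offsets 0x1e to 0x20. It also assumes that each `&` offset counts from the end of the parent match. Under those assumptions, `&2` skips the two displacement bytes of the `jmp`, and `&12`/`&21` count from the end of "UPX!".

```python
import struct
from typing import Optional

def describe_upx_com(data: bytes) -> Optional[str]:
    """Hypothetical emulation of the UPX dos/com magic chain (not libmagic)."""
    # "0 string/b \x81\xfc": cmp sp, imm16
    if data[0:2] != b"\x81\xfc":
        return None
    # "4 string \x77\x02\xcd\x20\xb9": ja +2; int 0x20; mov cx, imm16
    if data[4:9] != b"\x77\x02\xcd\x20\xb9":
        return None
    # "0x1e search/3 \x57\xe9": push di; jmp rel16 (assumed start 0x1e..0x20)
    match_end = None
    for start in range(0x1E, 0x1E + 3):
        if data[start:start + 2] == b"\x57\xe9":
            match_end = start + 2
            break
    if match_end is None:
        return None
    # "&2 string UPX!": skip the 2-byte jmp displacement, then UPX_MAGIC_LE32
    magic_at = match_end + 2
    if data[magic_at:magic_at + 4] != b"UPX!":
        return None
    result = "FREE-DOS executable (COM), UPX"
    base = magic_at + 4  # &12 and &21 are taken relative to the end of "UPX!"
    # "&21 string =FONT" / "!FONT": code page font vs. ordinary program
    if data[base + 21:base + 25] == b"FONT":
        result += " compressed DOS code page font"
    else:
        result += " compressed"
    # "&12 uleshort x \b, uncompressed %u bytes": little-endian 16-bit size
    if len(data) >= base + 14:
        (u_len,) = struct.unpack_from("<H", data, base + 12)
        result += f", uncompressed {u_len} bytes"
    return result
```

The FONT test runs before the size is appended so that the description reads in the same order as the sample output quoted later in the mail.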
>
> On installation disks the DR-DOS files are packed, and the last
> character of the file name extension is replaced by an underscore.
> Samples such as EGA.CP_ are identified by this line in Magdir/archive:
> 0 string Packed\ File\ Personal NetWare Packed File
> Information about this file format can be found in Matthias Paul's
> tips about Novell DOS 7. This is now expressed by an additional
> comment line:
> # Ref. www.antonis.de/dos/dos-tuts/mpdostip/html/nwdostip.htm
> The original file name is stored after the initial magic. That was
> expressed by the line
> >12 string x \b, was "%.12s"
> According to the documentation these names are terminated by a
> Control-Z character. I now use this as an additional test to skip
> misidentified ASCII texts starting with the phrase Packed\040File\040.
> So the magic lines now become:
> 0 string Packed\ File\
> >0x18 ubyte 0x1a Personal NetWare Packed File
> A user-defined MIME type and the file name extension are now shown by
> the additional lines:
> !:mime application/x-novell-compress
> !:ext ??_
> The size of the uncompressed file is stored a few bytes later, so this
> information is also shown by an additional line:
> >>0x1b ulelong x with %u bytes
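A minimal Python sketch of the Packed File tests follows. `describe_novell_packed` is a hypothetical helper. The mail only says that the name is followed by Control-Z at 0x18, so this sketch assumes the 12-byte name field at offset 12 is NUL-padded.

```python
import struct
from typing import Optional

def describe_novell_packed(data: bytes) -> Optional[str]:
    """Hypothetical emulation of the Personal NetWare Packed File tests."""
    # "0 string Packed\ File\ ": 12-byte signature including the trailing space
    if not data.startswith(b"Packed File "):
        return None
    # "0x18 ubyte 0x1a": Control-Z after the stored name; rejects plain text
    if len(data) <= 0x18 or data[0x18] != 0x1A:
        return None
    result = "Personal NetWare Packed File"
    # "12 string x \b, was "%.12s"": original name, at most 12 characters
    # (cutting at the first NUL is an assumption about the padding)
    name = data[12:24].split(b"\0", 1)[0].decode("ascii", "replace")
    result += f', was "{name}"'
    # "0x1b ulelong x with %u bytes": little-endian 32-bit uncompressed size
    if len(data) >= 0x1B + 4:
        (size,) = struct.unpack_from("<I", data, 0x1B)
        result += f" with {size} bytes"
    return result
```

A text file such as "Packed File.txt" fails the Control-Z check because offset 0x18 holds an ordinary character there.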
>
> Examples like 12520850.CPX are found in the subdirectory SysWOW64 on
> Windows systems. The TrID file identifier describes such CPX files
> generically as INI configuration and specifically as Windows code page
> translator, because they start with a line like
> [Windows Latin 1(1252)/850 (Multilingual-Latin 1)]
> So, following Gary Kessler's file signatures table, I added to the
> subroutine ini-file in Magdir/windows an additional identifying line:
> >>&0 regex/c \^(Windows\ Latin) Windows codepage translator
> Afterwards the MIME type and file name extension are shown by the
> additional lines:
> !:mime text/plain
> !:ext cpx
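The identifying test can be approximated in Python. The mail does not show the context that the ini-file subroutine sets up for `&0`. This sketch therefore assumes the section header is the very first line and includes its opening bracket in the pattern. `re.IGNORECASE` stands in for the `/c` flag, and the name `is_windows_cpx` is hypothetical.

```python
import re

# Assumption: the "[Windows Latin ...]" section header is the first line.
# re.IGNORECASE mirrors the case-insensitive /c flag of the magic regex.
_CPX_HEADER = re.compile(rb"\A\[Windows Latin", re.IGNORECASE)

def is_windows_cpx(data: bytes) -> bool:
    """Return True if data starts like a Windows code page translator (.CPX)."""
    return _CPX_HEADER.match(data) is not None
```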
>
> Examples like FaxTest.cpi are recognized as Cartesian Perceptual
> Compression images by Magdir/images via the magic line
> 0 string CPC\262 Cartesian Perceptual Compression image
> To display the file name extensions, based on the Archive Team file
> formats web site, I added after it the line
> !:ext cpi/cpc
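The `\262` in the magic line above is an octal escape (0262 = 0xB2 = 178). Python bytes literals accept the same escape, so an equivalent check could look like the hypothetical helper below.

```python
# "\262" is octal, exactly as in the magic(5) string: 0262 == 0xB2 == 178.
CPC_MAGIC = b"CPC\262"

def is_cpc_image(data: bytes) -> bool:
    """Mimic the level-0 test '0 string CPC\\262' for CPC images."""
    return data.startswith(CPC_MAGIC)
```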
>
> After applying the modifications described above with the patches
> file-5.39-fonts-cpi.diff, file-5.39-msdos-cpx.diff,
> file-5.39-archive-novell.diff, file-5.39-windows-cpx.diff and
> file-5.39-images-cpi.diff, I get more precise output:
>
> 12520850.CPX: Windows codepage translator
> DEVLOAD.COM: FREE-DOS executable (COM), UPX compressed, uncompressed 5514 bytes
> EGA.CP_: Personal NetWare Packed File, was "EGA.CPI" with 24888 bytes
> ega.cpi: DR-DOS code page font data collection
> ega10.cpi: DOS code page font data collection
> ega10.cpx: FREE-DOS executable (COM), UPX compressed DOS code page font, uncompressed 58880 bytes
> ega18.cpx: FREE-DOS executable (COM), UPX compressed DOS code page font, uncompressed 29540 bytes
> FaxTest.cpi: Cartesian Perceptual Compression image
> GEM.CPI: DR-DOS code page font data collection
> Gilman2.cpc: Cartesian Perceptual Compression image
> Packed File.txt: ASCII text, with CRLF line terminators
> TICKLE.COM: FREE-DOS executable (COM), UPX compressed, uncompressed 2658 bytes
>
> I hope my diff files can be applied in a future version of the
> file utility.
>
> With best wishes
> Jörg Jenderek
> - --
> Jörg Jenderek
>
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCXxGtigAKCRCv8rHJQhrU
> 1pOsAJwMNw28J5fkI1T3jFA3gcHldzWWAwCfTcWyrHvyRsmK/65P7fxpWgG6hZU=
> =rhF8
> -----END PGP SIGNATURE-----
> --
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file