[File] [PATCH] of Magdir/fonts,msdos,archive,windows,images for DOS code pages; *.cpx *.cpi
Jörg Jenderek
joerg.jen.der.ek at gmx.net
Fri Jul 17 13:54:54 UTC 2020
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello,
some days ago I handled some DOS code pages with the file name extensions
cpi and cpx. When running file command version 5.39 on such examples, on
other files with the cpi extension and on related files, I get output like:
12520850.CPX: ASCII text, with CRLF line terminators
DEVLOAD.COM: FREE-DOS executable (COM), UPX compressed
EGA.CP_: Personal NetWare Packed File, was "EGA.CPI"
ega.cpi: data
ega10.cpi: DOS code page font data collection
ega10.cpx: FREE-DOS executable (COM), UPX compressed
ega18.cpx: FREE-DOS executable (COM), UPX compressed
FaxTest.cpi: Cartesian Perceptual Compression image
GEM.CPI: data
Gilman2.cpc: Cartesian Perceptual Compression image
Packed File.txt: Personal NetWare Packed File, was "by Novell. C"
TICKLE.COM: FREE-DOS executable (COM), UPX compressed
With the --extension option only ??? is displayed in most cases, and for
FreeDOS UPX-compressed code pages the wrong extension com is displayed
instead of the correct cpx.
Furthermore, with the -i option only the generic
application/octet-stream is shown for many samples.
For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). It lists the file name
extensions in use and, with the -v option, often a related URL
pointing to information about the file format.
Examples like ega10.cpi are recognized by this line inside Magdir/fonts:
0 belong 0xff464f4e DOS code page font data collection
Luckily the TrID tool displays the file name extension cpi and a related
URL. This is now expressed by an additional comment line like:
# URL: http://fileformats.archiveteam.org/wiki/CPI
More information about the DOS code page file format can be found in Ralf
Brown's Interrupt List, table #01758. This is now expressed by an
additional comment line like:
# Reference: http://www.delorie.com/djgpp/doc/rbinter/it/58/17.html
So the file name extension and a user-defined MIME type are now shown by
additional lines like:
!:mime font/x-dos-cpi
!:ext cpi
The described format is used in Microsoft DOS and in older versions
of FreeDOS (cpidos package).
Luckily the web site also mentions the DR-DOS variant. So samples
like EGA.CPI or GEM.CPI are now identified by additional lines like:
0 string \x7fDRFONT DR-DOS code page font data collection
!:mime font/x-drdos-cpi
!:ext cpi
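For readers who do not use magic syntax every day, the two signature tests above can be sketched in Python. This is only an illustration of the byte comparisons; the function name classify_cpi is made up for this example and is not part of the file utility.

```python
from typing import Optional

# Signatures taken from the two magic lines above.
MSDOS_CPI_MAGIC = bytes.fromhex("ff464f4e")  # belong 0xff464f4e, i.e. b"\xffFON"
DRDOS_CPI_MAGIC = b"\x7fDRFONT"


def classify_cpi(data: bytes) -> Optional[str]:
    """Return the description for a CPI header, or None if no magic matches.

    Hypothetical helper: it only mirrors the comparisons at offset 0.
    """
    if data.startswith(MSDOS_CPI_MAGIC):
        return "DOS code page font data collection"
    if data.startswith(DRDOS_CPI_MAGIC):
        return "DR-DOS code page font data collection"
    return None
```

Both variants would get the extension cpi; only the description and the MIME type differ.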
In newer FreeDOS versions the code pages are compressed with UPX. Such
samples were matched by lines inside Magdir/msdos like:
34 string UPX! FREE-DOS executable (COM), UPX compressed
35 string UPX! FREE-DOS executable (COM), UPX compressed
Some information about UPX can be found on Wikipedia. This is now
expressed by a comment line like:
# URL: https://en.wikipedia.org/wiki/UPX
When running upx with the list option, the format "dos/com" and the file
sizes are shown. This "dos/com" file format can be understood by
looking at the assembler source of the corresponding UPX stub. This is
expressed by lines like:
# Reference: https://github.com/upx/upx/archive/v3.96.zip/upx-3.96/
# src/stub/src/i086-dos16.com.S
The first assembler instruction is "cmp sp, offset sp_limit". That is
expressed by the magic line:
0 string/b \x81\xfc
The next assembler instructions are "jump above +2; int 0x20; mov cx,
offset bytes_to_copy". That is expressed by the second test line:
>4 string \x77\x02\xcd\x20\xb9
The third test line was:
>>36 string UPX! FREE-DOS executable (COM), UPX compressed
I modified this line. As the third test I now look for the assembler
instructions "push di; jump decomp_start_n2b". This is expressed by the line:
>0x1e search/3 \x57\xe9
This occurs at slightly different offsets, because sometimes additional
instructions like a second "push di" appear. Afterwards I look
for UPX_MAGIC_LE32, as defined in the include file header.S, by the line:
>>&2 string UPX! FREE-DOS executable (COM), UPX
The size of the uncompressed file is also stored. So this information
is shown by an additional line like:
>>>&12 uleshort x \b, uncompressed %u bytes
So now all UPX variants are matched. TrID is able to distinguish
between UPX executables and UPX-compressed DOS code pages. When
looking in the TrID definition file cpx-fdos.trid.xml I see the
characteristic phrase FONT. So I now distinguish these two cases by
additional lines like:
>>>&21 string =FONT compressed DOS code page font
!:ext cpx
>>>&21 string !FONT compressed
!:ext com
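The chain of tests above can be followed more easily as a small Python sketch. It assumes my reading of the magic syntax: search/3 tries the start positions 0x1e, 0x1f and 0x20, and the & offsets are counted from the end of the match one level up. The function name describe_upx_com is only chosen for this illustration.

```python
import struct
from typing import Optional


def describe_upx_com(data: bytes) -> Optional[str]:
    """Mirror the UPX dos/com magic tests on a byte buffer (illustrative only)."""
    # cmp sp, offset sp_limit
    if data[0:2] != b"\x81\xfc":
        return None
    # jump above +2; int 0x20; mov cx, offset bytes_to_copy
    if data[4:9] != b"\x77\x02\xcd\x20\xb9":
        return None
    # push di; jump decomp_start_n2b, tried at 3 start positions (search/3)
    for start in range(0x1E, 0x1E + 3):
        if data[start:start + 2] == b"\x57\xe9":
            break
    else:
        return None
    # skip the 2 matched opcode bytes and the 2 byte jump displacement (&2)
    magic_at = start + 2 + 2
    if data[magic_at:magic_at + 4] != b"UPX!":
        return None
    after_magic = magic_at + 4
    if len(data) < after_magic + 14:
        return None
    # uncompressed length, 16 bit little endian, 12 bytes after the magic
    (u_len,) = struct.unpack_from("<H", data, after_magic + 12)
    # a compressed code page shows the phrase FONT 21 bytes after the magic
    is_font = data[after_magic + 21:after_magic + 25] == b"FONT"
    kind = "compressed DOS code page font" if is_font else "compressed"
    return f"FREE-DOS executable (COM), UPX {kind}, uncompressed {u_len} bytes"
```

With the opcode pair found at 0x1e or 0x1f, the magic UPX! lands at offset 34 or 35, which are the two fixed offsets that the old lines tested.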
On installation discs the DR-DOS files are packed, and the last
character of the file name extension is replaced by an underscore.
Samples like EGA.CP_ were identified by this line inside Magdir/archive:
0 string Packed\ File\ Personal NetWare Packed File
Information about this file format can be found in Matthias Paul's tips
about Novell DOS 7. This is now expressed by an additional comment line
like:
# Reference: http://www.antonis.de/dos/dos-tuts/mpdostip/html/nwdostip.htm
The original file name is stored after the starting magic. That was
expressed by the line:
>12 string x \b, was "%.12s"
According to the documentation these names are terminated by a Control-Z
character. I now use this as an additional test to skip misidentified
ASCII texts starting with the phrase Packed\040File\040. So the magic
lines now become:
0 string Packed\ File\
>0x18 ubyte 0x1a Personal NetWare Packed File
A user-defined MIME type and the file name extension are now shown by
additional lines like:
!:mime application/x-novell-compress
!:ext ??_
The size of the uncompressed file is stored some bytes later. So this
information is also shown by an additional line like:
>>0x1b ulelong x with %u bytes
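The header layout used by these tests can be sketched in Python as follows. The offsets come from the magic lines above; the function name parse_packed_file and the assumption that short names are padded with NUL bytes are mine and are not taken from the documentation.

```python
import struct
from typing import Optional, Tuple

MAGIC = b"Packed File "  # 12 bytes, so the stored name starts at offset 12


def parse_packed_file(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (original name, uncompressed size) or None (illustrative only)."""
    if not data.startswith(MAGIC):
        return None
    # The Control-Z at 0x18 separates real packed files from ASCII texts
    # that merely start with the same phrase.
    if len(data) < 0x1B + 4 or data[0x18] != 0x1A:
        return None
    # Assumption: names shorter than 12 characters are NUL padded.
    name = data[12:0x18].split(b"\x00", 1)[0].decode("ascii", "replace")
    # uncompressed size, 32 bit little endian, at offset 0x1b
    (size,) = struct.unpack_from("<I", data, 0x1B)
    return name, size
```

A text file such as "Packed File.txt" fails the Control-Z test and so falls through to the generic ASCII text detection.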
Examples like 12520850.CPX are found in the subdirectory SysWOW64 on
Windows systems. The TrID file identifier describes such CPX files
generically as INI configuration and specifically as Windows code page
translator, because they start with a line like:
[Windows Latin 1(1252)/850 (Multilingual-Latin 1)]
So, according to Gary Kessler's file signatures table, I add to the
subroutine ini-file inside Magdir/windows an additional identifying line like:
>>&0 regex/c \^(Windows\ Latin) Windows codepage translator
Afterwards the MIME type and file name extension are shown by the additional lines:
!:mime text/plain
!:ext cpx
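As a rough illustration, the new test behaves like the following Python sketch. It assumes that the ini-file subroutine applies the regex right after the opening bracket of the first section header; the function name is_codepage_translator is only used for this example.

```python
import re

# Same pattern as in the magic line; /c there means case insensitive.
SECTION_RE = re.compile(r"^Windows Latin", re.IGNORECASE)


def is_codepage_translator(text: str) -> bool:
    """Check whether an INI-like text starts with a [Windows Latin ...] section."""
    stripped = text.lstrip()
    if not stripped.startswith("["):
        return False
    # Assumption: the pattern is matched directly after the opening bracket.
    return SECTION_RE.match(stripped[1:]) is not None
```

Other INI files, for example those starting with [boot loader], are not affected by this test.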
Examples like FaxTest.cpi are recognized as Cartesian Perceptual
Compression image by Magdir/images via a magic line like:
0 string CPC\262 Cartesian Perceptual Compression image
To display the file name extensions I add, according to the File Formats
Archive Team web site, a line like this afterwards:
!:ext cpi/cpc
After applying the above-mentioned modifications with the patches
file-5.39-fonts-cpi.diff, file-5.39-msdos-cpx.diff,
file-5.39-archive-novell.diff, file-5.39-windows-cpx.diff and
file-5.39-images-cpi.diff, I get more precise output like:
12520850.CPX: Windows codepage translator
DEVLOAD.COM: FREE-DOS executable (COM), UPX compressed
, uncompressed 5514 bytes
EGA.CP_: Personal NetWare Packed File, was "EGA.CPI"
with 24888 bytes
ega.cpi: DR-DOS code page font data collection
ega10.cpi: DOS code page font data collection
ega10.cpx: FREE-DOS executable (COM), UPX compressed
DOS code page font, uncompressed 58880 bytes
ega18.cpx: FREE-DOS executable (COM), UPX compressed
DOS code page font, uncompressed 29540 bytes
FaxTest.cpi: Cartesian Perceptual Compression image
GEM.CPI: DR-DOS code page font data collection
Gilman2.cpc: Cartesian Perceptual Compression image
Packed File.txt: ASCII text, with CRLF line terminators
TICKLE.COM: FREE-DOS executable (COM), UPX compressed
, uncompressed 2658 bytes
I hope my diff files can be applied in a future version of the
file utility.
With best wishes
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCXxGtigAKCRCv8rHJQhrU
1pOsAJwMNw28J5fkI1T3jFA3gcHldzWWAwCfTcWyrHvyRsmK/65P7fxpWgG6hZU=
=rhF8
-----END PGP SIGNATURE-----
-------------- next part --------------
--- file-5.39/magic/Magdir/windows.old 2020-05-31 10:34:41 +0000
+++ file-5.39/magic/Magdir/windows 2020-07-14 20:51:01 +0000
@@ -577,4 +577,11 @@
!:mime text/plain
!:ext hhp
+# From: Joerg Jenderek
+# URL: https://documentation.basis.com/BASISHelp/WebHelp/b3odbc/obdcdriv_character_translation.htm
+# Reference: https://www.garykessler.net/library/file_sigs.html
+# Note: stored in directory %WINDIR%\SysWOW64 or %WINDIR%\system
+>>&0 regex/c \^(Windows\ Latin) Windows codepage translator
+!:mime text/plain
+!:ext cpx
# unknown keyword after opening bracket
>>&0 default x
-------------- next part --------------
--- file-5.39/magic/Magdir/images.old 2020-05-31 10:34:40 +0000
+++ file-5.39/magic/Magdir/images 2020-07-15 11:50:26 +0000
@@ -1413,4 +1413,6 @@
# https://www.cartesianinc.com/Tech/
+# Reference: http://fileformats.archiveteam.org/wiki/Cartesian_Perceptual_Compression
0 string CPC\262 Cartesian Perceptual Compression image
!:mime image/x-cpi
+!:ext cpi/cpc
-------------- next part --------------
--- file-5.39/magic/Magdir/fonts.old 2020-05-31 10:34:40 +0000
+++ file-5.39/magic/Magdir/fonts 2020-07-15 13:42:34 +0000
@@ -125,3 +125,11 @@
# Misc. DOS VGA fonts, from Albert Cahalan (acahalan at cs.uml.edu)
+# Update: Joerg Jenderek
+# URL: http://fileformats.archiveteam.org/wiki/CPI
+# Reference: http://www.delorie.com/djgpp/doc/rbinter/it/58/17.html
0 belong 0xff464f4e DOS code page font data collection
+!:mime font/x-dos-cpi
+!:ext cpi
+0 string \x7fDRFONT DR-DOS code page font data collection
+!:mime font/x-drdos-cpi
+!:ext cpi
7 belong 0x00454741 DOS code page font data
-------------- next part --------------
--- file-5.39/magic/Magdir/msdos.old 2020-05-31 10:34:40 +0000
+++ file-5.39/magic/Magdir/msdos 2020-07-17 12:45:02 +0000
@@ -572,7 +572,27 @@
+# URL: https://en.wikipedia.org/wiki/UPX
+# Reference: https://github.com/upx/upx/archive/v3.96.zip/upx-3.96/
+# src/stub/src/i086-dos16.com.S
+# Update: Joerg Jenderek
+# assembler instructions: cmp sp, offset sp_limit
0 string/b \x81\xfc
+#>2 uleshort x \b, sp_limit=0x%x
+# assembler instructions: jump above +2; int 0x20; mov cx, offset bytes_to_copy
>4 string \x77\x02\xcd\x20\xb9
->>36 string UPX! FREE-DOS executable (COM), UPX compressed
-!:mime application/x-dosexec
-!:ext com
+#>9 uleshort x \b, [bytes_to_copy]=0x%x
+# at different offsets assembler instructions: push di; jump decomp_start_n2b
+>0x1e search/3 \x57\xe9
+#>>&0 uleshort x \b, decomp_start_n2b=0x%x
+# src/stub/src/include/header.S; UPX_MAGIC_LE32
+>>&2 string UPX! FREE-DOS executable (COM), UPX
+!:mime application/x-dosexec
+# UPX compressed *.CPI; See ./fonts
+>>>&21 string =FONT compressed DOS code page font
+!:ext cpx
+>>>&21 string !FONT compressed
+!:ext com
+# compressed size?
+#>>>&14 uleshort+152 x \b, %u bytes
+# uncompressed len
+>>>&12 uleshort x \b, uncompressed %u bytes
252 string Must\ have\ DOS\ version DR-DOS executable (COM)
@@ -580,11 +600,2 @@
!:ext com
-# added by Joerg Jenderek at Oct 2008
-# GRR search is not working
-#34 search/2 UPX! FREE-DOS executable (COM), UPX compressed
-34 string UPX! FREE-DOS executable (COM), UPX compressed
-!:mime application/x-dosexec
-!:ext com
-35 string UPX! FREE-DOS executable (COM), UPX compressed
-!:mime application/x-dosexec
-!:ext com
# GRR search is not working
-------------- next part --------------
--- file-5.39/magic/Magdir/archive.old 2020-06-15 00:01:01 +0000
+++ file-5.39/magic/Magdir/archive 2020-07-16 00:58:56 +0000
@@ -1473,6 +1473,15 @@
# DR-DOS 7.03 Packed File *.??_
-0 string Packed\ File\ Personal NetWare Packed File
->12 string x \b, was "%.12s"
+# Reference: http://www.antonis.de/dos/dos-tuts/mpdostip/html/nwdostip.htm
+# Note: unpacked by PNUNPACK.EXE
+0 string Packed\ File\
+# by looking for Control-Z skip ASCII text starting with Packed File
+>0x18 ubyte 0x1a Personal NetWare Packed File
+!:mime application/x-novell-compress
+!:ext ??_
+>>12 string x \b, was "%.12s"
+# 1 or 2
+#>>0x19 ubyte x \b, at 0x19 %u
+>>0x1b ulelong x with %u bytes
# EET archive