[File] [PATCH] of Magdir/fonts,msdos,archive,windows,images for DOS code pages; *.cpx *.cpi

Jörg Jenderek joerg.jen.der.ek at gmx.net
Fri Jul 17 13:54:54 UTC 2020


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Hello,
some days ago I handled some DOS code pages with the file name
extensions cpi and cpx. When running the file command version 5.39 on
such examples, on other files with the cpi extension and on related
files, I get output like:

12520850.CPX:    ASCII text, with CRLF line terminators
DEVLOAD.COM:     FREE-DOS executable (COM), UPX compressed
EGA.CP_:         Personal NetWare Packed File, was "EGA.CPI"
ega.cpi:         data
ega10.cpi:       DOS code page font data collection
ega10.cpx:       FREE-DOS executable (COM), UPX compressed
ega18.cpx:       FREE-DOS executable (COM), UPX compressed
FaxTest.cpi:     Cartesian Perceptual Compression image
GEM.CPI:         data
Gilman2.cpc:     Cartesian Perceptual Compression image
Packed File.txt: Personal NetWare Packed File, was "by Novell. C"
TICKLE.COM:      FREE-DOS executable (COM), UPX compressed

With the --extension option, in most cases only ??? is displayed, and
for FreeDOS UPX compressed code pages the wrong extension com is
displayed instead of the correct cpx.
Furthermore, with the -i option only the generic
application/octet-stream is shown for many samples.

For comparison I ran the file format identification utility TrID
(see https://mark0.net/soft-trid-e.html). It lists the file name
extensions in use and, with the -v option, often a related URL
pointing to information about the file format.

Examples like ega10.cpi are recognized by this line inside Magdir/fonts:
 0 belong	0xff464f4e	DOS code page font data collection
Luckily the TrID tool displays the file name extension cpi and a
related URL. This is now expressed by an additional comment line like
 # URL:		http://fileformats.archiveteam.org/wiki/CPI
More information about the DOS code page file format can be found in
Ralf Brown's Interrupt List, table #01758. This is now expressed by an
additional comment line like:
 # Ref.: http://www.delorie.com/djgpp/doc/rbinter/it/58/17.html
So I now show the file name extension and a user-defined MIME type by
additional lines like:
 !:mime	font/x-dos-cpi
 !:ext	cpi
The described format is used in Microsoft DOS and in older versions
of FreeDOS (cpidos package).
Luckily the DR-DOS variant is also mentioned on that web site. So I now
identify samples like EGA.CPI or GEM.CPI by additional lines like
 0 string \x7fDRFONT	DR-DOS code page font data collection
 !:mime	font/x-drdos-cpi
 !:ext	cpi
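
The two font tests above can be sketched in Python as follows. This is
only an illustration of what the magic lines compare, not how file(1)
works internally; the function name classify_cpi is hypothetical.

```python
# Illustrative sketch only; classify_cpi is a hypothetical helper,
# not part of file(1) or libmagic.
from typing import Optional, Tuple


def classify_cpi(data: bytes) -> Optional[Tuple[str, str, str]]:
    """Return (description, mime type, extension) or None if nothing matches."""
    # "0 belong 0xff464f4e": big-endian 32-bit value, i.e. bytes FF 'F' 'O' 'N'
    if data[:4] == bytes.fromhex("ff464f4e"):
        return ("DOS code page font data collection", "font/x-dos-cpi", "cpi")
    # "0 string \x7fDRFONT": DR-DOS variant of the format
    if data.startswith(b"\x7fDRFONT"):
        return ("DR-DOS code page font data collection", "font/x-drdos-cpi", "cpi")
    return None
```

Both variants share the extension cpi, so only the leading bytes tell
them apart.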

In newer FreeDOS versions the code pages are compressed with UPX. So
far such samples were described by line fragments inside Magdir/msdos
like:
 34	string	UPX!	FREE-DOS executable (COM), UPX compressed
 35	string	UPX!	FREE-DOS executable (COM), UPX compressed

Some information about UPX can be found on Wikipedia. This is now
expressed by a comment line like
 # URL:		https://en.wikipedia.org/wiki/UPX
When running upx with the list option, the format "dos/com" and the
file sizes are shown. This "dos/com" file format can be understood by
looking at the assembler source of the corresponding UPX stub. This is
expressed by lines like:
 # Reference:	github.com/upx/upx/archive/v3.96.zip/upx-3.96/
 #		src/stub/src/i086-dos16.com.S
The first assembler instruction is "cmp sp, offset sp_limit". That is
expressed by the magic line
 0	string/b	\x81\xfc
The next assembler instructions are "jump above +2; int 0x20; mov cx,
offset bytes_to_copy". That is expressed by the second test line:
 >4	string	\x77\x02\xcd\x20\xb9
The third test line was
 >>36	string	UPX!	FREE-DOS executable (COM), UPX compressed
I modified this line. As the third test I now look for the assembler
instructions "push di; jump decomp_start_n2b". This is expressed by
the line
 >0x1e	search/3	\x57\xe9
This occurs at slightly different offsets, because sometimes additional
instructions like a second "push di" appear. Afterwards I look for
UPX_MAGIC_LE32, according to the include file header.S, by the line
 >>&2	string		UPX!	FREE-DOS executable (COM), UPX
The size of the uncompressed file is stored as well. So I also show
this information by an additional line like:
 >>>&12	uleshort	x		\b, uncompressed %u bytes
So now all UPX variants are matched. TrID is able to distinguish
between UPX executables and UPX compressed DOS code pages. When
looking in the TrID definition file cpx-fdos.trid.xml I see the
characteristic phrase FONT. So I now distinguish these two cases by
additional lines like:
 >>>&21	string		=FONT		compressed DOS code page font
 !:ext	cpx
 >>>&21	string		!FONT		compressed
 !:ext	com
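
The chain of tests above can be followed step by step in a small Python
sketch. It rests on two assumptions about magic(5) semantics that are
not stated in this message: that search/3 tries the three start offsets
0x1e, 0x1f and 0x20, and that a relative offset like &2 counts from the
end of the previous match. The function name describe_upx_com is
hypothetical.

```python
# Illustrative sketch only; describe_upx_com is a hypothetical helper.
import struct
from typing import Optional


def describe_upx_com(data: bytes) -> Optional[str]:
    """Mimic the UPX dos/com magic tests and return the description text."""
    # 0 string/b \x81\xfc  ->  cmp sp, offset sp_limit
    if data[0:2] != b"\x81\xfc":
        return None
    # >4 string \x77\x02\xcd\x20\xb9  ->  jump above +2; int 0x20; mov cx, ...
    if data[4:9] != b"\x77\x02\xcd\x20\xb9":
        return None
    # >0x1e search/3 \x57\xe9  ->  push di; jump decomp_start_n2b
    # (assumption: start offsets 0x1e, 0x1f and 0x20 are tried)
    for start in range(0x1E, 0x1E + 3):
        if data[start:start + 2] == b"\x57\xe9":
            break
    else:
        return None
    # >>&2 string UPX!  (assumption: relative to the end of the search
    # match, so the 16-bit jump displacement is skipped)
    magic = start + 2 + 2
    if data[magic:magic + 4] != b"UPX!":
        return None
    after_magic = magic + 4
    text = "FREE-DOS executable (COM), UPX"
    # >>>&21 string =FONT / !FONT
    if data[after_magic + 21:after_magic + 25] == b"FONT":
        text += " compressed DOS code page font"
    else:
        text += " compressed"
    # >>>&12 uleshort x  ->  uncompressed length, 16-bit little endian
    if len(data) >= after_magic + 14:
        (u_len,) = struct.unpack_from("<H", data, after_magic + 12)
        text += f", uncompressed {u_len} bytes"
    return text
```

With these assumptions the UPX! magic can only be found at offsets 34,
35 or 36, which are the three offsets tested by the older magic lines.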

On installation discs the DR-DOS files are packed and the last
character of the file name extension is replaced by an underscore.
Samples like EGA.CP_ were identified by this line inside Magdir/archive:
 0	string	Packed\ File\ 	Personal NetWare Packed File
Information about this file format can be found in Matthias Paul's
tips about Novell DOS 7. This is now expressed by an additional comment
line like:
 # Ref. www.antonis.de/dos/dos-tuts/mpdostip/html/nwdostip.htm
The original file name is stored after the starting magic. That was
expressed by the line
 >12	string	x		\b, was "%.12s"
According to the documentation these names are terminated by a
Control-Z character. Now I use this as an additional test to skip
misidentified ASCII texts starting with the phrase Packed\040File\040.
So the magic lines now become:
 0	string	Packed\ File\
 >0x18	ubyte	0x1a		Personal NetWare Packed File
A user-defined MIME type and file name extension are now shown by
additional lines like:
 !:mime	application/x-novell-compress
 !:ext	??_
The size of the uncompressed file is stored some bytes later. So I
also show this information by an additional line like:
 >>0x1b	ulelong	x		with %u bytes

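The packed file tests described above can be sketched in Python like
this. The sketch assumes that the stored name is padded with NUL bytes
up to the Control-Z at offset 0x18, which is not stated in this
message, and it requires the whole header to be present, which is
stricter than the magic lines. The function name describe_packed_file
is hypothetical.

```python
# Illustrative sketch only; describe_packed_file is a hypothetical helper.
import struct
from typing import Optional


def describe_packed_file(data: bytes) -> Optional[str]:
    """Mimic the Personal NetWare Packed File magic tests."""
    # 0 string Packed\ File\   (12 bytes including the trailing space)
    if not data.startswith(b"Packed File "):
        return None
    # >0x18 ubyte 0x1a: Control-Z terminates the stored name; plain ASCII
    # texts starting with the same phrase fail here.
    if len(data) < 0x1B + 4 or data[0x18] != 0x1A:
        return None
    # >>12 string x "%.12s": at most 12 bytes, cut at the first NUL
    # (assumption: the name is NUL padded)
    name = data[12:24].split(b"\0", 1)[0].decode("ascii", "replace")
    # >>0x1b ulelong x: uncompressed size, 32-bit little endian
    (size,) = struct.unpack_from("<I", data, 0x1B)
    return f'Personal NetWare Packed File, was "{name}" with {size} bytes'
```

A text file that merely begins with "Packed File " has a printable
character at offset 0x18 and is therefore no longer misidentified.
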
Examples like 12520850.CPX are found in the subdirectory SysWOW64 on
Windows systems. The TrID file identifier describes such CPX files
generically as INI configuration and more specifically as Windows code
page translator, because they start with a line like
[Windows Latin 1(1252)/850 (Multilingual-Latin 1)]
So, according to Gary Kessler's file signatures table, I add in the
subroutine ini-file inside Magdir/windows an additional identifying
line like
 >>&0	regex/c	\^(Windows\ Latin)	Windows codepage translator
Afterwards I show the MIME type and file name extension by additional lines
 !:mime	text/plain
 !:ext	cpx
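
A simplified Python sketch of this test follows. The surrounding tests
of the ini-file subroutine are not shown in this message, so the sketch
only checks whether the first line opens a section whose name starts
with "Windows Latin", ignoring case as regex/c does. The function name
is hypothetical.

```python
# Illustrative sketch only; is_windows_codepage_translator is a
# hypothetical helper and simplifies the ini-file subroutine.
import re

# Opening bracket followed by the keyword, case-insensitive like regex/c
_SECTION = re.compile(r"\[(Windows Latin)", re.IGNORECASE)


def is_windows_codepage_translator(text: str) -> bool:
    """Return True if the first line looks like a code page translator section."""
    lines = text.splitlines()
    if not lines:
        return False
    return _SECTION.match(lines[0]) is not None
```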

Examples like FaxTest.cpi are recognized as Cartesian Perceptual
Compression image by Magdir/images via a magic line like
 0 string CPC\262 Cartesian Perceptual Compression image
To display the file name extensions I add afterwards, according to the
File Formats Archive Team web site, a line like
 !:ext	cpi/cpc

After applying the above-mentioned modifications with the patches
file-5.39-fonts-cpi.diff, file-5.39-msdos-cpx.diff,
file-5.39-archive-novell.diff, file-5.39-windows-cpx.diff and
file-5.39-images-cpi.diff, I get more precise output like:

12520850.CPX:    Windows codepage translator
DEVLOAD.COM:     FREE-DOS executable (COM), UPX compressed
		 , uncompressed 5514 bytes
EGA.CP_:         Personal NetWare Packed File, was "EGA.CPI"
		 with 24888 bytes
ega.cpi:         DR-DOS code page font data collection
ega10.cpi:       DOS code page font data collection
ega10.cpx:       FREE-DOS executable (COM), UPX compressed
		 DOS code page font, uncompressed 58880 bytes
ega18.cpx:       FREE-DOS executable (COM), UPX compressed
		 DOS code page font, uncompressed 29540 bytes
FaxTest.cpi:     Cartesian Perceptual Compression image
GEM.CPI:         DR-DOS code page font data collection
Gilman2.cpc:     Cartesian Perceptual Compression image
Packed File.txt: ASCII text, with CRLF line terminators
TICKLE.COM:      FREE-DOS executable (COM), UPX compressed
		 , uncompressed 2658 bytes

I hope my diff files can be applied in a future version of the
file utility.

With best wishes
Jörg Jenderek
- --
Jörg Jenderek










-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCXxGtigAKCRCv8rHJQhrU
1pOsAJwMNw28J5fkI1T3jFA3gcHldzWWAwCfTcWyrHvyRsmK/65P7fxpWgG6hZU=
=rhF8
-----END PGP SIGNATURE-----
-------------- next part --------------
--- file-5.39/magic/Magdir/windows.old	2020-05-31 10:34:41 +0000
+++ file-5.39/magic/Magdir/windows	2020-07-14 20:51:01 +0000
@@ -577,4 +577,11 @@
 !:mime text/plain
 !:ext	hhp
+# From:		Joerg Jenderek
+# URL:		https://documentation.basis.com/BASISHelp/WebHelp/b3odbc/obdcdriv_character_translation.htm
+# Reference:	https://www.garykessler.net/library/file_sigs.html
+# Note:		stored in directory %WINDIR%\SysWOW64 or %WINDIR%\system
+>>&0	regex/c		\^(Windows\ Latin)				Windows codepage translator
+!:mime	text/plain
+!:ext	cpx
 # unknown keyword after opening bracket
 >>&0	default				x
-------------- next part --------------
--- file-5.39/magic/Magdir/images.old	2020-05-31 10:34:40 +0000
+++ file-5.39/magic/Magdir/images	2020-07-15 11:50:26 +0000
@@ -1413,4 +1413,6 @@
 # https://www.cartesianinc.com/Tech/
+# Reference:	http://fileformats.archiveteam.org/wiki/Cartesian_Perceptual_Compression
 0	string	CPC\262		Cartesian Perceptual Compression image
 !:mime	image/x-cpi
+!:ext	cpi/cpc
 
-------------- next part --------------
--- file-5.39/magic/Magdir/fonts.old	2020-05-31 10:34:40 +0000
+++ file-5.39/magic/Magdir/fonts	2020-07-15 13:42:34 +0000
@@ -125,3 +125,11 @@
 # Misc. DOS VGA fonts, from Albert Cahalan (acahalan at cs.uml.edu)
+# Update:	Joerg Jenderek
+# URL:		http://fileformats.archiveteam.org/wiki/CPI
+# Reference:	http://www.delorie.com/djgpp/doc/rbinter/it/58/17.html
 0	belong		0xff464f4e	DOS code page font data collection
+!:mime	font/x-dos-cpi
+!:ext	cpi
+0	string		\x7fDRFONT	DR-DOS code page font data collection
+!:mime	font/x-drdos-cpi
+!:ext	cpi
 7	belong		0x00454741	DOS code page font data
-------------- next part --------------
--- file-5.39/magic/Magdir/msdos.old	2020-05-31 10:34:40 +0000
+++ file-5.39/magic/Magdir/msdos	2020-07-17 12:45:02 +0000
@@ -572,7 +572,27 @@
 
+# URL:		https://en.wikipedia.org/wiki/UPX
+# Reference:	https://github.com/upx/upx/archive/v3.96.zip/upx-3.96/
+#		src/stub/src/i086-dos16.com.S 
+# Update:	Joerg Jenderek
+# assembler instructions: cmp sp, offset sp_limit
 0	string/b	\x81\xfc
+#>2	uleshort	x		\b, sp_limit=0x%x
+# assembler instructions: jump above +2; int 0x20; mov cx, offset bytes_to_copy
 >4	string	\x77\x02\xcd\x20\xb9
->>36	string	UPX!			FREE-DOS executable (COM), UPX compressed
-!:mime	application/x-dosexec
-!:ext	com
+#>9	uleshort	x		\b, [bytes_to_copy]=0x%x
+# at different offsets assembler instructions: push di; jump decomp_start_n2b
+>0x1e	search/3	\x57\xe9
+#>>&0	uleshort	x		\b, decomp_start_n2b=0x%x
+# src/stub/src/include/header.S; UPX_MAGIC_LE32
+>>&2	string		UPX!		FREE-DOS executable (COM), UPX
+!:mime	application/x-dosexec
+# UPX compressed *.CPI; See ./fonts
+>>>&21	string		=FONT		compressed DOS code page font
+!:ext	cpx
+>>>&21	string		!FONT		compressed
+!:ext	com
+# compressed size?
+#>>>&14	uleshort+152	x		\b, %u bytes
+# uncompressed len
+>>>&12	uleshort	x		\b, uncompressed %u bytes
 252	string Must\ have\ DOS\ version DR-DOS executable (COM)
@@ -580,11 +600,2 @@
 !:ext	com
-# added by Joerg Jenderek at Oct 2008
-# GRR search is not working
-#34	search/2	UPX!		FREE-DOS executable (COM), UPX compressed
-34	string	UPX!			FREE-DOS executable (COM), UPX compressed
-!:mime	application/x-dosexec
-!:ext	com
-35	string	UPX!			FREE-DOS executable (COM), UPX compressed
-!:mime	application/x-dosexec
-!:ext	com
 # GRR search is not working
-------------- next part --------------
--- file-5.39/magic/Magdir/archive.old	2020-06-15 00:01:01 +0000
+++ file-5.39/magic/Magdir/archive	2020-07-16 00:58:56 +0000
@@ -1473,6 +1473,15 @@
 
 # DR-DOS 7.03 Packed File *.??_
-0	string	Packed\ File\ 	Personal NetWare Packed File
->12	string	x		\b, was "%.12s"
+# Reference: http://www.antonis.de/dos/dos-tuts/mpdostip/html/nwdostip.htm
+# Note:	unpacked by PNUNPACK.EXE
+0	string	Packed\ File\ 
+# by looking for Control-Z skip ASCII text starting with Packed File 
+>0x18	ubyte	0x1a		Personal NetWare Packed File
+!:mime	application/x-novell-compress
+!:ext	??_
+>>12	string	x		\b, was "%.12s"
+# 1 or 2
+#>>0x19	ubyte	x		\b, at 0x19 %u
+>>0x1b	ulelong	x		with %u bytes
 
 # EET archive