[File] [PATCH] of Magdir/msdos COM executable for DOS misidentifies some *.IMG *.PE3 *.TXT

Jörg Jenderek joerg.jen.der.ek at gmx.net
Fri Jul 22 00:09:43 UTC 2022



Hello,

Some days ago I handled some DOS executables (COM). The variant
starting with a MOV instruction is described as "COM executable for
DOS". Unfortunately, some non-COM samples are also described as this
file type, which is wrong. When running the file command version 5.42
with option -k on such examples and related files, I get output like:

FINDDISK.COM: COM executable for DOS
Gpt.com:      DOS/MBR boot sector  DOS/MBR boot sector
	      COM executable for DOS
IMAGINFO.PE3: COM executable for DOS
LOADER.COM:   COM executable for DOS
Mbr.com:      DOS/MBR boot sector  DOS/MBR boot sector
	      COM executable for DOS
REBOOT.COM:   COM executable for DOS
RESTART.COM:  COM executable for DOS
SETENHKB.COM: COM executable for DOS
banner.com:   COM executable for DOS
bcdw_cl.com:  COM executable for DOS
copybs.com:   COM executable for DOS
euckr_.txt:   COM executable for DOS ,
	      ISO-8859 text, with CRLF line terminators
fdemuoff.com: COM executable for DOS
flashimg.img: DOS/MBR boot sector  DOS/MBR boot sector
	      COM executable for DOS
gfxboot.com:  COM executable for DOS
gif2raw.com:  COM executable for DOS
poweroff.com: COM executable for DOS
rem.com:      COM executable for DOS
sys.com:      COM executable for DOS
syslinux.com: COM executable for DOS

The description happens inside Magdir/msdos by lines like:
 0	ubyte		0xb8
 >0	string		!\xb8\xc0\x07\x8e
 >>1	lelong&0xFFFFFFFe 0x21CD4CFe	COM executable (32-bit
 >>1	default	x			COM executable for DOS
 !:mime	application/x-dosexec
 !:ext com
The first line tests for the one-byte MOV AX,imm16 opcode (0xb8) at
the beginning. The second line skips some Linux kernels like
memtest.bin. The third test matches COM executables (32-bit COMBOOT).
What remains is often a DOS COM executable, but sometimes also another
file type, because in reality only the single MOV opcode byte has been
used as a test. That apparently is too weak.
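To make the weakness concrete, here is a minimal Python model of the pre-patch rule shown above. It is a sketch that assumes each magic line behaves like the byte comparison in the comment next to it; the function name `old_rule_says_com` is hypothetical, and this is not how libmagic evaluates magic internally.

```python
import struct

# Hypothetical helper: models the pre-patch Magdir/msdos lines quoted above.
def old_rule_says_com(data: bytes) -> bool:
    # 0  ubyte 0xb8 : first byte must be the MOV AX,imm16 opcode
    if len(data) < 1 or data[0] != 0xB8:
        return False
    # >0 string !\xb8\xc0\x07\x8e : skip kernels like memtest.bin
    if data.startswith(b"\xb8\xc0\x07\x8e"):
        return False
    # >>1 lelong&0xFFFFFFFe 0x21CD4CFe : 32-bit COMBOOT is reported separately
    if len(data) >= 5:
        (value,) = struct.unpack_from("<I", data, 1)
        if value & 0xFFFFFFFE == 0x21CD4CFE:
            return False
    # >>1 default x : everything else became "COM executable for DOS"
    return True
```

Any file whose first byte is 0xb8 and that is not one of the two skipped patterns passes, which is how a text file that merely starts with byte 0xb8 can be reported as a COM executable.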

The first step is to replace the display part with a call to the
subroutine msdos-com. Then only some additional test lines must be
inserted before calling this routine.

At the end of this subroutine the first 4 bytes of the executable
are shown by a line like:
 >0	ubelong		x		\b, start instruction %#8.8x
For checking purposes I show 4 more bytes with an additional line:
 >4	ubelong		x		%8.8x
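The effect of the two display lines can be sketched in Python as below. The helper name `start_instruction` is hypothetical; the sketch assumes, as the listings in this message show, that the second value is separated by a space and is omitted when the file has fewer than 8 bytes (as for rem.com).

```python
import struct

# Hypothetical helper: mimics the effect of the two ubelong display lines.
def start_instruction(data: bytes) -> str:
    text = ""
    if len(data) >= 4:
        # >0 ubelong x \b, start instruction %#8.8x  (same as 0x%08x for nonzero values)
        text += ", start instruction 0x%08x" % struct.unpack_from(">I", data, 0)
    if len(data) >= 8:
        # >4 ubelong x %8.8x  (no \b, so a space separates it from the first value)
        text += " %08x" % struct.unpack_from(">I", data, 4)
    return text
```

For the first 8 bytes of FINDDISK.COM this yields ", start instruction 0xb82425ba 4e01cd21", matching the listing further down.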

So I see that many COM executables contain the byte sequence cd21
near the beginning. That is interrupt 21H. Other COM files at least
contain the byte cd, which is an interrupt instruction with another
INT number like 13H. In many misidentified examples these byte
sequences do not occur. Furthermore, I see that some COM files contain
only a few bytes, like rem.com (from the DJGPP suite) with four bytes.
That has an ugly side effect. In my first efforts I tried to skip
"DOS/MBR boot sector" samples by checking for the boot signature
sequence 55AA at offset 510. Unfortunately, this does not work for
short COM executables that end before offset 510. I believe this is
a BUG in the file command!
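The exact offset-510 line the author tried is not shown, so the following is only a hedged model of the observed behaviour. It assumes (an assumption, not documented libmagic behaviour) that a test whose bytes lie past end of file is simply "no match", whether or not it is negated. Under that assumption a negated boot-signature test can never let a 4-byte file like rem.com through. The name `boot_signature_test` is hypothetical.

```python
# Hypothetical helper: a test for the byte sequence 55 AA at offset 510,
# optionally negated, under the ASSUMED rule that a test reaching past
# end of file never matches, even when negated.
def boot_signature_test(data: bytes, negate: bool) -> bool:
    offset, size = 510, 2
    if len(data) < offset + size:
        return False  # assumption: out-of-range test is "no match"
    has_signature = data[offset:offset + size] == b"\x55\xaa"
    return (not has_signature) if negate else has_signature
```

With this model, a 4-byte file fails both the plain and the negated test, while a 600-byte file without the signature passes the negated one.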

So now I look for an interrupt instruction with the line:
 >>>3	search/118	\xCD
This is true for short examples like REM.COM or bigger ones like
LOADER.COM (DR-DOS 7.x). For checking purposes you can show the
interrupt number with a debugging line like:
 >>>>&0	ubyte	x			\b, INTERRUPT %#x
This shows the hexadecimal interrupt numbers used in COM samples like:
10~BANNER.COM 13~bcdw_cl.com 15~poweroff.com (Syslinux)
1A~BERNDPCI.COM 20~SETENHKB.COM 22~gfxboot.com (Syslinux)
Unfortunately, the values 13h and 16h are also found in some DOS/MBR
boot sector samples.
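The interrupt lookup can be sketched as follows. This assumes the `search/118` window covers the 118 bytes starting at offset 3 (the exact boundary is an assumption), and that `&0` points just past the matched 0xCD byte. The name `find_interrupt` is hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical helper: models ">>>3 search/118 \xCD" plus ">>>>&0 ubyte x".
# Returns (found, number): found says whether an INT opcode (0xCD) occurs in
# the window; number is the byte right after it, or None if the file ends there.
def find_interrupt(data: bytes, start: int = 3, window: int = 118) -> Tuple[bool, Optional[int]]:
    pos = data.find(b"\xcd", start, start + window)  # assumed window bounds
    if pos < 0:
        return False, None
    nxt = pos + 1  # "&0" is relative to the end of the 1-byte match
    return True, (data[nxt] if nxt < len(data) else None)
```

For the first bytes of banner.com (b8 13 00 cd 10) this finds interrupt 0x10; for the 4-byte rem.com (b8 00 4c cd) it finds the opcode but no interrupt number, because the file ends right after it.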

So the sub-branch for INT 13h looks like:
 >>>>&0	ubyte	=0x13
 >>>>>3	ubequad	!0x8ec0b8c0078ed88d
 >>>>>>0		use		msdos-com
Gpt.com and Mbr.com are not real DOS executables but boot sectors
from the edk2-UDK2018 suite. Looking at the source listing, I see that
the next instructions at offset 3 are "mov es,ax ; mov ax,07c0h ; mov
ds,ax". That is the byte sequence 8ec0b8c0078ed88d (the final byte 8d
already belongs to the following instruction). After skipping such
boot sectors I can now call the subroutine. Here this matches a few
DOS files with an interrupt 0x13 instruction like bcdw_cl.com and
fdemuoff.com. These are part of Bootable CD Wizard (see
bootcd.narod.ru/bcdw150z_en.zip).
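The INT 13h sub-branch reduces to one 8-byte comparison at offset 3. A minimal sketch, assuming an out-of-range test does not match; the name `int13_branch_calls_msdos_com` is hypothetical, and the test data is synthetic or zero-padded because only the first 8 bytes of the real files are shown in this message.

```python
# mov es,ax ; mov ax,07c0h ; mov ds,ax ; first byte of the next instruction
EDK2_OFFSET3 = bytes.fromhex("8ec0b8c0078ed88d")

# Hypothetical helper: models ">>>>>3 ubequad !0x8ec0b8c0078ed88d",
# which is only reached after an INT 13h instruction was found.
def int13_branch_calls_msdos_com(data: bytes) -> bool:
    if len(data) < 3 + 8:
        return False  # assumption: out-of-range test does not match
    return data[3:11] != EDK2_OFFSET3
```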

So the second sub-branch, for INT 16h, looks like:
 >>>>&0	ubyte	=0x16
 >>>>>8	ubelong	!0x3DE4E475
 >>>>>>0		use		msdos-com
flashimg.img is not a real DOS executable either. It is a boot image
that is part of the Syslinux suite version 3.71. Looking at the source
listing, I see that the instructions at offset 8 are "cmp ax,0xE4E4
(magic); jnz". That is the byte sequence 3DE4E475. After skipping such
boot images I can now call the subroutine. This would match DOS files
with an interrupt 0x16 instruction, but I found no such examples
myself.
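The INT 16h sub-branch is analogous, with a 4-byte comparison at offset 8. Again a sketch with a hypothetical name, assuming an out-of-range test does not match; the test data is synthetic, not taken from the real flashimg.img.

```python
# cmp ax,0xE4E4 ; jnz ...
SYSLINUX_MAGIC_CMP = bytes.fromhex("3de4e475")

# Hypothetical helper: models ">>>>>8 ubelong !0x3DE4E475",
# which is only reached after an INT 16h instruction was found.
def int16_branch_calls_msdos_com(data: bytes) -> bool:
    if len(data) < 8 + 4:
        return False  # assumption: out-of-range test does not match
    return data[8:12] != SYSLINUX_MAGIC_CMP
```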

The third sub-branch, for samples whose interrupt number is neither
0x13 nor 0x16, looks like:
 >>>>&0	default	x
 >>>>>0		use		msdos-com
This matches many DOS examples (like: LOADER.COM SETENHKB.COM
banner.com copybs.com gif2raw.com poweroff.com rem.com).

The last branch is for the few COM executables without an interrupt
instruction, and for some misidentified samples that are not boot
sectors. Looking at the second instruction at offset 3, I find 0x50
for RESTART.COM or 0x8e for REBOOT.COM. For a misidentified Ulead
Imaginfo thumbnail (IMAGINFO.PE3 sky_snow) the value here is 0. For
some EUC-KR text files (like euckr_falsepositive.txt or euckr_.txt)
the value here was 0xb1. So I skip such misidentified "bad" samples
and call the subroutine only for the few valid examples like
RESTART.COM (DOS 7.10) or REBOOT.COM, with a branch that looks like:
 >>>3	default	x
 >>>>3	ubyte	!0x0
 >>>>>3	ubyte	!0xb1
 >>>>>>0	use		msdos-com
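This last branch only inspects the byte at offset 3. A sketch with a hypothetical name, assuming an out-of-range test does not match; the first two tests use the 8-byte prefixes of RESTART.COM and REBOOT.COM from the listing below, the others use synthetic bytes.

```python
# Hypothetical helper: models the no-interrupt branch
# ">>>>3 ubyte !0x0" followed by ">>>>>3 ubyte !0xb1".
def no_int_branch_calls_msdos_com(data: bytes) -> bool:
    if len(data) < 4:
        return False  # assumption: out-of-range test does not match
    second = data[3]
    # 0x00: Ulead Imaginfo thumbnail; 0xb1: some EUC-KR text files
    return second not in (0x00, 0xB1)
```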

After applying the modifications described above with the patch
file-msdos-com-mov.diff and using Magdir/filesystems, the
misidentifications vanish and some more details are shown. With the
-k option the output now looks like:

FINDDISK.COM: DOS executable (COM),
	      start instruction 0xb82425ba 4e01cd21
Gpt.com:      DOS/MBR boot sector DOS/MBR boot sector
IMAGINFO.PE3: data
LOADER.COM:   DOS executable (COM),
	      start instruction 0xb80061ba 081e3bc4
Mbr.com:      DOS/MBR boot sector DOS/MBR boot sector
REBOOT.COM:   DOS executable (COM),
	      start instruction 0xb840008e d8be7200
RESTART.COM:  DOS executable (COM),
	      start instruction 0xb8400050 1fc70672
SETENHKB.COM: DOS executable (COM),
	      start instruction 0xb84000bf 96008ec0
banner.com:   DOS executable (COM),
	      start instruction 0xb81300cd 10b82111
bcdw_cl.com:  DOS executable (COM),
	      start instruction 0xb89f54cd 130f8215
copybs.com:   DOS executable (COM),
	      start instruction 0xb80030cd 2186c4a3
euckr_.txt:   ISO-8859 text, with CRLF line terminators
fdemuoff.com: DOS executable (COM),
	      start instruction 0xb8004b32 d2be0c01
flashimg.img: DOS/MBR boot sector DOS/MBR boot sector
gfxboot.com:  DOS executable (COM),
	      maybe with interrupt 22h,
	      start instruction 0xb80200bb 4b87cd22
gif2raw.com:  DOS executable (COM),
	      start instruction 0xb80630ba 36163bc4
poweroff.com: DOS executable (COM),
	      maybe with interrupt 22h,
	      start instruction 0xb8005331 dbcd1573
rem.com:      DOS executable (COM),
	      start instruction 0xb8004ccd
sys.com:      DOS executable (COM),
	      start instruction 0xb82b2a05 0f00b104
syslinux.com: DOS executable (COM),
	      start instruction 0xb80030cd 2186c4a3
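Putting the pieces together, the new decision tree for files starting with 0xb8 can be modelled in one function. This is a hedged sketch, not libmagic: `is_dos_com` and its helpers are hypothetical names, the 118-byte search window starting at offset 3 and the "out-of-range test does not match" rule are assumptions, and other entries (such as the "DOS/MBR boot sector" lines in Magdir/filesystems) are ignored.

```python
import struct
from typing import Optional

EDK2_OFFSET3 = bytes.fromhex("8ec0b8c0078ed88d")      # Gpt.com / Mbr.com
SYSLINUX_OFFSET8 = bytes.fromhex("3de4e475")          # Syslinux 3.71 flashimg.img

def _ubyte(data: bytes, off: int) -> Optional[int]:
    return data[off] if 0 <= off < len(data) else None

def _slice(data: bytes, off: int, n: int) -> Optional[bytes]:
    return data[off:off + n] if off + n <= len(data) else None

# Hypothetical model of the patched entry: True means "use msdos-com" is reached.
def is_dos_com(data: bytes) -> bool:
    if _ubyte(data, 0) != 0xB8:
        return False
    if data.startswith(b"\xb8\xc0\x07\x8e"):
        return False  # memtest.bin style kernels
    long1 = _slice(data, 1, 4)
    if long1 is not None and struct.unpack("<I", long1)[0] & 0xFFFFFFFE == 0x21CD4CFE:
        return False  # 32-bit COMBOOT, reported by its own line
    pos = data.find(b"\xcd", 3, 3 + 118)  # assumed search window
    if pos >= 0:
        number = _ubyte(data, pos + 1)
        if number == 0x13:
            at3 = _slice(data, 3, 8)
            return at3 is not None and at3 != EDK2_OFFSET3
        if number == 0x16:
            at8 = _slice(data, 8, 4)
            return at8 is not None and at8 != SYSLINUX_OFFSET8
        return True  # default: other numbers, or file ends right after 0xCD
    second = _ubyte(data, 3)
    return second is not None and second not in (0x00, 0xB1)
```

Because the model only sees the bytes passed in, the 8-byte prefixes from the listing above are not enough for files like LOADER.COM whose INT instruction lies further in; such input falls through to the offset-3 check instead of the interrupt branches.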

I hope my diff file can be applied in a future version of the file
utility, and that I now catch all such "good" COM executables and
skip the misidentified "bad" others.

With best wishes
Jörg Jenderek
-------------- next part --------------
--- file-master/magic/Magdir/msdos.old	2022-07-22 01:30:56.757057500 +0200
+++ file-master/magic/Magdir/msdos	2022-07-22 01:46:15.101147900 +0200
@@ -648,4 +648,6 @@
 >1	search/0xc088	\xcd\x22	\b, maybe with interrupt 22h
 >0	ubelong		x		\b, start instruction %#8.8x
+# show more instructions but not in samples like: rem.com (DJGPP)
+>4	ubelong		x		%8.8x
 
 # JMP 8bit
@@ -754,5 +756,5 @@
 >>>0        use msdos-com
 
-# updated by Joerg Jenderek at Oct 2008,2015
+# updated by Joerg Jenderek at Oct 2008,2015,2022
 # following line is too general
 0	ubyte		0xb8
@@ -777,17 +779,47 @@
 # "COM executable (COM32R)" or "Syslinux COM32 module" by TrID
 >>>1	lelong		0x21CD4CFe	\b, relocatable)
-# Hajin Jang <hajin_jang at worksmobile.com>:
-# Disable simplest COM signature to prevent false positive on some EUC-KR text files.
-## remaining are DOS COM executables starting with assembler instruction MOV
-## like FreeDOS BANNER*.COM FINDDISK.COM GIF2RAW.COM WINCHK.COM
-## MS-DOS SYS.COM RESTART.COM
-## SYSLINUX.COM (version 1.40 - 2.13)
-## GFXBOOT.COM (version 3.75)
-## COPYBS.COM POWEROFF.COM INT18.COM
->>1	default	x			COM executable for DOS
-!:mime	application/x-dosexec
-##!:mime	application/x-ms-dos-executable
-##!:mime	application/x-msdos-program
-!:ext com
+>>1	default	x
+# look for interrupt instruction like in rem.com (DJGPP) LOADER.COM (DR-DOS 7.x)
+>>>3	search/118	\xCD
+# FOR DEBUGGING; possible hexadecimal interrupt number like: 10~BANNER.COM 13~bcdw_cl.com 15~poweroff.com (Syslinux)
+# 1A~BERNDPCI.COM 20~SETENHKB.COM 21~mostly 22~gfxboot.com (Syslinux) 2F~SHUTDOWN.COM (GEMSYS)
+#>>>>&0	ubyte	x			\b, INTERRUPT %#x
+# few examples with interrupt 0x13 instruction
+>>>>&0	ubyte	=0x13
+# FOR DEBUGGING!
+#>>>>>3	ubequad	x			\b, 2nd INSTRUCTION %#16.16llx
+# skip Gpt.com Mbr.com (edk2-UDK2018 bootsector) described as "DOS/MBR boot sector" by ./filesystems
+# by check for assembler instructions: mov  es,ax ; mov  ax,07c0h ; mov ds,ax 
+>>>>>3	ubequad	!0x8ec0b8c0078ed88d
+# few COM executables with interrupt 0x13 instruction like: Bootable CD Wizard executables bcdw_cl.com fdemuoff.com
+# http://bootcd.narod.ru/bcdw150z_en.zip
+>>>>>>0		use		msdos-com
+# few examples with interrupt 0x16 instruction like flashimg.img
+>>>>&0	ubyte	=0x16
+# skip Syslinux 3.71 flashimg.img described as "DOS/MBR boot sector" by ./filesystems
+# by check for assembler instructions: cmp ax 0xE4E4 (magic); jnz
+>>>>>8	ubelong	!0x3DE4E475
+# no DOS executable with interrupt 0x16 found
+>>>>>>0		use		msdos-com
+# most examples with interrupt instruction other than 0x13 and 0x16
+>>>>&0	default	x
+#>>>>>&-1 ubyte	x			\b, INTERRUPT %#x
+# like: LOADER.COM SETENHKB.COM banner.com copybs.com gif2raw.com poweroff.com rem.com
+>>>>>0		use		msdos-com
+# few COM executables without interrupt instruction like RESTART.COM (DOS 7.10) REBOOT.COM
+# or some EUC-KR text files or one Ulead Imaginfo thumbnail
+>>>3	default	x
+# FOR DEBUGGING; 2nd instruction like 0x50 (RESTART.COM) 0x8e (REBOOT.COM)
+# or random like: 0x0 (IMAGINFO.PE3 sky_snow) 0xb1 (euckr_.txt)
+#>>>>3	ubyte	x			\b, 2nd INSTRUCTION %#x
+# skip 1 Ulead Imaginfo thumbnail (IMAGINFO.PE3 sky_snow) 
+# inside SAMPLES/TEXTURES/SKY_SNOW
+# from https://archive.org/download/PI3CANON/PI3CANON.iso
+>>>>3	ubyte	!0x0
+# skip some EUC-KR text files like: euckr_falsepositive.txt
+# https://bugs.astron.com/view.php?id=186
+>>>>>3	ubyte	!0xb1
+# like: RESTART.COM (DOS 7.10) REBOOT.COM
+>>>>>>0	use		msdos-com
 
 # URL:		https://en.wikipedia.org/wiki/UPX