[File] [PATCH] of Magdir/msdos COM executable for DOS misidentifies some *.IMG *.PE3 *.TXT

Christos Zoulas christos at zoulas.com
Sun Jul 24 23:52:12 UTC 2022


Committed, thanks!

christos

> On Jul 21, 2022, at 8:09 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hello,
> 
> Some days ago I handled some DOS executables (COM). The variants
> starting with a move instruction are described as "COM executable for
> DOS". Unfortunately, some non-COM samples are also described as this
> file type, which is wrong. When running file command version 5.42 with
> option -k on such examples and related files, I get output like:
> 
> FINDDISK.COM: COM executable for DOS
> Gpt.com:      DOS/MBR boot sector  DOS/MBR boot sector
> 	      COM executable for DOS
> IMAGINFO.PE3: COM executable for DOS
> LOADER.COM:   COM executable for DOS
> Mbr.com:      DOS/MBR boot sector  DOS/MBR boot sector
> 	      COM executable for DOS
> REBOOT.COM:   COM executable for DOS
> RESTART.COM:  COM executable for DOS
> SETENHKB.COM: COM executable for DOS
> banner.com:   COM executable for DOS
> bcdw_cl.com:  COM executable for DOS
> copybs.com:   COM executable for DOS
> euckr_.txt:   COM executable for DOS ,
> 	      ISO-8859 text, with CRLF line terminators
> fdemuoff.com: COM executable for DOS
> flashimg.img: DOS/MBR boot sector  DOS/MBR boot sector
> 	      COM executable for DOS
> gfxboot.com:  COM executable for DOS
> gif2raw.com:  COM executable for DOS
> poweroff.com: COM executable for DOS
> rem.com:      COM executable for DOS
> sys.com:      COM executable for DOS
> syslinux.com: COM executable for DOS
> 
> The description is produced inside Magdir/msdos by lines like:
> 0	ubyte		0xb8
>> 0	string		!\xb8\xc0\x07\x8e
>>> 1	lelong&0xFFFFFFFe 0x21CD4CFe	COM executable (32-bit
>>> 1	default	x			COM executable for DOS
> !:mime	application/x-dosexec
> !:ext com
> The first line tests for the 1-byte move instruction (0xb8) at the
> beginning. The second line skips some Linux kernels like memtest.bin.
> The third test matches COM executables (32-bit COMBOOT).
> What remains is often a DOS COM executable, but sometimes also
> another file type, because in reality only the 1-byte move
> instruction has been tested. That is apparently too weak.
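
For illustration only, here is a minimal Python sketch of what the four quoted magic lines amount to. It is a simplified reading of the rules, not the code that file(1) runs. The function name classify_old is an assumption, and the COMBOOT label is taken from the description above because the quoted magic line is cut off.

```python
import struct
from typing import Optional


def classify_old(data: bytes) -> Optional[str]:
    """Simplified model of the pre-patch rules quoted above (hypothetical helper)."""
    # 0 ubyte 0xb8: the first byte must be the opcode of "mov ax, imm16".
    if data[:1] != b"\xb8":
        return None
    # >0 string !\xb8\xc0\x07\x8e: skip some Linux kernels such as memtest.bin.
    if data[:4] == b"\xb8\xc0\x07\x8e":
        return None
    # >>1 lelong&0xFFFFFFFe 0x21CD4CFe: 32-bit COMBOOT.
    if len(data) >= 5:
        (value,) = struct.unpack_from("<I", data, 1)
        if value & 0xFFFFFFFE == 0x21CD4CFE:
            return "COM executable (32-bit COMBOOT)"
    # >>1 default x: everything that is left is called a DOS COM file.
    return "COM executable for DOS"
```

Any data that merely begins with 0xb8 falls through to the last line. For example, a made-up four-byte header such as b8 a1 a1 b1, which is not taken from a real file, is reported as "COM executable for DOS".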
> 
> The first step is to replace the displaying part with a call to the
> subroutine msdos-com. Then only some additional test lines must be
> inserted before calling this routine.
> 
> At the end of this subroutine the first 4 bytes of the executable
> are shown by a line like:
>> 0	ubelong		x		\b, start instruction %#8.8x
> For checking purposes I show more bytes with an additional line:
>> 4	ubelong		x		%8.8x
> 
> So I see that many COM executables contain the byte sequence cd21
> near the beginning. That is interrupt 21h. Other COM files contain
> at least the byte cd, that is, an interrupt instruction with another
> INT number like 13h. In many misidentified examples this byte
> sequence does not occur. Furthermore, I see that some COM files
> contain only a few bytes, like rem.com (from the DJGPP suite) with
> four bytes. That has an ugly side effect. In my first attempts I
> tried to skip "DOS/MBR boot sector" samples by checking for the boot
> signature sequence 55AA at offset 510. Unfortunately this does not
> work for short COM executables. I believe this is a BUG in the file
> command!
> 
> So now I look for an interrupt instruction with the line:
>>>> 3	search/118	\xCD
> This is true for short examples like REM.COM and for bigger ones like
> LOADER.COM (DR-DOS 7.x). For checking purposes you can show the
> interrupt number with a debugging line like:
>>>>> &0	ubyte	x			\b, INTERUPT %#x
> So we see the hexadecimal interrupt numbers used in COM samples like:
> 10~BANNER.COM 13~bcdw_cl.com 15~poweroff.com (Syslinux)
> 1A~BERNDPCI.COM 20~SETENHKB.COM 22~gfxboot.com (Syslinux)
> Unfortunately the values 13h and 16h are also found in some DOS/MBR
> boot sector samples.
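
As a rough illustration of the search step, the following Python sketch looks for the first 0xCD byte in a window starting at offset 3 and returns the byte after it, which is the interrupt number. The function name first_interrupt is an assumption. Treating search/118 as 118 candidate start positions is my reading of magic(5), not something stated in this mail.

```python
from typing import Optional


def first_interrupt(data: bytes, start: int = 3, count: int = 118) -> Optional[int]:
    """Return the byte following the first 0xCD in the window, or None.

    Hypothetical helper; it only approximates "3 search/118 \\xCD"
    followed by "&0 ubyte x".
    """
    # Candidate positions are start .. start + count - 1.
    pos = data.find(b"\xcd", start, start + count)
    # No INT opcode in the window, or no operand byte after it.
    if pos < 0 or pos + 1 >= len(data):
        return None
    return data[pos + 1]


# First eight bytes of some samples, as printed by "start instruction"
# in the output listed later in this mail.
samples = {
    "banner.com": "b81300cd10b82111",
    "bcdw_cl.com": "b89f54cd130f8215",
    "poweroff.com": "b8005331dbcd1573",
    "gfxboot.com": "b80200bb4b87cd22",
}
for name, head in samples.items():
    number = first_interrupt(bytes.fromhex(head))
    print(name, "none" if number is None else f"{number:#x}")
```

Only the first eight bytes of each file are used here, so a sample whose first interrupt instruction comes later (LOADER.COM, for instance) would show "none" with this truncated input.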
> 
> So the sub branch for INT13 looks like:
>>>>> &0	ubyte	=0x13
>>>>>> 3	ubequad	!0x8ec0b8c0078ed88d
>>>>>>> 0		use		msdos-com
> Gpt.com and Mbr.com are not real DOS executables; they are
> boot sectors from the edk2-UDK2018 suite. Looking at the source
> listing, I see that the next instructions at offset 3 are "mov  es,ax ;
> mov ax,07c0h ; mov ds,ax". That is the byte sequence 8ec0b8c0078ed88d.
> After skipping such boot sectors I can now call the subroutine. This
> matches a few DOS files with an interrupt 0x13 instruction, like
> bcdw_cl.com and fdemuoff.com. These are part of Bootable CD Wizard (see
> bootcd.narod.ru/bcdw150z_en.zip).
> 
> So the second sub branch, for INT16, looks like:
>>>>> &0	ubyte	=0x16
>>>>>> 8	ubelong	!0x3DE4E475
>>>>>>> 0		use		msdos-com
> flashimg.img is not a real DOS executable. It is a boot image that is
> part of the Syslinux suite, version 3.71. Looking at the source
> listing, I see that the next instructions are "cmp ax 0xE4E4 (magic);
> jnz". That is the byte sequence 3DE4E475. After skipping such boot
> sectors I can now call the subroutine. This matches DOS files with an
> interrupt 0x16 instruction. I myself found no such examples.
> 
> The third sub branch, for samples with an interrupt number other than
> 0x13 and 0x16, looks like:
>>>>> &0	default	x
>>>>>> 0		use		msdos-com
> This matches many DOS examples (like: LOADER.COM SETENHKB.COM
> banner.com copybs.com gif2raw.com poweroff.com rem.com).
> 
> The last branch is for the few COM executables without an interrupt
> instruction and for some misidentified non "boot sector" samples.
> When I look at the second instruction at offset 3, I find 0x50 for
> RESTART.COM or 0x8e for REBOOT.COM. For the misidentified
> Ulead Imaginfo thumbnail (IMAGINFO.PE3 sky_snow) the value here is 0.
> For some EUC-KR text files (like euckr_falsepositive.txt or
> euckr_.txt) the value here is 0xb1. So I skip such misidentified "bad"
> samples and call the subroutine only for the few valid examples like
> RESTART.COM (DOS 7.10) or REBOOT.COM, with a branch that looks like:
>>>> 3	default	x
>>>>> 3	ubyte	!0x0
>>>>>> 3	ubyte	!0xb1
>>>>>>> 0	use		msdos-com
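
The branches described above can be summarised in a small Python sketch. This is only an approximation of the decision logic under stated assumptions, not the patch itself: the name is_dos_com is hypothetical, the 32-bit COMBOOT test is left out, and the search window is modelled as 118 start positions from offset 3. Where a test would need bytes beyond the end of the data, the sketch simply reports no match.

```python
EDK2_BOOT = bytes.fromhex("8ec0b8c0078ed88d")  # ubequad at offset 3 (Gpt.com, Mbr.com)
SYSLINUX_BOOT = bytes.fromhex("3de4e475")      # ubelong at offset 8 (flashimg.img)


def is_dos_com(data: bytes) -> bool:
    """Approximate the patched branches for files starting with 0xb8 (hypothetical)."""
    # Entry tests: "mov ax, imm16" opcode, but not the Linux kernel pattern.
    if data[:1] != b"\xb8" or data[:4] == b"\xb8\xc0\x07\x8e":
        return False
    pos = data.find(b"\xcd", 3, 3 + 118)
    if pos >= 0:
        if pos + 1 >= len(data):
            return False  # INT opcode without an operand byte
        number = data[pos + 1]
        if number == 0x13:
            # Skip boot sectors like those from edk2-UDK2018.
            return len(data) >= 11 and data[3:11] != EDK2_BOOT
        if number == 0x16:
            # Skip boot images like the Syslinux one.
            return len(data) >= 12 and data[8:12] != SYSLINUX_BOOT
        return True  # any other interrupt number
    # No interrupt instruction found: look at the byte at offset 3.
    return len(data) >= 4 and data[3] not in (0x00, 0xB1)
```

With the eight-byte prefixes shown in the output of this mail, is_dos_com accepts REBOOT.COM (b840008e d8be7200) and RESTART.COM (b8400050 1fc70672) through the last branch, while a made-up header with 0x00 or 0xb1 at offset 3 is rejected.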
> 
> After applying the above-mentioned modifications with the patch
> file-msdos-com-mov.diff and using Magdir/filesystems, the
> misidentifications vanish and some more details
> are shown. With the -k option the output now looks like:
> 
> FINDDISK.COM: DOS executable (COM),
> 	      start instruction 0xb82425ba 4e01cd21
> Gpt.com:      DOS/MBR boot sector DOS/MBR boot sector
> IMAGINFO.PE3: data
> LOADER.COM:   DOS executable (COM),
> 	      start instruction 0xb80061ba 081e3bc4
> Mbr.com:      DOS/MBR boot sector DOS/MBR boot sector
> REBOOT.COM:   DOS executable (COM),
> 	      start instruction 0xb840008e d8be7200
> RESTART.COM:  DOS executable (COM),
> 	      start instruction 0xb8400050 1fc70672
> SETENHKB.COM: DOS executable (COM),
> 	      start instruction 0xb84000bf 96008ec0
> banner.com:   DOS executable (COM),
> 	      start instruction 0xb81300cd 10b82111
> bcdw_cl.com:  DOS executable (COM),
> 	      start instruction 0xb89f54cd 130f8215
> copybs.com:   DOS executable (COM),
> 	      start instruction 0xb80030cd 2186c4a3
> euckr_.txt:   ISO-8859 text, with CRLF line terminators
> fdemuoff.com: DOS executable (COM),
> 	      start instruction 0xb8004b32 d2be0c01
> flashimg.img: DOS/MBR boot sector DOS/MBR boot sector
> gfxboot.com:  DOS executable (COM),
> 	      maybe with interrupt 22h,
> 	      start instruction 0xb80200bb 4b87cd22
> gif2raw.com:  DOS executable (COM),
> 	      start instruction 0xb80630ba 36163bc4
> poweroff.com: DOS executable (COM),
> 	      maybe with interrupt 22h,
> 	      start instruction 0xb8005331 dbcd1573
> rem.com:      DOS executable (COM),
> 	      start instruction 0xb8004ccd
> sys.com:      DOS executable (COM),
> 	      start instruction 0xb82b2a05 0f00b104
> syslinux.com: DOS executable (COM),
> 	      start instruction 0xb80030cd 2186c4a3
> 
> I hope my diff file can be applied in a future version of the file
> utility and that I now catch all such "good" COM executables and
> exclude the misidentified "bad" others.
> 
> With best wishes
> Jörg Jenderek
> - --
> Jörg Jenderek
> 
> 
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
> 
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYtnqxgAKCRCv8rHJQhrU
> 1ldAAKCjNlu2WFhaogON9JZ7OSxd+XWckACfUNFaxc+2dEvApIlUM6zWE7s0+yA=
> =XQme
> -----END PGP SIGNATURE-----
