[File] [PATCH] doc/magic.man ; in 8 MZ examples e_lfarlc pointers must be unsigned

Christos Zoulas christos at zoulas.com
Mon Oct 9 13:42:41 UTC 2023


Committed, thanks!

christos

> On Oct 5, 2023, at 4:31 PM, Jörg Jenderek (GMX) <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> 
> Some months ago I sent the patch file-5.43-msdos-e_lfarlc.diff for
> Magdir/msdos to correct the recognition of MZ DOS/Windows executables.
> I looked inside Magdir/msdos of file command version 5.45 and saw that
> my patches were accepted.
> 
> Unfortunately, the affected sections are also used as examples in the
> man page magic.man (v 1.103), so the old and wrong expressions are
> still listed there.
> 
> To recapitulate, the relevant lines inside Magdir/msdos start like:
> 0	string/b	MZ
> #>0x18		uleshort	x	\b, e_lfarlc=0x%x
> >0x18	uleshort <0x40
> 
> After matching e_magic MZ, the relocation table pointer e_lfarlc is
> used for sub-classification. Most non-DOS MZ executable extensions
> (that is, Windows-like ones) have the relocation table at offset 0x40
> or beyond, whereas for DOS-like executables it is the opposite. For
> the MiTeC Portable Executable Reader EXE64.exe, found in the archive
> http://www.mitec.cz/Downloads/EXE.zip, I get the "high" value
> e_lfarlc=0x8ead. In the old expressions the test of e_lfarlc against
> the 0x40 limit was done as signed, so the value 0x8ead was treated as
> a negative number and therefore considered below the 0x40 limit. As a
> result, EXE64.exe was wrongly handled by the branch for pure DOS
> executables.
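
The signedness problem described above can be illustrated with a minimal Python sketch. It is not code from file(1); the helper names `e_lfarlc` and `classify` and the 64-byte all-zero header are assumptions made only for illustration. The value 0x8ead, the offset 0x18 and the limit 0x40 are taken from the mail.

```python
import struct

LIMIT = 0x40  # threshold used by the magic entries quoted in this mail


def e_lfarlc(header: bytes, signed: bool) -> int:
    """Read the 16-bit little-endian field at offset 0x18 (hypothetical helper)."""
    fmt = "<h" if signed else "<H"  # "<h" mimics leshort, "<H" mimics uleshort
    return struct.unpack_from(fmt, header, 0x18)[0]


def classify(header: bytes, signed: bool) -> str:
    """Apply the <0x40 / >0x3f split from the man page examples."""
    value = e_lfarlc(header, signed)
    return "MS-DOS executable" if value < LIMIT else "extended PC executable"


# Made-up header: only the "MZ" magic and e_lfarlc = 0x8ead are filled in.
buf = bytearray(64)
buf[0:2] = b"MZ"
struct.pack_into("<H", buf, 0x18, 0x8EAD)
header = bytes(buf)

signed_value = e_lfarlc(header, signed=True)
unsigned_value = e_lfarlc(header, signed=False)
print(hex(unsigned_value), signed_value)
print("signed test:  ", classify(header, signed=True))
print("unsigned test:", classify(header, signed=False))
```

With the signed read the field becomes negative and falls into the MS-DOS branch; with the unsigned read it stays above the limit and takes the extended-executable branch.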
> 
> So all tests must be done as unsigned via the "uleshort" type. In other
> words, the test via "leshort" is wrong. Looking at doc/magic.man I found
> 8 places where the old "leshort" expression is used, like:
> 
> 0      string   MZ
> >0x18  leshort  <0x40   MS-DOS executable
> >0x18  leshort  >0x3f   extended PC executable (e.g., MS Windows)
> 
> # MS Windows executables are also valid MS-DOS executables
> 0           string  MZ
> >0x18       leshort <0x40   MZ executable (MS-DOS)
> # skip the whole block below if it is not an extended executable
> >0x18       leshort >0x3f
> >>(0x3c.l)  string  PE\0\0  PE executable (MS-Windows)
> >>(0x3c.l)  string  LX\0\0  LX executable (OS/2)
> 
> # MS Windows executables are also valid MS-DOS executables
> 0           string  MZ
> # sometimes, the value at 0x18 is less that 0x40 but there's still an
> # extended executable, simply appended to the file
> >0x18       leshort <0x40
> >>(4.s*512) leshort 0x014c  COFF executable (MS-DOS, DJGPP)
> >>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
> 
> 0           string  MZ
> >0x18       leshort >0x3f
> >>(0x3c.l)  string  PE\0\0    PE executable (MS-Windows)
> # immediately following the PE signature is the CPU type
> >>>&0       leshort 0x14c     for Intel 80386
> >>>&0       leshort 0x184     for DEC Alpha
> 
> 0             string  MZ
> >0x18         leshort <0x40
> >>(4.s*512)   leshort !0x014c MZ executable (MS-DOS)
> # if it's not COFF, go back 512 bytes and add the offset taken
> # from byte 2/3, which is yet another way of finding the start
> # of the extended executable
> >>>&(2.s-514) string  LE      LE executable (MS Windows VxD driver)
> 
> 0                 string  MZ
> >0x18             leshort >0x3f
> >>(0x3c.l)        string  LE\0\0  LE executable (MS-Windows)
> # at offset 0x80 (-4, since relative offsets start at the end
> # of the up-level match) inside the LE header, we find the absolute
> # offset to the code area, where we look for a specific signature
> >>>(&0x7c.l+0x26) string  UPX     \b, UPX compressed
> 
> 0                string  MZ
> >0x18            leshort >0x3f
> >>(0x3c.l)       string  LE\0\0 LE executable (MS-Windows)
> # at offset 0x58 inside the LE header, we find the relative offset
> # to a data area where we look for a specific signature
> >>>&(&0x54.l-3)  string  UNACE  \b, ACE self-extracting archive
> 
> 0	string       MZ
> >0x18             leshort      >0x3f
> >>(0x3c.l)        string       PE\0\0 PE executable (MS-Windows)
> # search for the PE section called ".idata"...
> >>>&0xf4          search/0x140 .idata
> # ...and go to the end of it, calculated from start+length;
> # these are located 14 and 10 bytes after the section name
> >>>>(&0xe.l+(-4)) string       PK\3\4 \b, ZIP self-extracting archive
> 
> So I replaced the leshort expressions by uleshort. I also inserted
> spaces where needed to get columns with the same indentation. In the
> section with DEC Alpha I also inserted a line for the x86-64
> architecture, which is more commonly used nowadays.
> 
> Because my brain is too small to remember the correct command for
> producing the formatted manual page for checking, such as "groff
> -Tlatin1 -m man doc/magic.man", I put it as a compile command for the
> Emacs editor inside the man page source. I also want the computer to
> work for me and not vice versa, so I instruct Emacs to automatically
> update the second line with the current man page date, which looked like:
> .Dd Arpil 18, 2023
> Apparently all people handling this man page are blind, because the
> month name Arpil is wrong! The correct name is April.
> 
> So I put suitable nroff comment lines at the end of the man page, like:
> .\"
> .\" For emacs editor
> .\" Local Variables:
> .\" eval: (add-hook 'before-save-hook 'time-stamp)
> .\" time-stamp-start: ".Dd "
> .\" time-stamp-end: "$"
> .\" time-stamp-format: "%:B %02d, %:Y"
> .\" time-stamp-time-zone: "UTC0"
> .\" system-time-locale: "C"
> .\" eval:(setq compile-command (concat "groff -Tlatin1 -m man " (buffer-file-name)) )
> .\" End:
> 
> This works for me, and after applying the above-mentioned modifications
> with the patch file-5.45-magic.man.diff, the correct tests for the
> e_lfarlc pointer are also shown in the examples in the man page.
> 
> I hope my diff file can be applied in a future version of the file
> utility.
> 
> With best wishes,
> Jörg Jenderek
> --
> Jörg Jenderek


