[File] [PATCH] doc/magic.man ; in 8 MZ examples e_lfarlc pointers must be unsigned

Jörg Jenderek (GMX) joerg.jen.der.ek at gmx.net
Thu Oct 5 20:31:12 UTC 2023


Hello,

Some months ago I sent the patch file-5.43-msdos-e_lfarlc.diff for
Magdir/msdos to correct the recognition of MZ DOS/Windows executables.
I looked inside Magdir/msdos of file command version 5.45 and saw that
my patches were accepted.

Unfortunately the affected sections are also used as examples in the
man page magic.man (v 1.103), so the old and wrong expressions are
still listed there.

To recapitulate, the relevant lines inside Magdir/msdos start like this:
  0	string/b	MZ
  #>0x18		uleshort	x	\b, e_lfarlc=0x%x
  >0x18	uleshort <0x40

After matching e_magic MZ, the relocation table pointer e_lfarlc is used
for sub-classification. Most non-DOS MZ executable extensions (that is,
the Windows-like ones) have the relocation table 0x40 bytes or more into
the file, whereas for DOS-like ones it is the opposite. For the MiTeC
Portable Executable Reader EXE64.exe, found in the archive
http://www.mitec.cz/Downloads/EXE.zip, I get the "high" value
e_lfarlc=0x8ead. In the old expressions the test of e_lfarlc against the
0x40 limit was done as signed. So the value 0x8ead was treated as a
negative number, which counted as below the 0x40 limit, and EXE64.exe
was wrongly handled by the branch for pure DOS executables.
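
The effect can be reproduced outside of file(1). The following is only a
minimal Python sketch, assuming a bare 0x40-byte header in which nothing
but e_magic and e_lfarlc is filled in; the helper name read_e_lfarlc is
made up for illustration and is not part of file:

```python
import struct

def read_e_lfarlc(header: bytes, signed: bool) -> int:
    """Read the 16-bit little-endian value at offset 0x18 of an MZ header."""
    # "<h" behaves like a signed leshort, "<H" like an unsigned uleshort
    fmt = "<h" if signed else "<H"
    return struct.unpack_from(fmt, header, 0x18)[0]

# Hypothetical header: "MZ" magic, zero padding, e_lfarlc = 0x8ead at 0x18
header = bytearray(0x40)
header[0:2] = b"MZ"
struct.pack_into("<H", header, 0x18, 0x8EAD)

as_signed = read_e_lfarlc(bytes(header), signed=True)
as_unsigned = read_e_lfarlc(bytes(header), signed=False)

print(as_signed, as_signed < 0x40)      # -29011 True
print(as_unsigned, as_unsigned < 0x40)  # 36525 False
```

Read as signed, the same two bytes fall below 0x40 and take the DOS
branch; read as unsigned, they do not.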

So all these tests must be done as unsigned via the "uleshort" type. In
other words, the test via "leshort" is wrong. Looking at doc/magic.man I
found 8 places where the old "leshort" expression is used:

  0      string   MZ
  >0x18  leshort  <0x40   MS-DOS executable
  >0x18  leshort  >0x3f   extended PC executable (e.g., MS Windows)

  # MS Windows executables are also valid MS-DOS executables
  0           string  MZ
  >0x18       leshort <0x40   MZ executable (MS-DOS)
  # skip the whole block below if it is not an extended executable
  >0x18       leshort >0x3f
  >>(0x3c.l)  string  PE\0\0  PE executable (MS-Windows)
  >>(0x3c.l)  string  LX\0\0  LX executable (OS/2)

  # MS Windows executables are also valid MS-DOS executables
  0           string  MZ
  # sometimes, the value at 0x18 is less that 0x40 but there's still an
  # extended executable, simply appended to the file
  >0x18       leshort <0x40
  >>(4.s*512) leshort 0x014c  COFF executable (MS-DOS, DJGPP)
  >>(4.s*512) leshort !0x014c MZ executable (MS-DOS)

  0           string  MZ
  >0x18       leshort >0x3f
  >>(0x3c.l)  string  PE\0\0    PE executable (MS-Windows)
  # immediately following the PE signature is the CPU type
  >>>&0       leshort 0x14c     for Intel 80386
  >>>&0       leshort 0x184     for DEC Alpha

  0             string  MZ
  >0x18         leshort <0x40
  >>(4.s*512)   leshort !0x014c MZ executable (MS-DOS)
  # if it's not COFF, go back 512 bytes and add the offset taken
  # from byte 2/3, which is yet another way of finding the start
  # of the extended executable
  >>>&(2.s-514) string  LE      LE executable (MS Windows VxD driver)

  0                 string  MZ
  >0x18             leshort >0x3f
  >>(0x3c.l)        string  LE\0\0  LE executable (MS-Windows)
  # at offset 0x80 (-4, since relative offsets start at the end
  # of the up-level match) inside the LE header, we find the absolute
  # offset to the code area, where we look for a specific signature
  >>>(&0x7c.l+0x26) string  UPX     \b, UPX compressed

  0                string  MZ
  >0x18            leshort >0x3f
  >>(0x3c.l)       string  LE\0\0 LE executable (MS-Windows)
  # at offset 0x58 inside the LE header, we find the relative offset
  # to a data area where we look for a specific signature
  >>>&(&0x54.l-3)  string  UNACE  \b, ACE self-extracting archive

  0                 string       MZ
  >0x18             leshort      >0x3f
  >>(0x3c.l)        string       PE\0\0 PE executable (MS-Windows)
  # search for the PE section called ".idata"...
  >>>&0xf4          search/0x140 .idata
  # ...and go to the end of it, calculated from start+length;
  # these are located 14 and 10 bytes after the section name
  >>>>(&0xe.l+(-4)) string       PK\3\4 \b, ZIP self-extracting archive

So I replaced leshort with uleshort in the e_lfarlc tests. I also
inserted spaces where needed to keep the columns aligned. In the section
with DEC Alpha I also inserted a line for the x86-64 architecture, which
is more commonly used nowadays.
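
For readers who prefer code to magic syntax, the PE example with the
added x86-64 line corresponds roughly to the Python sketch below. It is
an illustration only: classify_mz and the synthetic buffer are
hypothetical, the fallback strings are made up, and unlike file(1) the
function returns a single description instead of printing every matching
line.

```python
import struct

# Machine values from the man page example plus the added x86-64 line
MACHINES = {0x014C: "Intel 80386", 0x8664: "x86-64", 0x0184: "DEC Alpha"}

def classify_mz(data: bytes) -> str:
    """Rough equivalent of the MZ -> PE -> CPU type example (hypothetical)."""
    if data[:2] != b"MZ":
        return "not MZ"
    (e_lfarlc,) = struct.unpack_from("<H", data, 0x18)  # unsigned, like uleshort
    if e_lfarlc < 0x40:
        return "MZ executable (MS-DOS)"
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)  # roughly (0x3c.l)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        return "extended PC executable"
    # the CPU type follows the PE signature immediately
    (machine,) = struct.unpack_from("<H", data, e_lfanew + 4)
    return "PE executable for " + MACHINES.get(machine, "unknown CPU 0x%x" % machine)

# Synthetic test buffer, not a real executable
demo = bytearray(0x100)
demo[0:2] = b"MZ"
struct.pack_into("<H", demo, 0x18, 0x8EAD)   # "high" e_lfarlc as in EXE64.exe
struct.pack_into("<I", demo, 0x3C, 0x80)     # offset of the PE header
demo[0x80:0x84] = b"PE\0\0"
struct.pack_into("<H", demo, 0x84, 0x8664)   # machine field: x86-64

print(classify_mz(bytes(demo)))  # PE executable for x86-64
```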

Because my brain is too small to remember the correct command for
rendering the formatted manual page for checking, such as "groff
-Tlatin1 -m man doc/magic.man", I put it as a compile instruction for
the Emacs editor inside the man page source. I also want the computer to
work for me and not vice versa. So I instruct Emacs to automatically
update the second line with the current man page date, which looked like:
.Dd Arpil 18, 2023
Apparently nobody handling this man page noticed it: the month name
Arpil is wrong! The correct name is April.

So I put suitable nroff comment lines at the end of the man page:
.\"
.\" For emacs editor
.\" Local Variables:
.\" eval: (add-hook 'before-save-hook 'time-stamp)
.\" time-stamp-start: ".Dd "
.\" time-stamp-end: "$"
.\" time-stamp-format: "%:B %02d, %:Y"
.\" time-stamp-time-zone: "UTC0"
.\" system-time-locale: "C"
.\" eval:(setq compile-command (concat "groff -Tlatin1 -m man " (buffer-file-name)) )
.\" End:
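
For those who do not use Emacs, the same date line can be produced and
checked with a few lines of Python. This is only a sketch of the idea;
the helper names dd_line and dd_month_is_valid are made up, and the
"Month DD, YYYY" shape is simply taken from the result shown in my diff:

```python
import re
from datetime import date

# English month names, hard-coded so the result does not depend on the locale
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

def dd_line(day: date) -> str:
    """Build a .Dd line in the 'Month DD, YYYY' shape used by the patch."""
    return ".Dd %s %02d, %d" % (MONTHS[day.month - 1], day.day, day.year)

def dd_month_is_valid(line: str) -> bool:
    """Return True if a .Dd line carries a correctly spelled month name."""
    match = re.match(r"\.Dd (\w+) \d{1,2}, \d{4}$", line)
    return bool(match) and match.group(1) in MONTHS

print(dd_line(date(2023, 10, 5)))               # .Dd October 05, 2023
print(dd_month_is_valid(".Dd Arpil 18, 2023"))  # False
```

A check like dd_month_is_valid would have caught the "Arpil" typo.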

This works for me, and after applying the modifications described above
with the patch file-5.45-magic.man.diff, the correct tests for the
e_lfarlc pointer are also shown in the examples in the man page.

I hope my diff file can be applied in a future version of the file
utility.

With best wishes,
Jörg Jenderek
--
Jörg Jenderek
-------------- next part --------------
--- file-5.45/doc/magic.man.old	2023-07-27 20:04:45.000000000 +0200
+++ file-5.45/doc/magic.man	2023-10-05 21:29:52.442143655 +0200
@@ -1,3 +1,3 @@
 .\" $File: magic.man,v 1.103 2023/07/20 14:32:07 christos Exp $
-.Dd Arpil 18, 2023
+.Dd October 05, 2023
 .Dt MAGIC __FSECTION__
@@ -612,5 +612,5 @@
 .Bd -literal -offset indent
-0      string   MZ
-\*[Gt]0x18  leshort  \*[Lt]0x40   MS-DOS executable
-\*[Gt]0x18  leshort  \*[Gt]0x3f   extended PC executable (e.g., MS Windows)
+0      string    MZ
+\*[Gt]0x18  uleshort  \*[Lt]0x40   MS-DOS executable
+\*[Gt]0x18  uleshort  \*[Gt]0x3f   extended PC executable (e.g., MS Windows)
 .Ed
@@ -670,8 +670,8 @@
 # MS Windows executables are also valid MS-DOS executables
-0           string  MZ
-\*[Gt]0x18       leshort \*[Lt]0x40   MZ executable (MS-DOS)
+0           string   MZ
+\*[Gt]0x18       uleshort \*[Lt]0x40  MZ executable (MS-DOS)
 # skip the whole block below if it is not an extended executable
-\*[Gt]0x18       leshort \*[Gt]0x3f
-\*[Gt]\*[Gt](0x3c.l)  string  PE\e0\e0  PE executable (MS-Windows)
-\*[Gt]\*[Gt](0x3c.l)  string  LX\e0\e0  LX executable (OS/2)
+\*[Gt]0x18       uleshort \*[Gt]0x3f
+\*[Gt]\*[Gt](0x3c.l)  string   PE\e0\e0 PE executable (MS-Windows)
+\*[Gt]\*[Gt](0x3c.l)  string   LX\e0\e0 LX executable (OS/2)
 .Ed
@@ -689,8 +689,8 @@
 # MS Windows executables are also valid MS-DOS executables
-0           string  MZ
+0           string   MZ
 # sometimes, the value at 0x18 is less that 0x40 but there's still an
 # extended executable, simply appended to the file
-\*[Gt]0x18       leshort \*[Lt]0x40
-\*[Gt]\*[Gt](4.s*512) leshort 0x014c  COFF executable (MS-DOS, DJGPP)
-\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
+\*[Gt]0x18       uleshort \*[Lt]0x40
+\*[Gt]\*[Gt](4.s*512) leshort  0x014c  COFF executable (MS-DOS, DJGPP)
+\*[Gt]\*[Gt](4.s*512) leshort  !0x014c MZ executable (MS-DOS)
 .Ed
@@ -704,8 +704,9 @@
 .Bd -literal -offset indent
-0           string  MZ
-\*[Gt]0x18       leshort \*[Gt]0x3f
-\*[Gt]\*[Gt](0x3c.l)  string  PE\e0\e0    PE executable (MS-Windows)
+0           string   MZ
+\*[Gt]0x18       uleshort \*[Gt]0x3f
+\*[Gt]\*[Gt](0x3c.l)  string   PE\e0\e0    PE executable (MS-Windows)
 # immediately following the PE signature is the CPU type
-\*[Gt]\*[Gt]\*[Gt]\*[Am]0       leshort 0x14c     for Intel 80386
-\*[Gt]\*[Gt]\*[Gt]\*[Am]0       leshort 0x184     for DEC Alpha
+\*[Gt]\*[Gt]\*[Gt]\*[Am]0       leshort  0x14c     for Intel 80386
+\*[Gt]\*[Gt]\*[Gt]\*[Am]0       leshort  0x8664    for x86-64
+\*[Gt]\*[Gt]\*[Gt]\*[Am]0       leshort  0x184     for DEC Alpha
 .Ed
@@ -714,5 +715,5 @@
 .Bd -literal -offset indent
-0             string  MZ
-\*[Gt]0x18         leshort \*[Lt]0x40
-\*[Gt]\*[Gt](4.s*512)   leshort !0x014c MZ executable (MS-DOS)
+0             string   MZ
+\*[Gt]0x18         uleshort \*[Lt]0x40
+\*[Gt]\*[Gt](4.s*512)   leshort  !0x014c MZ executable (MS-DOS)
 # if it's not COFF, go back 512 bytes and add the offset taken
@@ -720,3 +721,3 @@
 # of the extended executable
-\*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514) string  LE      LE executable (MS Windows VxD driver)
+\*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514) string   LE      LE executable (MS Windows VxD driver)
 .Ed
@@ -725,5 +726,5 @@
 .Bd -literal -offset indent
-0                 string  MZ
-\*[Gt]0x18             leshort \*[Gt]0x3f
-\*[Gt]\*[Gt](0x3c.l)        string  LE\e0\e0  LE executable (MS-Windows)
+0                 string   MZ
+\*[Gt]0x18             uleshort \*[Gt]0x3f
+\*[Gt]\*[Gt](0x3c.l)        string   LE\e0\e0  LE executable (MS-Windows)
 # at offset 0x80 (-4, since relative offsets start at the end
@@ -731,3 +732,3 @@
 # offset to the code area, where we look for a specific signature
-\*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26) string  UPX     \eb, UPX compressed
+\*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26) string   UPX     \eb, UPX compressed
 .Ed
@@ -736,8 +737,8 @@
 .Bd -literal -offset indent
-0                string  MZ
-\*[Gt]0x18            leshort \*[Gt]0x3f
-\*[Gt]\*[Gt](0x3c.l)       string  LE\e0\e0 LE executable (MS-Windows)
+0                string   MZ
+\*[Gt]0x18            uleshort \*[Gt]0x3f
+\*[Gt]\*[Gt](0x3c.l)       string   LE\e0\e0 LE executable (MS-Windows)
 # at offset 0x58 inside the LE header, we find the relative offset
 # to a data area where we look for a specific signature
-\*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3)  string  UNACE  \eb, ACE self-extracting archive
+\*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3)  string   UNACE  \eb, ACE self-extracting archive
 .Ed
@@ -751,3 +752,3 @@
 0                 string       MZ
-\*[Gt]0x18             leshort      \*[Gt]0x3f
+\*[Gt]0x18             uleshort     \*[Gt]0x3f
 \*[Gt]\*[Gt](0x3c.l)        string       PE\e0\e0 PE executable (MS-Windows)
@@ -830 +831,13 @@
 .\" Modified for Ian Darwin's version of the file command.
+.\"
+.\" For emacs editor
+.\" Local Variables:
+.\" eval: (add-hook 'before-save-hook 'time-stamp)
+.\" time-stamp-start: ".Dd "
+.\" time-stamp-end: "$"
+.\" time-stamp-format: "%:B %02d, %:Y"
+.\" time-stamp-time-zone: "UTC0"
+.\" system-time-locale: "C"
+.\" eval:(setq compile-command (concat "groff -Tlatin1 -m man " (buffer-file-name)) )
+.\" End:
+.\"