[File] [PATCH] doc/magic.man ; in 8 MZ examples e_lfarlc pointers must be unsigned
Jörg Jenderek (GMX)
joerg.jen.der.ek at gmx.net
Thu Oct 5 20:31:12 UTC 2023
Hello,
some months ago I sent the patch file-5.43-msdos-e_lfarlc.diff for
Magdir/msdos to correct the recognition of MZ DOS/Windows executables. I
looked inside Magdir/msdos of file version 5.45 and saw that my patches
were accepted.
Unfortunately, the affected sections are also quoted as examples in the
man page magic.man (v 1.103), so the old, wrong expressions are still
listed there.
To recapitulate, the relevant lines inside Magdir/msdos start like this:
0 string/b MZ
#>0x18 uleshort x \b, e_lfarlc=0x%x
>0x18 uleshort <0x40
After matching the e_magic value MZ, the relocation table pointer
e_lfarlc is used for subclassification. Most non-DOS MZ executable
extensions (the Windows-like ones) have the relocation table more than
0x40 bytes into the file, whereas DOS-like executables have it below
that. For the MiTeC Portable Executable Reader EXE64.exe, found in the
archive http://www.mitec.cz/Downloads/EXE.zip, I get the "high" value
e_lfarlc=0x8ead. In the old expressions the comparison against the
limit 0x40 was done as signed, so 0x8ead was treated as a negative
number and therefore as below 0x40. As a result, the old expressions
wrongly handled EXE64.exe via the branch for pure DOS executables.
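The signed/unsigned difference can be reproduced outside of file with Python's struct module, where the format codes "<h" and "<H" read a little-endian signed and unsigned 16-bit value, roughly corresponding to leshort and uleshort. This is only a sketch of the arithmetic, assuming the two bytes are built from the reported value rather than read from the real EXE64.exe; it is not how file itself is implemented:

```python
import struct

# Two bytes standing in for offset 0x18 (e_lfarlc) of EXE64.exe:
# the reported value 0x8ead stored little-endian (constructed, not read).
raw = bytes([0xAD, 0x8E])

signed_value = struct.unpack("<h", raw)[0]    # signed, like "leshort"
unsigned_value = struct.unpack("<H", raw)[0]  # unsigned, like "uleshort"

print(signed_value, signed_value < 0x40)           # -29011 True
print(hex(unsigned_value), unsigned_value < 0x40)  # 0x8ead False
```

Read as signed, the value is negative and passes the "<0x40" test, which sends the file down the DOS branch; read as unsigned, it correctly fails that test.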
So all these tests must be done as unsigned with "uleshort"; in other
words, the test with "leshort" is wrong. Looking at doc/magic.man, I
found 8 places where the old "leshort" expression is used:
0 string MZ
>0x18 leshort <0x40 MS-DOS executable
>0x18 leshort >0x3f extended PC executable (e.g., MS Windows)
# MS Windows executables are also valid MS-DOS executables
0 string MZ
>0x18 leshort <0x40 MZ executable (MS-DOS)
# skip the whole block below if it is not an extended executable
>0x18 leshort >0x3f
>>(0x3c.l) string PE\0\0 PE executable (MS-Windows)
>>(0x3c.l) string LX\0\0 LX executable (OS/2)
# MS Windows executables are also valid MS-DOS executables
0 string MZ
# sometimes, the value at 0x18 is less that 0x40 but there's still an
# extended executable, simply appended to the file
>0x18 leshort <0x40
>>(4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
>>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
0 string MZ
>0x18 leshort >0x3f
>>(0x3c.l) string PE\0\0 PE executable (MS-Windows)
# immediately following the PE signature is the CPU type
>>>&0 leshort 0x14c for Intel 80386
>>>&0 leshort 0x184 for DEC Alpha
0 string MZ
>0x18 leshort <0x40
>>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
# if it's not COFF, go back 512 bytes and add the offset taken
# from byte 2/3, which is yet another way of finding the start
# of the extended executable
>>>&(2.s-514) string LE LE executable (MS Windows VxD driver)
0 string MZ
>0x18 leshort >0x3f
>>(0x3c.l) string LE\0\0 LE executable (MS-Windows)
# at offset 0x80 (-4, since relative offsets start at the end
# of the up-level match) inside the LE header, we find the absolute
# offset to the code area, where we look for a specific signature
>>>(&0x7c.l+0x26) string UPX \b, UPX compressed
0 string MZ
>0x18 leshort >0x3f
>>(0x3c.l) string LE\0\0 LE executable (MS-Windows)
# at offset 0x58 inside the LE header, we find the relative offset
# to a data area where we look for a specific signature
>>>&(&0x54.l-3) string UNACE \b, ACE self-extracting archive
0 string MZ
>0x18 leshort >0x3f
>>(0x3c.l) string PE\0\0 PE executable (MS-Windows)
# search for the PE section called ".idata"...
>>>&0xf4 search/0x140 .idata
# ...and go to the end of it, calculated from start+length;
# these are located 14 and 10 bytes after the section name
>>>>(&0xe.l+(-4)) string PK\3\4 \b, ZIP self-extracting archive
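To illustrate what the first of these examples does once it uses uleshort, here is a minimal Python sketch. The function classify_mz is hypothetical; it inspects only the MZ magic at offset 0 and e_lfarlc at offset 0x18, and builds a fake header instead of reading a real file:

```python
import struct

def classify_mz(header: bytes) -> str:
    """Hypothetical helper mirroring the first man page example with
    the unsigned test: only offsets 0 (MZ) and 0x18 (e_lfarlc) are used."""
    if len(header) < 0x1A or header[:2] != b"MZ":
        return "not MZ"
    # "<H" = little-endian unsigned 16-bit, i.e. the "uleshort" test
    (e_lfarlc,) = struct.unpack_from("<H", header, 0x18)
    if e_lfarlc < 0x40:
        return "MS-DOS executable"
    # unsigned ">0x3f" is the same as ">= 0x40"
    return "extended PC executable (e.g., MS Windows)"

# Fake header carrying the "high" e_lfarlc value reported for EXE64.exe
header = bytearray(0x40)
header[0:2] = b"MZ"
struct.pack_into("<H", header, 0x18, 0x8EAD)
print(classify_mz(bytes(header)))  # extended PC executable (e.g., MS Windows)
```

With a signed read (format "<h") the same header would land in the "MS-DOS executable" branch, which is exactly the misclassification described above.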
So I replaced leshort with uleshort in these expressions. I also inserted
spaces where needed so that the columns line up with the same
indentation. In the DEC Alpha example I also added a line for the x86-64
architecture (0x8664), which is more commonly used nowadays.
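The extended DEC Alpha example follows the pointer at 0x3c to the PE signature and then reads the CPU type that immediately follows it. A rough Python equivalent is sketched below; pe_cpu and the MACHINES table are hypothetical names, and the table holds only the three values from the corrected example:

```python
import struct

# Machine values from the corrected example; anything else stays unnamed.
MACHINES = {0x14C: "Intel 80386", 0x8664: "x86-64", 0x184: "DEC Alpha"}

def pe_cpu(data: bytes):
    """Hypothetical helper: MZ check, unsigned e_lfarlc test, then follow
    the 32-bit pointer at 0x3c to "PE\\0\\0" and name the CPU type."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (e_lfarlc,) = struct.unpack_from("<H", data, 0x18)   # uleshort
    if e_lfarlc <= 0x3F:
        return None
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # (0x3c.l)
    if len(data) < pe_offset + 6 or data[pe_offset:pe_offset + 4] != b"PE\0\0":
        return None
    # &0: the machine field starts right after the 4-byte PE signature
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINES.get(machine)

# Minimal fake image: PE signature at 0x80, machine 0x8664
img = bytearray(0x86)
img[0:2] = b"MZ"
struct.pack_into("<H", img, 0x18, 0x40)
struct.pack_into("<I", img, 0x3C, 0x80)
img[0x80:0x84] = b"PE\0\0"
struct.pack_into("<H", img, 0x84, 0x8664)
print(pe_cpu(bytes(img)))  # x86-64
```

The equality test on the machine field is unaffected by signedness, which is why those lines can keep leshort while the e_lfarlc range tests cannot.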
Because I cannot remember the exact command for getting the formatted
manual page for checking, such as "groff -Tlatin1 -m man
doc/magic.man", I put it into the man source as a compile command for
the Emacs editor. I also want the computer to work for me and not vice
versa, so I instruct Emacs to automatically update the second line with
the current man page date, which looked like:
.Dd Arpil 18, 2023
Apparently nobody handling this man page noticed that the month name
Arpil is wrong! The correct name is April.
So I put suitable nroff comment lines at the end of the man page:
.\"
.\" For emacs editor
.\" Local Variables:
.\" eval: (add-hook 'before-save-hook 'time-stamp)
.\" time-stamp-start: ".Dd "
.\" time-stamp-end: "$"
.\" time-stamp-format: "%:B %02d, %:Y"
.\" time-stamp-time-zone: "UTC0"
.\" system-time-locale: "C"
.\" eval:(setq compile-command (concat "groff -Tlatin1 -m man " (buffer-file-name)) )
.\" End:
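For readers who do not use Emacs, the effect of this time-stamp hook on the date line can be approximated in Python. This is a sketch under assumptions: update_dd is a hypothetical name, strftime's "%B %d, %Y" is taken as the counterpart of time-stamp-format "%:B %02d, %:Y", and English month names rely on the default C locale (matching system-time-locale "C"):

```python
import re
from datetime import datetime, timezone

def update_dd(text: str, now: datetime) -> str:
    """Hypothetical stand-in for the Emacs hook: replace everything after
    the first ".Dd " up to the end of that line with the given date."""
    stamp = now.strftime("%B %d, %Y")  # e.g. "October 05, 2023"
    return re.sub(r"^\.Dd .*$", ".Dd " + stamp, text, count=1, flags=re.M)

page = ".Dd Arpil 18, 2023\n.Dt MAGIC __FSECTION__\n"
print(update_dd(page, datetime(2023, 10, 5, tzinfo=timezone.utc)), end="")
# In practice one would pass datetime.now(timezone.utc), matching "UTC0".
```

Run on the old header with the patch date, this yields the same ".Dd October 05, 2023" line that appears in the diff below.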
This works for me. After applying the modifications described above
with the patch file-5.45-magic.man.diff, the correct tests for the
e_lfarlc pointer are also shown in the examples in the man page.
I hope my diff file can be applied in a future version of the file
utility.
With best wishes,
Jörg Jenderek
--
Jörg Jenderek
-------------- next part --------------
--- file-5.45/doc/magic.man.old 2023-07-27 20:04:45.000000000 +0200
+++ file-5.45/doc/magic.man 2023-10-05 21:29:52.442143655 +0200
@@ -1,3 +1,3 @@
.\" $File: magic.man,v 1.103 2023/07/20 14:32:07 christos Exp $
-.Dd Arpil 18, 2023
+.Dd October 05, 2023
.Dt MAGIC __FSECTION__
@@ -612,5 +612,5 @@
.Bd -literal -offset indent
-0 string MZ
-\*[Gt]0x18 leshort \*[Lt]0x40 MS-DOS executable
-\*[Gt]0x18 leshort \*[Gt]0x3f extended PC executable (e.g., MS Windows)
+0 string MZ
+\*[Gt]0x18 uleshort \*[Lt]0x40 MS-DOS executable
+\*[Gt]0x18 uleshort \*[Gt]0x3f extended PC executable (e.g., MS Windows)
.Ed
@@ -670,8 +670,8 @@
# MS Windows executables are also valid MS-DOS executables
-0 string MZ
-\*[Gt]0x18 leshort \*[Lt]0x40 MZ executable (MS-DOS)
+0 string MZ
+\*[Gt]0x18 uleshort \*[Lt]0x40 MZ executable (MS-DOS)
# skip the whole block below if it is not an extended executable
-\*[Gt]0x18 leshort \*[Gt]0x3f
-\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
-\*[Gt]\*[Gt](0x3c.l) string LX\e0\e0 LX executable (OS/2)
+\*[Gt]0x18 uleshort \*[Gt]0x3f
+\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
+\*[Gt]\*[Gt](0x3c.l) string LX\e0\e0 LX executable (OS/2)
.Ed
@@ -689,8 +689,8 @@
# MS Windows executables are also valid MS-DOS executables
-0 string MZ
+0 string MZ
# sometimes, the value at 0x18 is less that 0x40 but there's still an
# extended executable, simply appended to the file
-\*[Gt]0x18 leshort \*[Lt]0x40
-\*[Gt]\*[Gt](4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
-\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
+\*[Gt]0x18 uleshort \*[Lt]0x40
+\*[Gt]\*[Gt](4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
+\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
.Ed
@@ -704,8 +704,9 @@
.Bd -literal -offset indent
-0 string MZ
-\*[Gt]0x18 leshort \*[Gt]0x3f
-\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
+0 string MZ
+\*[Gt]0x18 uleshort \*[Gt]0x3f
+\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
# immediately following the PE signature is the CPU type
-\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x14c for Intel 80386
-\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x184 for DEC Alpha
+\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x14c for Intel 80386
+\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x8664 for x86-64
+\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x184 for DEC Alpha
.Ed
@@ -714,5 +715,5 @@
.Bd -literal -offset indent
-0 string MZ
-\*[Gt]0x18 leshort \*[Lt]0x40
-\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
+0 string MZ
+\*[Gt]0x18 uleshort \*[Lt]0x40
+\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
# if it's not COFF, go back 512 bytes and add the offset taken
@@ -720,3 +721,3 @@
# of the extended executable
-\*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514) string LE LE executable (MS Windows VxD driver)
+\*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514) string LE LE executable (MS Windows VxD driver)
.Ed
@@ -725,5 +726,5 @@
.Bd -literal -offset indent
-0 string MZ
-\*[Gt]0x18 leshort \*[Gt]0x3f
-\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
+0 string MZ
+\*[Gt]0x18 uleshort \*[Gt]0x3f
+\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
# at offset 0x80 (-4, since relative offsets start at the end
@@ -731,3 +732,3 @@
# offset to the code area, where we look for a specific signature
-\*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26) string UPX \eb, UPX compressed
+\*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26) string UPX \eb, UPX compressed
.Ed
@@ -736,8 +737,8 @@
.Bd -literal -offset indent
-0 string MZ
-\*[Gt]0x18 leshort \*[Gt]0x3f
-\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
+0 string MZ
+\*[Gt]0x18 uleshort \*[Gt]0x3f
+\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
# at offset 0x58 inside the LE header, we find the relative offset
# to a data area where we look for a specific signature
-\*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3) string UNACE \eb, ACE self-extracting archive
+\*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3) string UNACE \eb, ACE self-extracting archive
.Ed
@@ -751,3 +752,3 @@
0 string MZ
-\*[Gt]0x18 leshort \*[Gt]0x3f
+\*[Gt]0x18 uleshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
@@ -830 +831,13 @@
.\" Modified for Ian Darwin's version of the file command.
+.\"
+.\" For emacs editor
+.\" Local Variables:
+.\" eval: (add-hook 'before-save-hook 'time-stamp)
+.\" time-stamp-start: ".Dd "
+.\" time-stamp-end: "$"
+.\" time-stamp-format: "%:B %02d, %:Y"
+.\" time-stamp-time-zone: "UTC0"
+.\" system-time-locale: "C"
+.\" eval:(setq compile-command (concat "groff -Tlatin1 -m man " (buffer-file-name)) )
+.\" End:
+.\"
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.45-magic.man.diff.sig
Type: application/octet-stream
Size: 1613 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20231005/25bd8295/attachment.obj>