[File] [PATCH] of Magdir/coff more tests to avoid misidentification
Jörg Jenderek
joerg.jen.der.ek at gmx.net
Wed Feb 3 23:52:22 UTC 2021
Hello,
some days ago I handled some Windows fonts with the fnt file name
extension. When running the file command, version 5.39, on such fonts,
COFF samples and other test files, I get output like:
bindings.o: Intel 80386 COFF object file,
no line number info, not stripped,
96 sections, symbol offset=0x186c0,
572 symbols
coff2.o: Intel 80386 COFF object file,
no line number info, not stripped,
1 section, symbol offset=0x7c,
7 symbols
DOCTOR.DAILY: Hitachi SH big-endian COFF object file,
not stripped,
0 section, symbol offset=0xcb060000,
656385 symbols
ega80woa.fnt: Intel ia64 COFF object file,
no line number info, not stripped,
4207 sections, symbol offset=0x6f432029,
1769109872 symbols, optional header size 26727
fda2.obj: Intel 80386 COFF object file,
not stripped,
4 sections, symbol offset=0x2468, 153 symbols
intel-stripped.o: Intel 80386 COFF object file,
no relocation info, no line number info, stripped,
2 sections
msvcrt.lib: Intel 80386 COFF object file,
not stripped,
124 sections, symbol offset=0xc41f,
597 symbols
rsrc.obj: Intel 80386 COFF object file,
not stripped,
3 sections, symbol offset=0x1516,
11 symbols
svgafix.fnt: Intel ia64 COFF object file,
no line number info, not stripped,
4627 sections, symbol offset=0x6f432029,
1769109872 symbols, optional header size 26727
With the --extension option only ??? is displayed. Furthermore, with the
-i option only the generic application/octet-stream is shown for the samples.
For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). It lists the file name
extensions in use and, with the -v option, often also the URL
pointing to information about the file format (see the appended
coff_trid.txt.gz). Luckily, TrID identifies the COFF samples as
"Intel 80386 Common Object File Format (COFF) object" and displays the
related URL and extensions.
Unfortunately, COFF files only have a short 2-byte magic at the
beginning, so some other files are misidentified as COFF. Luckily,
the display part is done by the subroutine display-coff
inside Magdir/coff. It is triggered from the
hitachi-sh and intel magic files by lines like:
0 leshort =0514
>0 use display-coff
This subroutine starts with lines like:
0 name display-coff
>18 uleshort&0x8E80 0
>>0 clear x
Before displaying the COFF information, I added an additional test for
the unused flag bits (0x8000, 0x0800, 0x0400, 0x0200, 0x0080) in f_flags
in Oct 2015. That test alone is not enough, so more test lines are needed here.
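As a cross-check of the mask used in the test line above, here is a small Python sketch (not part of the patch; the names UNUSED_FLAG_BITS, UNUSED_FLAGS_MASK and unused_flags_clear are only illustrative). It shows that the listed unused bits combine to 0x8E80 and how the test would behave on an f_flags value:

```python
# Unused f_flags bits as listed in the mail.
UNUSED_FLAG_BITS = (0x8000, 0x0800, 0x0400, 0x0200, 0x0080)

# OR all unused bits together; this should give the mask from the magic line.
UNUSED_FLAGS_MASK = 0
for bit in UNUSED_FLAG_BITS:
    UNUSED_FLAGS_MASK |= bit


def unused_flags_clear(f_flags: int) -> bool:
    """Mirror of '>18 uleshort&0x8E80 0': true when no unused bit is set."""
    return (f_flags & UNUSED_FLAGS_MASK) == 0


if __name__ == "__main__":
    print(hex(UNUSED_FLAGS_MASK))
    print(unused_flags_clear(0x0004))  # only F_LNNO set
    print(unused_flags_clear(0x8004))  # an unused bit is set
```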
Later the number of sections f_nscns is shown by lines like:
>>2 uleshort <2 \b, %d section
>>2 uleshort >1 \b, %d sections
After testing some hundreds of COFF examples I get f_nscns values like:
1 2 3 4 5 7 8 9 11 12 16 19 20 21 22 30 36 40 42 56 80 89 96 124
So real COFF samples have at least 1 section. This is used as the second
magic test line:
>>2 uleshort >0
So misidentified examples like DOCTOR.DAILY READER.NDA REDBOX.ROOT
are skipped by requiring a positive number of sections.
Typically COFF samples have only a few sections for code, data etc.
The worst case, with the highest f_nscns value of 124, was msvcrt.lib. So I
assume that the real maximum of f_nscns is in the hundreds range, and I add
a third magic test line:
>>>2 uleshort <4207
>>>>0 clear x
So misidentified examples like ega80woa.fnt svgafix.fnt HP3FNTS1.DAT
HP3FNTS2.DAT INTRO.ACT LEARN.PIF are skipped by requiring a low
number of sections.
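The three tests together can be sketched in Python as follows. This is only an illustration under my assumptions, not the file implementation: it assumes the classic 20-byte little-endian COFF file header, and the names COFF_HEADER, MAX_SECTIONS and looks_like_coff are made up for this example. The check of the 2-byte magic itself is left to the caller, as in the magic files.

```python
import struct

# Assumed classic 20-byte COFF file header, little-endian:
# f_magic, f_nscns, f_timdat, f_symptr, f_nsyms, f_opthdr, f_flags
COFF_HEADER = struct.Struct("<HHIIIHH")

UNUSED_FLAGS_MASK = 0x8E80  # unused f_flags bits, as in the first test line
MAX_SECTIONS = 4207         # exclusive upper bound, as in the third test line


def looks_like_coff(header: bytes) -> bool:
    """Apply the three plausibility tests to the start of a file."""
    if len(header) < COFF_HEADER.size:
        return False
    _magic, nscns, _timdat, _symptr, _nsyms, _opthdr, flags = (
        COFF_HEADER.unpack_from(header)
    )
    if flags & UNUSED_FLAGS_MASK:
        return False                  # an unused flag bit is set
    return 0 < nscns < MAX_SECTIONS   # at least 1 section, but not thousands


if __name__ == "__main__":
    # Values as reported for coff2.o: 1 section, symbol offset 0x7c, 7 symbols.
    sample = COFF_HEADER.pack(0x014C, 1, 0, 0x7C, 7, 0, 0x0004)
    print(looks_like_coff(sample))
```

A header with 0 sections (like DOCTOR.DAILY) or with 4207 or more sections (like the fnt samples) is rejected by the last line.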
Because the field is unsigned, the number of sections is now printed with %u instead of %d:
>>>>2 uleshort <2 \b, %u section
>>>>2 uleshort >1 \b, %u sections
If the F_EXEC bit is not set, the file is not considered executable,
so it is an object file. This now becomes visible by a line like:
>>>>18 leshort ^0x0002 object file
Afterwards I now show a user-defined mime type and file name
extensions by lines like:
!:mime application/x-coff
!:ext o/obj/lib
Unfortunately I found no examples with the cof extension.
I also added the observed values for f_symptr and f_nsyms as comment
lines. These are relatively "low", whereas for misidentified examples
I find extremely "high" values. So these values could also be used
in magic test lines if needed.
Furthermore, after showing the COFF values as in the old version, I also
display the creation time f_timdat by a line like:
>>>>4 ledate >0 \b, created %s
This works for little endian, which I tested, but I do not know whether
it also works for big endian.
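To illustrate the endianness question, here is a minimal Python sketch (again not part of the patch; the function name coff_timestamp and its big_endian parameter are hypothetical). It reads the 4 bytes at offset 4 as a UNIX time stamp and skips a zero value, like the ">0" test in the magic line:

```python
import struct
from datetime import datetime, timezone


def coff_timestamp(header: bytes, big_endian: bool = False):
    """Return f_timdat (offset 4) as a UTC datetime, or None if it is 0."""
    fmt = ">I" if big_endian else "<I"
    (timdat,) = struct.unpack_from(fmt, header, 4)
    if timdat == 0:
        return None  # no time stamp stored, nothing to display
    return datetime.fromtimestamp(timdat, tz=timezone.utc)


if __name__ == "__main__":
    # Made-up header: magic 0x014C, 1 section, time stamp of one day after the epoch.
    header = struct.pack("<HHI", 0x014C, 1, 86400) + bytes(12)
    print(coff_timestamp(header))
```

Reading the same bytes with the wrong byte order gives a different date, so a big endian variant would need its own test line.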
The optional header size f_opthdr is shown if available by a line like:
>>>>16 uleshort >0 \b, optional header size %u
An object file should have a value of 0, and I found no example with a
non-zero optional header. In such cases the section headers start
directly after the COFF header. Each section header begins with an
8-byte section name (like .text .data .debug$S .drectve .testseg).
So this information is now shown by additional lines like:
>>>>16 uleshort =0
>>>>>20 string x \b, 1st section name "%.8s"
It seems that the section name always starts with a dot character
(0x2E). So maybe this can also be used as a magic test.
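The lookup of the first section name can be sketched in Python like this. It is only an illustration, assuming a little-endian 20-byte file header; the function name first_section_name is made up. It returns nothing when f_opthdr is non-zero, because then the section headers do not start at offset 20:

```python
def first_section_name(data: bytes):
    """Return s_name of the first section, or None if it is not at offset 20."""
    if len(data) < 28:
        return None
    f_opthdr = int.from_bytes(data[16:18], "little")
    if f_opthdr != 0:
        return None  # an optional header sits between file and section headers
    raw = data[20:28]  # s_name[8], padded with NUL bytes when shorter
    return raw.split(b"\0", 1)[0].decode("ascii", errors="replace")


if __name__ == "__main__":
    sample = bytes(20) + b".text\0\0\0"
    name = first_section_name(sample)
    print(name, name.startswith("."))
```

The startswith(".") test in the usage part corresponds to the possible extra magic test on the dot character.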
After applying the above modifications with the patch
file-5.39-coff.diff, the misidentifications vanish and COFF samples
are described more precisely, like:
bindings.o: Intel 80386 COFF object file,
no line number info, not stripped,
96 sections, symbol offset=0x186c0,
572 symbols,
1st section name ".text"
coff2.o: Intel 80386 COFF object file,
no line number info, not stripped,
1 section, symbol offset=0x7c,
7 symbols, created Mon Apr 17 07:22:41 2006,
1st section name ".data"
DOCTOR.DAILY: data
ega80woa.fnt: data
fda2.obj: Intel 80386 COFF object file,
not stripped,
4 sections, symbol offset=0x2468,
153 symbols, created Fri Nov 25 03:25:11 2011,
1st section name ".drectve"
intel-stripped.o: Intel 80386 COFF object file,
no relocation info, no line number info, stripped,
2 sections,
1st section name ".testseg"
rsrc.obj: Intel 80386 COFF object file,
not stripped,
3 sections, symbol offset=0x1516,
11 symbols, created Mon Oct 03 06:04:18 2011,
1st section name ".debug$S"
svgafix.fnt: data
I hope my diff file can be applied in a future version of
the file utility.
With best wishes
Jörg Jenderek
--
Jörg Jenderek
-------------- next part --------------
A non-text attachment was scrubbed...
Name: coff_trid.txt.gz
Type: application/x-gzip
Size: 839 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20210204/0ba70ecd/attachment.bin>
-------------- next part --------------
--- file-5.39/magic/Magdir/coff.old 2018-08-01 10:34:03 +0000
+++ file-5.39/magic/Magdir/coff 2021-02-03 20:00:18 +0000
@@ -7,3 +7,3 @@
#
-# by Joerg Jenderek at Oct 2015
+# by Joerg Jenderek at Oct 2015, Feb 2021
# https://en.wikipedia.org/wiki/COFF
@@ -18,34 +18,40 @@
>18 uleshort&0x8E80 0
->>0 clear x
+# skip DOCTOR.DAILY READER.NDA REDBOX.ROOT by looking for positive number of sections
+>>2 uleshort >0
+# skip ega80woa.fnt svgafix.fnt HP3FNTS1.DAT HP3FNTS2.DAT INTRO.ACT LEARN.PIF by looking for low number of sections
+>>>2 uleshort <4207
+>>>>0 clear x
# f_magic - magic number
# DJGPP, 80386 COFF executable, MS Windows COFF Intel 80386 object file (./intel)
->>0 uleshort 0x014C Intel 80386
+>>>>0 uleshort 0x014C Intel 80386
# Hitachi SH big-endian COFF (./hitachi-sh)
->>0 uleshort 0x0500 Hitachi SH big-endian
+>>>>0 uleshort 0x0500 Hitachi SH big-endian
# Hitachi SH little-endian COFF (./hitachi-sh)
->>0 uleshort 0x0550 Hitachi SH little-endian
+>>>>0 uleshort 0x0550 Hitachi SH little-endian
# executable (RISC System/6000 V3.1) or obj module (./ibm6000)
-#>>0 uleshort 0x01DF
+#>>>>0 uleshort 0x01DF
# MS Windows COFF Intel Itanium, AMD64
# https://msdn.microsoft.com/en-us/library/windows/desktop/ms680313(v=vs.85).aspx
->>0 uleshort 0x0200 Intel ia64
->>0 uleshort 0x8664 Intel amd64
+>>>>0 uleshort 0x0200 Intel ia64
+>>>>0 uleshort 0x8664 Intel amd64
# TODO for other COFFs
-#>>0 uleshort 0xABCD COFF_TEMPLATE
->>0 default x
->>>0 uleshort x type 0x%04x
->>0 uleshort x COFF
+#>>>>0 uleshort 0xABCD COFF_TEMPLATE
+>>>>0 default x
+>>>>>0 uleshort x type 0x%04x
+>>>>0 uleshort x COFF
# F_EXEC flag bit
->>18 leshort ^0x0002 object file
-#!:mime application/x-coff
-#!:ext cof/o/obj/lib
->>18 leshort &0x0002 executable
+>>>>18 leshort ^0x0002 object file
+!:mime application/x-coff
+!:ext o/obj/lib
+# no cof sample found
+#!:ext cof/o/obj/lib
+>>>>18 leshort &0x0002 executable
#!:mime application/x-coffexec
# F_RELFLG flag bit,static object
->>18 leshort &0x0001 \b, no relocation info
+>>>>18 leshort &0x0001 \b, no relocation info
# F_LNNO flag bit
->>18 leshort &0x0004 \b, no line number info
+>>>>18 leshort &0x0004 \b, no line number info
# F_LSYMS flag bit
->>18 leshort &0x0008 \b, stripped
->>18 leshort ^0x0008 \b, not stripped
+>>>>18 leshort &0x0008 \b, stripped
+>>>>18 leshort ^0x0008 \b, not stripped
# flags in other COFF versions
@@ -55,3 +61,3 @@
# F_AR32WR flag bit
-#>>>18 leshort &0x0100 \b, 32 bit little endian
+#>>>>18 leshort &0x0100 \b, 32 bit little endian
#0x1000 F_DYNLOAD
@@ -59,13 +65,15 @@
#0x4000 F_LOADONLY
-# f_nscns - number of sections
->>2 uleshort <2 \b, %d section
->>2 uleshort >1 \b, %d sections
-# f_timdat - file time & date stamp only for little endian
-#>>4 date x \b, %s
+# f_nscns - number of sections like: 1 2 3 4 5 7 8 9 11 12 15 16 19 20 21 22 26 30 36 40 42 56 80 89 96 124
+>>>>2 uleshort <2 \b, %u section
+>>>>2 uleshort >1 \b, %u sections
# f_symptr - symbol table pointer, only for not stripped
->>8 ulelong >0 \b, symbol offset=0x%x
+# like: 0 0x7c 0xf4 0x104 0x182 0x1c2 0x1c6 0x468 0x948 0x416e 0x149a6 0x1c9d8 0x23a68 0x35120 0x7afa0
+>>>>8 ulelong >0 \b, symbol offset=0x%x
# f_nsyms - number of symbols, only for not stripped
->>12 ulelong >0 \b, %d symbols
-# f_opthdr - optional header size
->>16 uleshort >0 \b, optional header size %d
+# like: 0 2 7 9 10 11 20 35 41 63 71 80 105 146 153 158 170 208 294 572 831 1546
+>>>>12 ulelong >0 \b, %d symbols
+# f_opthdr - optional header size. An object file should have a value of 0
+>>>>16 uleshort >0 \b, optional header size %u
+# f_timdat - file time & date stamp only for little endian
+>>>>4 ledate >0 \b, created %s
# at offset 20 can be optional header, extra bytes FILHSZ-20 because
@@ -74,2 +82,5 @@
# additional variables for other COFF files
+>>>>16 uleshort =0
+# first section name s_name[8] like: .text .data .debug$S .drectve .testseg
+>>>>>20 string x \b, 1st section name "%.8s"
# >20 beshort 0407 (impure)