[File] [PATCH] of Magdir/coff more tests to avoid misidentification

Jörg Jenderek joerg.jen.der.ek at gmx.net
Wed Feb 3 23:52:22 UTC 2021


Hello,
a few days ago I handled some Windows fonts with the fnt file name
extension. When I run the file command version 5.39 on such fonts, on
COFF samples and on other test files, I get output like:

bindings.o:       Intel 80386 COFF object file,
		  no line number info, not stripped,
		  96 sections, symbol offset=0x186c0,
		  572 symbols
coff2.o:          Intel 80386 COFF object file,
		  no line number info, not stripped,
		  1 section, symbol offset=0x7c,
		  7 symbols
DOCTOR.DAILY:     Hitachi SH big-endian COFF object file,
		  not stripped,
		  0 section, symbol offset=0xcb060000,
		  656385 symbols
ega80woa.fnt:     Intel ia64 COFF object file,
		  no line number info, not stripped,
		  4207 sections, symbol offset=0x6f432029,
		  1769109872 symbols, optional header size 26727
fda2.obj:         Intel 80386 COFF object file,
		  not stripped,
		  4 sections, symbol offset=0x2468, 153 symbols
intel-stripped.o: Intel 80386 COFF object file,
		  no relocation info, no line number info, stripped,
		  2 sections
msvcrt.lib:       Intel 80386 COFF object file,
		  not stripped,
		  124 sections, symbol offset=0xc41f,
		  597 symbols
rsrc.obj:         Intel 80386 COFF object file,
		  not stripped,
		  3 sections, symbol offset=0x1516,
		  11 symbols
svgafix.fnt:      Intel ia64 COFF object file,
		  no line number info, not stripped,
		  4627 sections, symbol offset=0x6f432029,
		  1769109872 symbols, optional header size 26727

With the --extension option only ??? is displayed. Furthermore, with
the -i option only the generic application/octet-stream is shown for
these samples.

For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). It lists the file name
extensions used and, with the -v option, often the related URL
pointing to information about the file format (see the appended
coff_trid.txt.gz). Luckily the TrID tool identifies the COFF samples as
"Intel 80386 Common Object File Format (COFF) object" and displays
the related URL and extensions.

Unfortunately COFF files only have a short 2-byte magic at the
beginning, so some other files are misidentified as COFF samples.
Luckily the displaying part is done by the subroutine display-coff
inside Magdir/coff. It is triggered from the hitachi-sh and intel
magic files by lines like:
 0	leshort		=0514
 >0	use				display-coff
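
To illustrate why a 2-byte magic is weak, here is a minimal Python
sketch (not part of the patch) that reads the 20-byte file header at
the same offsets display-coff uses (0, 2, 4, 8, 12, 16, 18). The name
parse_coff_header is hypothetical, and the sample header is synthetic:
it is rebuilt from the values file 5.39 printed for ega80woa.fnt, with
f_timdat set to 0 and f_flags chosen as an assumption to match the
printed flag texts.

```python
import struct

# Little-endian COFF file header: two shorts, three longs, two shorts.
COFF_HEADER = struct.Struct("<HHIIIHH")
FIELDS = ("f_magic", "f_nscns", "f_timdat", "f_symptr",
          "f_nsyms", "f_opthdr", "f_flags")

def parse_coff_header(data):
    """Return the 20-byte little-endian COFF file header as a dict."""
    return dict(zip(FIELDS, COFF_HEADER.unpack_from(data, 0)))

# Synthetic header built from the values reported for ega80woa.fnt.
# f_timdat (0) and f_flags (0x0004) are assumptions for illustration.
font_like = COFF_HEADER.pack(0x0200, 4207, 0, 0x6F432029,
                             1769109872, 26727, 0x0004)
hdr = parse_coff_header(font_like)
print(hex(hdr["f_magic"]), hdr["f_nscns"])
# Bytes 8..17 hold f_symptr, f_nsyms and f_opthdr.
print(font_like[8:18])
```

Bytes 8 to 17 of this header are the printable text ") Copyrigh",
which suggests that for the font files these fields overlay a text
string rather than real symbol table values.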

This subroutine starts with lines like:
 0	name				display-coff
 >18	uleshort&0x8E80	0
 >>0	clear		x

Before displaying the COFF information, I added in Oct 2015 an
additional test for the unused flag bits
(0x8000,0x0800,0x0400,0x0200,0x0080) in f_flags. That test alone is
not sufficient, so more test lines are needed here.
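
As a small check of the mask, the following Python sketch (an
illustration only, with the hypothetical helper unused_flags_clear)
combines the five unused bits and mirrors the test
">18 uleshort&0x8E80 0":

```python
# Unused f_flags bits named in the text; their union is the mask.
UNUSED_BITS = (0x8000, 0x0800, 0x0400, 0x0200, 0x0080)

UNUSED_MASK = 0
for bit in UNUSED_BITS:
    UNUSED_MASK |= bit

def unused_flags_clear(f_flags):
    """True if none of the unused flag bits is set in f_flags."""
    return (f_flags & UNUSED_MASK) == 0

print(hex(UNUSED_MASK))
print(unused_flags_clear(0x0004), unused_flags_clear(0x0080))
```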

Later the number of sections f_nscns is shown by lines like:
 >>2	uleshort	<2		\b, %d section
 >>2	uleshort	>1		\b, %d sections
After testing some hundreds of COFF examples I get f_nscns values like:
1 2 3 4 5 7 8 9 11 12 16 19 20 21 22 30 36 40 42 56 80 89 96 124
So real COFF samples have at least 1 section. This is used as the
second magic test line:
 >>2	uleshort	>0
So misidentified examples like DOCTOR.DAILY READER.NDA REDBOX.ROOT
are skipped by looking for a positive number of sections.
Typically COFF samples have only a few sections for code, data etc.
The worst case, with the highest f_nscns value of 124, was msvcrt.lib.
So I assume that the real maximum of f_nscns is in the hundreds range,
and I add a third magic test line like:
 >>>2	uleshort	<4207
 >>>>0	clear		x
So misidentified examples like ega80woa.fnt svgafix.fnt HP3FNTS1.DAT
HP3FNTS2.DAT INTRO.ACT LEARN.PIF are skipped by looking for a low
number of sections.
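
The two section count tests together amount to a range check. A
minimal Python sketch (the helper name plausible_section_count is
hypothetical; the f_nscns values are the ones shown in the file
output above):

```python
MAX_SECTIONS = 4207  # exclusive upper bound, as in ">>>2 uleshort <4207"

def plausible_section_count(f_nscns):
    """True if f_nscns passes both the >0 and the <4207 test."""
    return 0 < f_nscns < MAX_SECTIONS

# f_nscns values reported by file 5.39 for some of the samples.
samples = {
    "coff2.o": 1,
    "msvcrt.lib": 124,
    "DOCTOR.DAILY": 0,
    "ega80woa.fnt": 4207,
    "svgafix.fnt": 4627,
}
for name, nscns in samples.items():
    print(name, plausible_section_count(nscns))
```

Only coff2.o and msvcrt.lib pass here; the three misidentified files
fail one of the two bounds.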
Because the value is unsigned, the lines showing the number of
sections now become:
 >>>>2	uleshort	<2		\b, %u section
 >>>>2	uleshort	>1		\b, %u sections

If the F_EXEC bit is not set, the file is not considered executable,
so it is an object file. This now becomes visible by a line like:
 >>>>18	leshort		^0x0002		object file
Afterwards I now show a user defined mime type and file name
extensions by lines like:
 !:mime	application/x-coff
 !:ext	o/obj/lib
Unfortunately I found no examples with the cof extension.
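
How the flag bits turn into the displayed words can be sketched in
Python as follows. This is only an illustration of the tests at
offset 18, in the order used by display-coff; the function name
describe_flags is hypothetical, and the flag names come from the
comments in Magdir/coff.

```python
F_RELFLG = 0x0001  # no relocation info
F_EXEC = 0x0002    # file is executable
F_LNNO = 0x0004    # no line number info
F_LSYMS = 0x0008   # local symbols stripped

def describe_flags(f_flags):
    """Build the flag part of the description from f_flags."""
    parts = ["executable" if f_flags & F_EXEC else "object file"]
    if f_flags & F_RELFLG:
        parts.append("no relocation info")
    if f_flags & F_LNNO:
        parts.append("no line number info")
    parts.append("stripped" if f_flags & F_LSYMS else "not stripped")
    return ", ".join(parts)

print(describe_flags(0x000D))
print(describe_flags(0x0004))
```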

I also added the observed values for f_symptr and f_nsyms as comment
lines. These are relatively "low", whereas for misidentified examples
I find extremely "high" values. So these values could also be used
in magic test lines if needed.

Furthermore, after showing the COFF values as in the old version, I
also display the creation time f_timdat by a line like
 >>>>4	ledate		>0		\b, created %s
This works for little endian, which I tested, but I do not know
whether it also works for big endian.
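
The endianness concern can be made concrete with a short Python
sketch. The helper names read_timdat and to_utc and the sample header
are assumptions for illustration; the point is only that the same 4
bytes at offset 4 give very different times depending on byte order.

```python
import struct
from datetime import datetime, timezone

def read_timdat(header, big_endian=False):
    """Read f_timdat (seconds since 1970) at offset 4 of the header."""
    fmt = ">I" if big_endian else "<I"
    (seconds,) = struct.unpack_from(fmt, header, 4)
    return seconds

def to_utc(seconds):
    """Convert a UNIX timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Synthetic 20-byte header with f_timdat = 86400, stored little endian.
header = bytes(4) + struct.pack("<I", 86400) + bytes(12)

print(to_utc(read_timdat(header)))            # read as little endian
print(read_timdat(header, big_endian=True))   # same bytes, big endian
```

A big-endian COFF variant would need the big-endian read, so a single
ledate line can only be right for one of the two byte orders.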

The optional header size f_opthdr is shown if available by a line like:
 >>>>16	uleshort	>0		\b, optional header size %u
An object file should have a value of 0, and indeed I found no example
with a non-zero optional header. In such cases the section headers
start directly after the COFF header. Each begins with an 8-byte
section name (like .text .data .debug$S .drectve .testseg). This
information is now shown by additional lines like:
 >>>>16	uleshort	=0
 >>>>>20	string	x		\b, 1st section name "%.8s"
It seems that the section name always starts with a dot character
(0x2E). So maybe this can also be used as a magic test.
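
A minimal Python sketch of this lookup, including the possible dot
test, could look like this. It is not part of the patch; the helper
names first_section_name and looks_like_section_name and the sample
bytes are hypothetical.

```python
import struct

def first_section_name(data):
    """Return the first section name, or None if f_opthdr is non-zero."""
    (f_opthdr,) = struct.unpack_from("<H", data, 16)
    if f_opthdr != 0:
        return None
    # s_name[8] directly follows the 20-byte header, padded with NULs.
    raw = data[20:28]
    return raw.split(b"\0", 1)[0].decode("ascii", errors="replace")

def looks_like_section_name(name):
    """Possible extra test: the name starts with a dot (0x2E)."""
    return name is not None and name.startswith(".")

# Synthetic samples: an all-zero header followed by a section name.
with_text = bytes(20) + b".text\0\0\0"
with_debug = bytes(20) + b".debug$S"

print(first_section_name(with_text), first_section_name(with_debug))
print(looks_like_section_name(first_section_name(with_text)))
```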

After applying the above mentioned modifications with the patch
file-5.39-coff.diff, the misidentifications vanish and the COFF
samples are described more precisely, like:

bindings.o:       Intel 80386 COFF object file,
		  no line number info, not stripped,
		  96 sections, symbol offset=0x186c0,
		  572 symbols,
		  1st section name ".text"
coff2.o:          Intel 80386 COFF object file,
		  no line number info, not stripped,
		  1 section, symbol offset=0x7c,
		  7 symbols, created Mon Apr 17 07:22:41 2006,
		  1st section name ".data"
DOCTOR.DAILY:     data
ega80woa.fnt:     data
fda2.obj:         Intel 80386 COFF object file,
		  not stripped,
		  4 sections, symbol offset=0x2468,
		  153 symbols, created Fri Nov 25 03:25:11 2011,
		  1st section name ".drectve"
intel-stripped.o: Intel 80386 COFF object file,
		  no relocation info, no line number info, stripped,
		  2 sections,
		  1st section name ".testseg"
rsrc.obj:         Intel 80386 COFF object file,
		  not stripped,
		  3 sections, symbol offset=0x1516,
		  11 symbols, created Mon Oct 03 06:04:18 2011,
		  1st section name ".debug$S"
svgafix.fnt:      data

I hope my diff file can be applied in a future version of the
file utility.

With best wishes
Jörg Jenderek

-------------- next part --------------
A non-text attachment was scrubbed...
Name: coff_trid.txt.gz
Type: application/x-gzip
Size: 839 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20210204/0ba70ecd/attachment.bin>
-------------- next part --------------
--- file-5.39/magic/Magdir/coff.old	2018-08-01 10:34:03 +0000
+++ file-5.39/magic/Magdir/coff	2021-02-03 20:00:18 +0000
@@ -7,3 +7,3 @@
 #
-# by Joerg Jenderek at Oct 2015
+# by Joerg Jenderek at Oct 2015, Feb 2021
 # https://en.wikipedia.org/wiki/COFF
@@ -18,34 +18,40 @@
 >18	uleshort&0x8E80	0
->>0	clear		x
+# skip DOCTOR.DAILY READER.NDA REDBOX.ROOT by looking for positive number of sections
+>>2	uleshort	>0
+# skip ega80woa.fnt svgafix.fnt HP3FNTS1.DAT HP3FNTS2.DAT INTRO.ACT LEARN.PIF by looking for low number of sections
+>>>2	uleshort	<4207
+>>>>0	clear		x
 # f_magic - magic number
 # DJGPP, 80386 COFF executable, MS Windows COFF Intel 80386 object file (./intel)
->>0	uleshort	0x014C		Intel 80386
+>>>>0	uleshort	0x014C		Intel 80386
 # Hitachi SH big-endian COFF (./hitachi-sh)
->>0	uleshort	0x0500		Hitachi SH big-endian
+>>>>0	uleshort	0x0500		Hitachi SH big-endian
 # Hitachi SH little-endian COFF (./hitachi-sh)
->>0	uleshort	0x0550		Hitachi SH little-endian
+>>>>0	uleshort	0x0550		Hitachi SH little-endian
 # executable (RISC System/6000 V3.1) or obj module (./ibm6000)
-#>>0	uleshort	0x01DF
+#>>>>0	uleshort	0x01DF
 # MS Windows COFF Intel Itanium, AMD64
 # https://msdn.microsoft.com/en-us/library/windows/desktop/ms680313(v=vs.85).aspx
->>0	uleshort	0x0200		Intel ia64
->>0	uleshort	0x8664		Intel amd64
+>>>>0	uleshort	0x0200		Intel ia64
+>>>>0	uleshort	0x8664		Intel amd64
 # TODO for other COFFs
-#>>0	uleshort	0xABCD		COFF_TEMPLATE
->>0	default		x
->>>0	uleshort	x		type 0x%04x
->>0	uleshort	x		COFF
+#>>>>0	uleshort	0xABCD		COFF_TEMPLATE
+>>>>0	default		x
+>>>>>0	uleshort	x		type 0x%04x
+>>>>0	uleshort	x		COFF
 # F_EXEC flag bit
->>18	leshort		^0x0002		object file
-#!:mime	application/x-coff
-#!:ext cof/o/obj/lib
->>18	leshort		&0x0002		executable
+>>>>18	leshort		^0x0002		object file
+!:mime	application/x-coff
+!:ext	o/obj/lib
+# no cof sample found
+#!:ext	cof/o/obj/lib
+>>>>18	leshort		&0x0002		executable
 #!:mime	application/x-coffexec
 # F_RELFLG flag bit,static object
->>18	leshort		&0x0001		\b, no relocation info
+>>>>18	leshort		&0x0001		\b, no relocation info
 # F_LNNO flag bit
->>18	leshort		&0x0004		\b, no line number info
+>>>>18	leshort		&0x0004		\b, no line number info
 # F_LSYMS flag bit
->>18	leshort		&0x0008		\b, stripped
->>18	leshort		^0x0008		\b, not stripped
+>>>>18	leshort		&0x0008		\b, stripped
+>>>>18	leshort		^0x0008		\b, not stripped
 # flags in other COFF versions
@@ -55,3 +61,3 @@
 # F_AR32WR flag bit
-#>>>18	leshort		&0x0100		\b, 32 bit little endian
+#>>>>18	leshort		&0x0100		\b, 32 bit little endian
 #0x1000    F_DYNLOAD
@@ -59,13 +65,15 @@
 #0x4000    F_LOADONLY
-# f_nscns - number of sections
->>2	uleshort	<2		\b, %d section
->>2	uleshort	>1		\b, %d sections
-# f_timdat - file time & date stamp only for little endian
-#>>4	date		x		\b, %s
+# f_nscns - number of sections like: 1 2 3 4 5 7 8 9 11 12 15 16 19 20 21 22 26 30 36 40 42 56 80 89 96 124
+>>>>2	uleshort	<2		\b, %u section
+>>>>2	uleshort	>1		\b, %u sections
 # f_symptr - symbol table pointer, only for not stripped
->>8	ulelong		>0		\b, symbol offset=0x%x
+# like: 0 0x7c 0xf4 0x104 0x182 0x1c2 0x1c6 0x468 0x948 0x416e 0x149a6 0x1c9d8 0x23a68 0x35120 0x7afa0
+>>>>8	ulelong		>0		\b, symbol offset=0x%x
 # f_nsyms - number of symbols, only for not stripped
->>12	ulelong		>0		\b, %d symbols
-# f_opthdr - optional header size
->>16	uleshort	>0		\b, optional header size %d
+# like: 0 2 7 9 10 11 20 35 41 63 71 80 105 146 153 158 170 208 294 572 831 1546
+>>>>12	ulelong		>0		\b, %d symbols
+# f_opthdr - optional header size. An object file should have a value of 0
+>>>>16	uleshort	>0		\b, optional header size %u
+# f_timdat - file time & date stamp only for little endian
+>>>>4	ledate		>0		\b, created %s
 # at offset 20 can be optional header, extra bytes FILHSZ-20 because
@@ -74,2 +82,5 @@
 # additional variables for other COFF files
+>>>>16	uleshort	=0
+# first section name s_name[8] like: .text .data .debug$S .drectve .testseg
+>>>>>20	string		x		\b, 1st section name "%.8s"
 # >20	beshort		0407		(impure)

