[File] [PATCH] of Magdir/xenix for 8086 relocatable (Microsoft) *.obj; 5th test too strict

Jörg Jenderek joerg.jen.der.ek at gmx.net
Mon Mar 29 19:50:47 UTC 2021


Hello,
some time ago I sent a patch for Magdir/xenix to recognise 8086
relocatables more reliably. Now I have run the file command, version 5.39,
on additional samples and negative test samples, and I get output like:

CWAPI.OBJ:    data
FRACTAL.GEN:  data
GENA.SND:     data
GNUCHESS.PC1: lif file
hpcc88.lif:   lif file
JMPPM32.OBJ:  data
KBHITR.OBJ:   data
KBHITS.OBJ:   data
OMBRE.6:      lif file
SHOWEVAR.OBJ: 8086 relocatable (Microsoft), "showevar.ASM"
SHR.View:     data
Strange.Pic:  lif file
Switch.Snd:   data
VESA.OBJ:     8086 relocatable (Microsoft), "vesa.asm"
Xtable.Data:  data

Unfortunately some examples like CWAPI.OBJ, JMPPM32.OBJ, KBHITR.OBJ
and KBHITS.OBJ are not recognized.

For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). It lists the file
name extensions in use and, with the -v option, often a related URL
pointing to information about the file format (see the appended
8086-obj-trid.txt.gz). Luckily, TrID identifies the samples as
"OMF - Relocatable Object Module Format" and displays the related URL
and extensions.

So I added the mentioned URL to Magdir/xenix as a comment line:
#      http://fileformats.archiveteam.org/wiki/OMF

According to that information, besides the 1-byte file name extension
"o", the 3-byte extension "obj" can also occur for object modules. This
is now shown by the updated line:
   !:ext	obj/o/a

As the first test I look at the record type at the beginning. The first
record is a Translator Header Record with THEADR value (80h). The test
line looks like:
     0	byte		0x80
Unfortunately this is also true for all compressed DEGAS low-res
bitmaps (like MUNCHIE.PC1, PIDER1.PC1) and all lif files like
forth.lif, hpcc88.lif, lex90b.lif (handled by Magdir/lif). So more
tests are needed.

At offset 1 the length of the first record's data is stored as a 2-byte
integer in little-endian format. The highest unsigned value is
65535. According to the documentation, the maximum size of an entire
record (unless otherwise noted for specific record types) is 1024
bytes. Without the leading 3 bytes this gives an upper limit of 1021
for the length of the record data. This is used as the second and
third tests by lines like:
     >1	uleshort	<1022
     >>1	uleshort	>0
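
For illustration, the first three tests can be sketched in Python. This
is only a sketch and not part of the patch; the function name and the
sample bytes are made up, and the trailing checksum byte is a
placeholder.

```python
import struct

def first_record_ok(data: bytes) -> bool:
    """Mirror tests 1-3: THEADR type byte and a plausible data length."""
    if len(data) < 3:
        return False
    # Record type at offset 0, data length as little-endian uint16 at offset 1.
    rec_type, rec_len = struct.unpack_from("<BH", data, 0)
    # 1024 bytes per record minus the 3 header bytes leaves at most 1021.
    return rec_type == 0x80 and 0 < rec_len < 1022

# Hypothetical THEADR record: 10 data bytes = length byte + "vesa.asm" + checksum.
sample = bytes([0x80, 0x0A, 0x00, 0x08]) + b"vesa.asm" + b"\x00"
print(first_record_ok(sample))
```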

Looking at the real examples in the TrID definition obj_omf.trid.xml,
this length is below 256. So as a cross-check I now show this value
after identification with an additional line like:
     >>>>>1	uleshort	x	\b, 1st record data length %u

With this information it is possible to jump to the second record.
As a cross-check, the type and data length of the second record are
shown by additional lines like:
     >>>>>(1.s+3)	ubyte	x	\b, 2nd record type 0x%x
     >>>>>(1.s+4)	uleshort	x	\b, 2nd record data length %u
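
The offset arithmetic behind (1.s+3) and (1.s+4) can be sketched in
Python as follows. This is only an illustration under my assumptions:
the function name is hypothetical, and the sample bytes are invented to
resemble the values reported for VESA.OBJ.

```python
import struct

def second_record_header(data: bytes):
    """Return (type, data_length) of the second record, or None if truncated."""
    if len(data) < 3:
        return None
    # 16-bit little-endian data length of the first record, stored at offset 1.
    (first_len,) = struct.unpack_from("<H", data, 1)
    # Skip the 3 header bytes plus the record data: same idea as (1.s+3).
    offset = first_len + 3
    if len(data) < offset + 3:
        return None
    # Type byte at the new offset, then its 16-bit length: (1.s+3) and (1.s+4).
    return struct.unpack_from("<BH", data, offset)

# Made-up sample: THEADR with 10 data bytes, then an LNAMES header (96h, length 53).
sample = bytes([0x80, 0x0A, 0x00, 0x08]) + b"vesa.asm" + b"\x00" + bytes([0x96, 0x35, 0x00])
rec_type, rec_len = second_record_header(sample)
print(hex(rec_type), rec_len)
```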

Often the second record type is 96h, as in example VESA.OBJ. This value
means that the record is of LNAMES type. The LNAMES record is a list of
names that can be referenced by subsequent SEGDEF and GRPDEF records.
I also found second record type 88h, as in example SHOWEVAR.OBJ. This
means that the record is a comment (COMENT) record.
I also found the value 8Ch, as in example CWAPI.OBJ. That means the
second record is of EXTDEF type. The EXTDEF record contains a list of
symbolic external references.
I use that information later in a new additional test line.

At offset 3 the module name is stored as a Pascal string (the first
byte holds the string length). This is shown by a line like:
     >>>>>3	pstring		x		\b, "%s"
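
As a small illustration of the Pascal string layout, a Python sketch
could read the name like this. The function name is hypothetical, and
decoding as Latin-1 is my assumption so that every byte value maps to
some character.

```python
def module_name(data: bytes) -> str:
    """Read the length-prefixed (Pascal) string that starts at offset 3."""
    length = data[3]                      # one length byte at offset 3
    return data[4:4 + length].decode("latin-1")

# Made-up THEADR record carrying the name "kbhit" (placeholder checksum byte).
sample = bytes([0x80, 0x07, 0x00, 0x05]) + b"kbhit" + b"\x00"
print(module_name(sample))
```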

Often this is the source name like "hello.c". This information was
used to skip the misidentified example OMBRE.6, with the name "UUUUUU",
by a fifth test line like:
   >>>>4	regex	[a-zA-Z_/]{1,8}[.]	8086 relocatable (Microsoft)

This test fails for JMPPM32.OBJ with the source name "jmppm32.asm",
because here the source name also contains digits. Moreover, this
T-module name is not always the source name. The name may be
specified directly by the programmer (for example, with the TITLE
pseudo-operand or the assembler NAME directive). So in example KBHITS.OBJ
the name is "kbhit" and in example CWAPI.OBJ the name is "CAUSEWAY_KERNAL".
So I decided not to check this name any more. Instead I use
the second record type as a replacement test.
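
To illustrate why the old name test is too strict, here is a small
Python sketch that applies the same pattern to the module names
mentioned above. It is a simplification: I apply the pattern to the
name only, whereas file searches the file contents starting at the
given offset, so the real behaviour may differ in detail.

```python
import re

# Pattern of the old fifth test: 1 to 8 letters, "_" or "/" directly before a dot.
OLD_NAME_TEST = re.compile(r"[a-zA-Z_/]{1,8}[.]")

names = ["vesa.asm", "showevar.ASM", "jmppm32.asm",
         "kbhit", "CAUSEWAY_KERNAL", "UUUUUU"]

# True means the old test would accept the name, False means it rejects it.
results = {name: bool(OLD_NAME_TEST.search(name)) for name in names}
for name, accepted in results.items():
    print(f"{name:16} {accepted}")
```

Only "vesa.asm" and "showevar.ASM" pass. "jmppm32.asm" is rejected
because a digit stands directly before the dot, and "kbhit" and
"CAUSEWAY_KERNAL" are rejected because they contain no dot at all.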

Now I skip bad examples like OMBRE.6 and GNUCHESS.PC1 by looking for a
valid "high" second record type via a fifth test line like:
   >>>>(1.s+3)	ubyte	>0x6D	8086 relocatable (Microsoft)
With these 5 test lines all my misidentified samples are now skipped.
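
Put together, the chain of five tests corresponds roughly to the
following Python sketch. It is an approximation of the magic lines, not
a replacement for them; the function name and both samples are invented
for illustration.

```python
import struct

def looks_like_omf(data: bytes) -> bool:
    """Approximate the five nested magic tests for 8086 relocatables."""
    if len(data) < 4 or data[0] != 0x80:          # test 1: THEADR record type
        return False
    (first_len,) = struct.unpack_from("<H", data, 1)
    if not 0 < first_len < 1022:                  # tests 2 and 3: data length
        return False
    if data[3] == 0:                              # test 4: non-empty module name
        return False
    second = first_len + 3                        # offset of the second record
    # Test 5: the second record type must be a "high" value above 6Dh.
    return len(data) > second and data[second] > 0x6D

# Invented good sample: THEADR "kbhit" followed by an LNAMES (96h) record header.
good = bytes([0x80, 0x07, 0x00, 0x05]) + b"kbhit" + b"\x00" + bytes([0x96, 0x35, 0x00])
# Invented bad sample: passes tests 1-4, but the second record type is only 55h.
bad = bytes([0x80, 0x07, 0x00, 0x05]) + b"UUUUU" + b"\x00" + bytes([0x55, 0x55, 0x55])
print(looks_like_omf(good), looks_like_omf(bad))
```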

After applying the above-mentioned modifications with the patch
file-5.39-xenix-8086-obj.diff, all my inspected examples are
described correctly with the -M Magdir/xenix option:

CWAPI.OBJ:    8086 relocatable (Microsoft), "CAUSEWAY_KERNAL",
	      1st record data length 17,
	      2nd record type 0x8c, 2nd record data length 20
FRACTAL.GEN:  data
GENA.SND:     data
GNUCHESS.PC1: data
hpcc88.lif:   data
JMPPM32.OBJ:  8086 relocatable (Microsoft), "jmppm32.asm",
	      1st record data length 13,
	      2nd record type 0x96, 2nd record data length 355
KBHITR.OBJ:   8086 relocatable (Microsoft), "kbhit",
	      1st record data length 7,
	      2nd record type 0x96, 2nd record data length 53
KBHITS.OBJ:   8086 relocatable (Microsoft), "kbhit",
	      1st record data length 7,
	      2nd record type 0x96, 2nd record data length 53
OMBRE.6:      data
SHOWEVAR.OBJ: 8086 relocatable (Microsoft), "showevar.ASM",
	      1st record data length 14,
	      2nd record type 0x88, 2nd record data length 31
SHR.View:     data
Strange.Pic:  data
Switch.Snd:   data
VESA.OBJ:     8086 relocatable (Microsoft), "vesa.asm",
	      1st record data length 10,
	      2nd record type 0x96, 2nd record data length 53
Xtable.Data:  data

I hope my diff file can be applied in a future version of the file utility.

With best wishes
Jörg Jenderek
--
Jörg Jenderek





-------------- next part --------------
--- file-5.39/magic/Magdir/xenix.old	2020-05-31 10:34:41 +0000
+++ file-5.39/magic/Magdir/xenix	2021-03-29 14:06:56 +0000
@@ -15,2 +15,3 @@
 # URL: http://www.polarhome.com/service/man/?qf=86rel&tf=2&of=Xenix
+#      http://fileformats.archiveteam.org/wiki/OMF
 # Reference: http://www.azillionmonkeys.com/qed/Omfg.pdf
@@ -19,17 +20,26 @@
 0	byte		0x80
-# GRR: line above is too general as it catches also Extensible storage engine DataBase
+# GRR: line above is too general as it catches also Extensible storage engine DataBase,
+# all lif files like forth.lif hpcc88.lif lex90b.lif ( See ./lif)
+# and all compressed DEGAS low-res bitmaps like: MUNCHIE.PC1 PIDER1.PC1
 # skip examples like GENA.SND Switch.Snd by looking for record length maximal 1024-3
 >1	uleshort	<1022
-# skip examples like GAME.PICTURE Strange.Pic by looking for positiv record length
+# skip examples like GAME.PICTURE Strange.Pic by looking for positive record length
 >>1	uleshort	>0
-# skip examples like Xtable.Data FRACTAL.GEN SHR.VIEW by looking for positiv string length
+# skip examples like Xtable.Data FRACTAL.GEN SHR.VIEW by looking for positive string length
 >>>3	ubyte		>0
-# skip examples like OMBRE.6 with "UUUUUU" by looking for filename like "hello.c"
->>>>4	regex	[a-zA-Z_/]{1,8}[.]	8086 relocatable (Microsoft)
+# skip examples like OMBRE.6 with "UUUUUU" name by looking for valid high second record type
+>>>>(1.s+3)	ubyte	>0x6D	8086 relocatable (Microsoft)
 #!:mime	application/octet-stream
 !:mime	application/x-object
-!:ext	o/a
+!:ext	obj/o/a
+# T-module name often source name like "hello.c" or "jmppm32.asm" in JMPPM32.OBJ or
+# "kbhit" in KBHITS.OBJ or "CAUSEWAY_KERNAL" in CWAPI.OBJ
 >>>>>3	pstring		x		\b, "%s"
+# data length probably lower 256 according to TrID obj_omf.trid.xml
+>>>>>1	uleshort	x		\b, 1st record data length %u
 # checksum
 #>>>>>(3.b+4)	ubyte	x		\b, checksum 0x%2.2x
+# second recordtype: 96h~LNAMES 88h~COMENT 8CH~EXTDEF
+>>>>>(1.s+3)	ubyte	x		\b, 2nd record type 0x%x
+>>>>>(1.s+4)	uleshort x		\b, 2nd record data length %u
 0	leshort		0xff65		x.out
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.39-xenix-8086-obj.diff.sig
Type: application/octet-stream
Size: 1194 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20210329/170183e1/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 8086-obj-trid.txt.gz
Type: application/x-gzip
Size: 777 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20210329/170183e1/attachment.bin>

