[File] [PATCH] of Magdir/lif more tests to avoid misidentification

Jörg Jenderek joerg.jen.der.ek at gmx.net
Sat Feb 13 01:22:22 UTC 2021


Hello,
some days ago I handled some Atari images. The examples with the PC1
file name extension are DEGAS low-res compressed bitmaps.

When running the file command version 5.39 on such bitmaps and on HP
floppy disk images, I get output like:

atchco.lif:       lif file
disk1.dat:        lif file
DSKA0000_hfe.img: lif file
forth.lif:        lif file
hpcc88.lif:       lif file
hphh01.lif:       lif file
lex90b.lif:       lif file
lexfl1.lif:       lif file
MUNCHIE.PC1:      lif file
SPIDER1.PC1:      lif file
test-lif.img:     lif file
ve71.lif:         lif file
ve75.lif:         lif file

With the --extension option only ??? is displayed. Furthermore, with the
-i option only the generic application/octet-stream is shown for these
samples.

Later I found more disk image examples on the FTP server of the HP
museum, starting at the URL:
	ftp://ftp.hpmuseum.org/hpswap/

For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). It lists the file
name extensions in use and, with the -v option, often the related URL
pointing to information about the file format (see the appended
lif_trid.txt.gz). Luckily the TrID tool identifies the lif samples as
"HP Logical Interchange Format disk image" and displays the related URL
and extensions.

Unfortunately lif files only have a short 2-byte magic at the
beginning. So some other files, like DEGAS low-res compressed bitmaps,
are misidentified as lif samples because their start magic is the same.
The relevant line inside Magdir/lif looks like:
0	beshort		0x8000		lif file

So I put the displaying part in a subroutine named lif-file and
looked for additional tests to run before calling this routine.

Some information about the LIF file format can be found on the HPDir
Project website. This is now expressed inside Magdir/lif by comment
lines like:
 # URL:         https://www.hp9845.net/9845/projects/hpdir/
 # Reference:   https://www.hp9845.net/9845/downloads/manuals/
 #              LIF_excerpt_64941-90906_flpRef_Jan84.pdf

The subroutine starts with lines like:
 0	name		lif-file
 >0	beshort		x		lif file

Afterwards I show a user-defined mime type and the file name
extensions by lines like:
 !:mime	application/x-lif-disk
 !:ext	lif/hpi/dat
I also found one example with the img extension. The documentation of
hpdir says that, for easier handling, it is a good idea to use
image files with the extension ".hpi" (hp image). The original lif
utilities from Tony Duell use only the lif name extension. The enhanced
version by Joachim Siebold also uses dat as a file name extension.

At offset 2 the volume label is stored. It consists of 6 characters.
Allowed are upper-case letters, digits, the underscore and the space
character. So this information is also shown by a line like:
 >2	string		x		\b, "%.6s"
The label could also be used as a test, but I do not do so because I
try to use the same patterns that are used by the TrID utility. So this
test possibility is only added as a comment line like:
 #>>2	ubelong		>0x2020201F
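
As a rough illustration only (it is not part of the patch), the label
rule described above could be checked in Python as sketched below. The
function name has_plausible_label is made up for this example, and the
character set is just the one stated in this mail.

```python
import re

# Assumption: the label occupies bytes 2..7 and may only contain
# upper-case letters, digits, underscore and space, as described above.
LABEL_RE = re.compile(rb"[A-Z0-9_ ]{6}")


def has_plausible_label(header: bytes) -> bool:
    """Return True if bytes 2..7 look like a LIF volume label."""
    # A header shorter than 8 bytes yields a short slice, which cannot match.
    return LABEL_RE.fullmatch(header[2:8]) is not None
```

Such a check is stricter than the commented-out ubelong comparison,
which only looks at the first 4 label bytes.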

At offset 14, two unused dummy bytes are stored. According to the
documentation these are set to nil. Deviations can be shown by a line like:
 #>14	beshort		!0		\b, not used 0x%x
In my samples these bytes indeed always seem to be zero. So I chose
this as the second magic test line. It looks like:
 >14	beshort		=0
This way many compressed DEGAS low-res bitmaps (*.pc1) are skipped.

At offset 20 the version is stored as a 2-byte integer. According to
the documentation this value is 0 for systems without extensions or 1
for model 64000. So I assume that the version is always smaller than
256, which means the upper byte is nil. The version is displayed by a
line like:
 >20	ubeshort	x		\b, version %u
I also use this as the third test by a line like:
 >>20	ubeshort	<0x0100
Now all misidentified examples, such as MUNCHIE.PC1, BOARD.PC1 and
ENEMIES.PC1, are skipped.
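
To make the chain of three tests easier to follow, here is a minimal
Python sketch of the same logic (start magic 0x8000, zero at offset 14,
version below 0x0100). It is only an illustration under the offsets
given in this mail; the name passes_lif_tests is hypothetical and the
real check is done by the magic lines in the patch.

```python
import struct


def passes_lif_tests(header: bytes) -> bool:
    """Mirror the three nested magic tests on the first 22 header bytes."""
    if len(header) < 22:
        return False
    (magic,) = struct.unpack_from(">H", header, 0)     # 0   beshort  0x8000
    (unused,) = struct.unpack_from(">H", header, 14)   # >14 beshort  =0
    (version,) = struct.unpack_from(">H", header, 20)  # >>20 ubeshort <0x0100
    return magic == 0x8000 and unused == 0 and version < 0x0100
```

A file that only shares the 2-byte start magic fails as soon as one of
the two additional fields holds an unexpected value.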

At offset 12 the LIF identifier is stored as a 2-byte integer. The
documentation only states that this value is octal 010000 (0x1000) for
system 3000. At first that was the only value found in my examples, but
I do not know if this always holds. Later I also found a zero value. So
this information is shown for other values by a line like:
 >12	beshort		!0x1000		\b, LIF identifier 0x%x

At offset 8 the directory start address in units (probably 256 bytes)
is stored as a 4-byte big-endian integer. The typical value is 2, which
means the directory starts at 0x200. In theory higher values are
possible, but then there is a gap between header and directory. And on
floppies with only some hundreds of KiB you would not waste hundreds of
bytes. So I assume that this value is low, and unusual values are shown
by lines like:
 >8	ubelong		x		\b, directory
 >8	ubelong		!2		start address %u

At offset 16 the length of the directory in units is stored as a 4-byte
integer. According to the documentation this is 14 for model 64000. For
the example DSKA0000_hfe.img this value is 18, and for the hex dump
based test-lif.img it is 24. The highest value I found was 80. This
can also be verified by running the lifstat utility. The value is set
when initializing the file system. In theory higher values are
possible, but then the space for file contents becomes smaller. And on
floppies with some hundreds of KiB you would not waste hundreds of
bytes. So I assume that this value is always low. That information is
shown by a line like:
 >16	ubelong		x		length %u
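
For readers who prefer code to prose, the header fields discussed so
far can be unpacked in one go. The following Python sketch is only an
illustration: the names LifHeader, parse_lif_header and
directory_offset are invented here, and the unit size of 256 bytes is
the assumption mentioned above, not a confirmed fact.

```python
import struct
from typing import NamedTuple

UNIT = 256  # assumed size of one unit in bytes


class LifHeader(NamedTuple):
    magic: int       # offset 0, 2 bytes
    label: str       # offset 2, 6 bytes
    dir_start: int   # offset 8, 4 bytes, in units
    lif_id: int      # offset 12, 2 bytes
    unused: int      # offset 14, 2 bytes
    dir_length: int  # offset 16, 4 bytes, in units
    version: int     # offset 20, 2 bytes


def parse_lif_header(data: bytes) -> LifHeader:
    """Unpack the first 22 bytes as big-endian fields (needs >= 22 bytes)."""
    magic, label, dir_start, lif_id, unused, dir_length, version = (
        struct.unpack_from(">H6sIHHIH", data, 0)
    )
    return LifHeader(magic, label.decode("ascii", "replace"),
                     dir_start, lif_id, unused, dir_length, version)


def directory_offset(header: LifHeader) -> int:
    """Byte offset of the directory, assuming units of UNIT bytes."""
    return header.dir_start * UNIT
```

With the typical start address of 2 and the assumed unit size, the
directory offset works out to 2 * 256 = 0x200.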

Afterwards the level 1 extension fields are stored. For version 0
these are unused (nil). So this information is shown for higher
versions by lines like:
 >20	beshort		>0
 >>24	ubequad		!0		\b, extensions 0x%llx...


The words 21-126 are reserved for extensions and future use.
According to the documentation these are set to nil. So unexpected
values are shown by a line like:
 >42	ubequad		!0		\b, RESERVED 0x%llx
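
The magic line above only looks at the first 8 bytes of the reserved
area. As a hedged sketch, a check of the whole area could look like
this in Python. It assumes 2-byte words counted from zero, so that word
21 sits at byte offset 42 as in the magic line; the function name
reserved_words_clear is hypothetical.

```python
def reserved_words_clear(header: bytes,
                         first_word: int = 21,
                         last_word: int = 126) -> bool:
    """Return True if all reserved words are zero.

    Assumption: words are 2 bytes wide and numbered from 0, so word 21
    starts at byte offset 42.
    """
    start = first_word * 2
    end = (last_word + 1) * 2
    region = header[start:end]
    # Treat a truncated header as "not clear" instead of guessing.
    return len(region) == end - start and not any(region)
```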

For typical directories the first file name is shown by lines like:
 >8	ubelong		2
 >>512	string		<\xff\xff	\b, 1st file %-.10s
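
The same idea can be sketched in Python as follows. This is only an
illustration with an invented function name (first_file_name); it
assumes units of 256 bytes, a 10-character name field at the start of
the directory, and that a leading 0xffff marks an uninitialized entry,
as the comment in the patch says. Stripping trailing spaces is a
choice made for this example.

```python
from typing import Optional


def first_file_name(image: bytes, dir_start: int = 2,
                    unit: int = 256) -> Optional[str]:
    """Return the first directory entry's name, or None if unavailable."""
    offset = dir_start * unit  # 2 * 256 = 512 for typical images
    raw = image[offset:offset + 10]
    if len(raw) < 10 or raw.startswith(b"\xff\xff"):
        return None  # image too short or directory not initialized
    return raw.decode("ascii", "replace").rstrip()
```

Unlike the magic line, which only handles the typical start address 2,
the sketch takes the start address as a parameter.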

After applying the above-mentioned modifications with the patch
file-5.39-lif.diff, the misidentification of bitmaps vanishes and HP
disk images are described more precisely, like:

atchco.lif:       lif file "ATCHCO", version 1,
		  directory length 30,
		  extensions 0x4d00000002...,
		  1st file BACH
disk1.dat:        lif file "      ", version 1, LIF identifier 0x0,
		  directory length 20,
		  extensions 0x4d00000002...
DSKA0000_hfe.img: lif file "A165X ", version 0,
		  directory length 18,
		  1st file MIXEDDEMO1
forth.lif:        lif file "FORTH ", version 1,
		  directory length 30,
		  extensions 0x4d00000002...,
		  1st file FORTH10
hpcc88.lif:       lif file "HPCC88", version 1,
		  directory length 12,
		  extensions 0x4d00000002...,
		  1st file MCITFYP4
hphh01.lif:       lif file "HPHH01", version 1,
		  directory length 30,
		  extensions 0x4d00000002...,
		  1st file HPHH2801
lex90b.lif:       lif file "LEX90B", version 1,
		  directory length 2,
		  extensions 0x4d00000002...,
		  1st file LEXFNOTE
lexfl1.lif:       lif file "LEXFL1", version 1,
		  directory length 80,
		  extensions 0x4d00000002...,
		  1st file RWLEX
MUNCHIE.PC1:      data
SPIDER1.PC1:      data
test-lif.img:     lif file "B9826 ", version 0,
		  directory length 24,
		  1st file MIXEDDEMO1
ve71.lif:         lif file "VE7120", version 1,
		  directory length 20,
		  extensions 0x4d00000002...,
		  1st file VEM1C0
ve75.lif:         lif file "VE7530", version 1,
		  directory length 16,
		  extensions 0x4d00000002...,
		  1st file VEM1C0

I will try to write a recognition pattern for DEGAS bitmaps in a future
session.

I hope my diff file can be applied in a future version of the
file utility.

With best wishes
Jörg Jenderek
--
Jörg Jenderek

-------------- next part --------------
--- file-5.39/magic/Magdir/lif.old	2019-02-22 12:06:34 +0000
+++ file-5.39/magic/Magdir/lif	2021-02-13 01:02:20 +0000
@@ -7,2 +7,43 @@
 #
-0	beshort		0x8000		lif file
+# Modified by:	Joerg Jenderek
+# URL:		https://www.hp9845.net/9845/projects/hpdir/
+#		https://github.com/bug400/lifutils
+# Reference:	https://www.hp9845.net/9845/downloads/manuals/LIF_excerpt_64941-90906_flpRef_Jan84.pdf
+# Note:		called by TrID "HP Logical Interchange Format disk image"
+0	beshort		0x8000
+# GRR: line above is too general as it catches also compressed DEGAS low-res bitmap *.pc1
+# skip many compressed DEGAS low-res bitmap *.pc1 by test for unused bytes
+>14	beshort		=0
+# skip MUNCHIE.PC1 BOARD.PC1 ENEMIES.PC1 by test for low version number
+>>20	ubeshort	<0x0100
+# skip DEGAS MUNCHIE.PC1 BOARD.PC1 ENEMIES.PC1 by test for ASCII like volume name
+#>>>2	ubelong		>0x2020201F
+>>>0	use		lif-file
+0	name		lif-file
+# LIF ID
+>0	beshort		x		lif file
+!:mime	application/x-lif-disk
+# lif used by Tony Duell LIF utilities; enhanced version by Joachim Siebold use also dat; hpi used by hpdir
+!:ext	lif/hpi/dat
+# volume label; A-Z 0-9 _ ; default are 6 spaces
+>2	string		x		"%.6s"
+# version number; 0 for systems without extensions or 1 for model 64000
+>20	ubeshort	x		\b, version %u
+# LIF identifier; 010000 for system 3000
+>12	beshort		!0x1000		\b, LIF identifier 0x%x
+# directory start address in units like: 2
+>8	ubelong		x		\b, directory
+>8	ubelong		!2		start address %u
+# length of directory like: 2 4 7 10 12 14 (for model 64000) 16 18 20 24 30 50 57 77 80
+>16	ubelong		x		length %u
+# level 1 extensions
+>20	beshort		=0
+>>24	ubequad		!0		\b, for extensions 0x%llx...
+>20	beshort		>0
+>>24	ubequad		!0		\b, extensions 0x%llx...
+# word 21-126 reserved for extensions and future use; set to nil
+>42	ubequad		!0		\b, RESERVED 0x%llx
+# lif first file name for standard directory; 0xffff... means uninitialized
+>8	ubelong		2
+>>512	string		<\xff\xff	\b, 1st file %-.10s
+
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lif_trid.txt.gz
Type: application/x-gzip
Size: 386 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20210213/48895536/attachment.bin>

