[File] [PATCH] Magdir/archive for EDI LZSS compressed file *.??_ *.??$ *.LZS

Jörg Jenderek joerg.jen.der.ek at gmx.net
Fri Nov 18 02:28:22 UTC 2022


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

Some time ago I installed an old Windows Greenstreet software
package. Its installation directory contains files whose file name
extension ends with an underscore.

When running the file command, version 5.43, on such compressed
files and the related unpacked files, I get output like:

4WAY.WA$:     data
4WAY.WAW:     RIFF (little-endian) data, WAVE audio,
	      Microsoft PCM, 8 bit, mono 11025 Hz
BOOK01A.IC$:  data
BOOK01A.ICO:  MS Windows icon resource - 1 icon, 32x32, 16 colors
CTL3D.DL$:    data
CTL3D.DLL:    MS-DOS executable, NE for MS Windows 3.x (DLL or font)
GUNSHOT.LZS:  data
GUNSHOT.bmp:  PC bitmap, Windows 3.x format, 335 x 364 x 8,
	      image size 122304, resolution 3543 x 3543 px/m,
	      cbSize 123382, bits offset 1078
HERBTEXT.LZS: data
HERBTEXT.txt: ASCII text, with very long lines (369)
LACERATE.LZS: data
LACERATE.bmp: PC bitmap, Windows 3.x format, 261 x 351 x 8,
	      image size 92664, resolution 2756 x 2756 px/m,
	      cbSize 93742, bits offset 1078
PLANTAIN.LZS: data
SKYMAP.EXE:   MS-DOS executable, NE for MS Windows 3.x (EXE)
SKYMAP.EX_:   data
SPELMATE.H:   C source, ASCII text, with CRLF line terminators
SPELMATE.H$:  data

For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). It identifies
some examples with a dollar sign or underscore as the last character,
like 4WAY.WA$ or SKYMAP.EX_, as "EDI Install Pro LZSS2 compressed
data" via edi-lzss2.trid.xml. The other compressed examples are
described as "EDI Install LZS compressed data" via
ediinstall-lzss1.trid.xml (see the appended trid-v-edi.txt.gz).

With the help of the TrID output I found pages on the Archive Team
file formats web site. That information is expressed by comment lines
like:
# URL:		http://fileformats.archiveteam.org/wiki/
#		EDI_Install_packed_file
#		EDI_LZSSLib
# Reference:	http://mark0.net/download/triddefs_xml.7z
#		/defs/e/ediinstall-lzss1.trid.xml
#		/defs/e/edi-lzss2.trid.xml

The compressed data format is similar or identical to Okumura's LZSS.
So I add the new lines inside Magdir/archive after the existing LZSS
compressed archive section.

According to the documentation site I add magic lines like:
 0	string		EDILZSS
 >7	string		2
 !:mime	application/x-edi-pack-lzss
 !:ext	??$/??_
 >>8	string		x		"%-0.13s"
 >>21	ulelong		x		\b, %u bytes
 >>>25	ubequad		x		\b, data %#llx...
After the 8-byte signature EDILZSS2, the original NUL-terminated
filename (like 4way.wav or skymap.exe), padded to 13 bytes, is stored.
It is followed by the original file size as a 4-byte little-endian
integer (ulelong), and then by the compressed data. Instead of the
generic MIME type application/octet-stream I show a user-defined one.
The name of a compressed file often ends in the character '$' or '_'.
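
The LZSS2 header layout described above can be sketched in Python as
follows. This is only an illustration under the stated layout (8-byte
signature, 13-byte name field, 4-byte little-endian size as in the
ulelong test). The name parse_edilzss2_header and the returned keys are
hypothetical and not part of file or TrID.

```python
import struct

SIGNATURE = b"EDILZSS2"
NAME_FIELD = 13                      # original file name, NUL-terminated and padded
HEADER_SIZE = 8 + NAME_FIELD + 4     # compressed data starts at offset 25


def parse_edilzss2_header(data: bytes) -> dict:
    """Split an EDILZSS2 header into name, original size and data offset."""
    if len(data) < HEADER_SIZE or not data.startswith(SIGNATURE):
        raise ValueError("not an EDILZSS2 header")
    raw_name = data[8:8 + NAME_FIELD]
    # Keep only the part before the first NUL terminator.
    name = raw_name.split(b"\0", 1)[0].decode("ascii", errors="replace")
    # Unsigned 32-bit little-endian size, matching the ulelong test at offset 21.
    (size,) = struct.unpack_from("<I", data, 21)
    return {"name": name, "size": size, "data_offset": HEADER_SIZE}


# Header rebuilt from the values printed for 4WAY.WA$ in this mail.
sample = (SIGNATURE + b"4way.wav".ljust(NAME_FIELD, b"\0")
          + struct.pack("<I", 60430) + bytes.fromhex("ff5249464606ec00"))
print(parse_edilzss2_header(sample))
```

The sample bytes are reconstructed from the displayed output, not read
from the real file.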

Then there exists a '1' variant. Its start magic is the 8-byte
signature EDILZSS1, and the file size field is missing. I put the
displaying part inside the subroutine edi-pack. That looks like:
 0	name		edi-pack
 >8	string		x		EDI LZSS packed "%-.13s"
 !:mime	application/x-edi-pack-lzss
 !:ext	??$/?$
 >21	ubequad		x		\b, data %#16.16llx...
That variant is described as "EDI Pack LZSS1" by the software
deark. That can be verified by running a command like:
	deark -l -d2 SPELMATE.H$

Unfortunately there exists a third variant. There the original file
name field is missing, and in my inspected examples the suffix
LZS was used. That variant is described as "EDI LZSSLib" by the
software deark. That can be verified by running a command like:
	deark -l -d2 GUNSHOT.LZS
Unfortunately I was not able to express this as a regular
expression, because then the sample HERBTEXT.LZS is misidentified. So I
put the displaying part in the subroutine edi-lzs. This looks like:
 0	name		edi-lzs
 >8	string		x		EDI LZSSLib packed
 !:mime	application/x-edi-pack-lzss
 !:ext	lzs
 >8	ubequad		x		\b, data %#16.16llx...
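
The three layouts differ mainly in where the compressed data begins:
offset 25 for LZSS2, offset 21 for the LZSS1 pack variant and offset 8
for LZSSLib, as the ubequad lines show. A small Python sketch of that
preview follows. The names DATA_OFFSET and data_preview are
hypothetical, and the hex formatting only approximates the %#16.16llx
output of file.

```python
# Offset of the compressed data for each variant, taken from the ubequad lines.
DATA_OFFSET = {"lzss2": 25, "edi-pack": 21, "edi-lzs": 8}


def data_preview(data: bytes, variant: str) -> str:
    """Show the first 8 bytes of compressed data as a big-endian hex number."""
    start = DATA_OFFSET[variant]
    chunk = data[start:start + 8]
    if len(chunk) < 8:
        raise ValueError("file too short for an 8-byte preview")
    value = int.from_bytes(chunk, "big")   # ubequad: unsigned big-endian 64-bit
    return f"data {value:#018x}..."


# LZSSLib sample: signature directly followed by the bytes shown for GUNSHOT.LZS.
print(data_preview(b"EDILZSS1" + bytes.fromhex("bf424df6e10100f3"), "edi-lzs"))
```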

Instead of a regular expression I use a bunch of test lines. They
look like:
 0	string					EDILZSS
 >7	string					1
 >>8	search/9/b				.
 >>>&0		ubyte				<0x20
 >>>>0			use				edi-lzs
 >>>&0		ubyte				>0x1F
 >>>>&0			ubyte			=0
 >>>>>0				use			edi-pack
 >>>>&0			ubyte			>0x1F
 >>>>>&0				ubyte		=0
 >>>>>>0					use	edi-pack
 >>>>>&0				ubyte		>0x1F
 >>>>>>&0				ubyte	=0
 >>>>>>>0					use	edi-pack
 >>>>>>&0				ubyte	!0
 >>>>>>>0					use	edi-lzs
 >>>>>&0				default		x
 >>>>>>0					use	edi-lzs
 >>>>&0			default			x
 >>>>>0	use						edi-lzs
 >>8	default					x
 >>>0		use					edi-lzs
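So I look for a point character before the original file name
extension in the possible 13-byte name field. If I find no point, it
must be the LZS variant. If I find a point character, I inspect the
characters of the possible suffix part one by one. The first character
after the point must be "high" (above 0x1F), otherwise it is no valid
file name and this must be the LZS variant. For the following
characters: if the value is NUL, then it is the file name terminator
and it is the pack variant. If the value is "low" (below 0x20 but not
NUL), then it is no valid file name and this must be the LZS variant.
If the value is "high", I must inspect the next character by the same
procedure. This is repeated until the maximal length of the file name
suffix (that is 3) is reached; after 3 suffix characters only a NUL
terminator is accepted for the pack variant.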
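
The decision procedure of the test lines above can be mirrored in a
few lines of Python. This is a sketch for illustration only: the
function name classify_edilzss1 is hypothetical, and requiring at
least 21 bytes is an assumption made to keep the index handling
simple.

```python
def classify_edilzss1(data: bytes) -> str:
    """Return "edi-pack" or "edi-lzs" for data starting with EDILZSS1."""
    if len(data) < 21 or not data.startswith(b"EDILZSS1"):
        raise ValueError("need at least 21 bytes starting with EDILZSS1")
    # search/9 at offset 8: the point may sit at offsets 8..16.
    dot = data.find(b".", 8, 17)
    if dot < 0:
        return "edi-lzs"             # no point: no file name field
    # The first character after the point must be "high" (above 0x1F).
    if data[dot + 1] < 0x20:
        return "edi-lzs"
    # Second and third suffix characters: NUL ends a valid name,
    # any other low value means these bytes are compressed data.
    for pos in (dot + 2, dot + 3):
        if data[pos] == 0:
            return "edi-pack"
        if data[pos] < 0x20:
            return "edi-lzs"
    # After three suffix characters only a NUL terminator is accepted.
    return "edi-pack" if data[dot + 4] == 0 else "edi-lzs"


# Byte values taken from this mail; the NUL padding after the known
# bytes of HERBTEXT.LZS is an assumption.
samples = {
    "SPELMATE.H$": b"EDILZSS1" + b"spelmate.h".ljust(13, b"\0"),
    "HERBTEXT.LZS": b"EDILZSS1" + b"\xffAloe.lzs\xbb".ljust(13, b"\0"),
}
for name, head in samples.items():
    print(name, classify_edilzss1(head))
```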

After applying the above mentioned modifications via the patch
file-5.43-archive-edi.diff, and using Magdir/msdos, images and riff,
all inspected EDI LZSS compressed files are now described. This
now looks like:

4WAY.WA$:     EDI install LZSS2 packed
	      "4way.wav",
	      60430 bytes,
	      data 0xff5249464606ec00...
4WAY.WAW:     RIFF (little-endian) data, WAVE audio,
	      Microsoft PCM, 8 bit, mono 11025 Hz
BOOK01A.IC$:  EDI LZSS packed
	      "book01a.ico",
	      data 0xf7000001eff02020...
BOOK01A.ICO:  MS Windows icon resource - 1 icon, 32x32, 16 colors
CTL3D.DL$:    EDI LZSS packed
	      "ctl3d.dll",
	      data 0xff4d5aa900020000...
CTL3D.DLL:    MS-DOS executable, NE for MS Windows 3.x (DLL or font)
GUNSHOT.LZS:  EDI LZSSLib packed
	      data 0xbf424df6e10100f3...
GUNSHOT.bmp:  PC bitmap, Windows 3.x format, 335 x 364 x 8,
	      image size 122304, resolution 3543 x 3543 px/m,
	      cbSize 123382, bits offset 1078
HERBTEXT.LZS: EDI LZSSLib packed
	      data 0xff416c6f652e6c7a...
HERBTEXT.txt: ASCII text, with very long lines (369)
LACERATE.LZS: EDI LZSSLib packed
	      data 0xbf424d2e6e0100f3...
LACERATE.bmp: PC bitmap, Windows 3.x format, 261 x 351 x 8,
	      image size 92664, resolution 2756 x 2756 px/m,
	      cbSize 93742, bits offset 1078
PLANTAIN.LZS: EDI LZSSLib packed
	      data 0xbf424d962e0100f3...
SKYMAP.EXE:   MS-DOS executable, NE for MS Windows 3.x (EXE)
SKYMAP.EX_:   EDI install LZSS2 packed
	      "skymap.exe", 576032 bytes,
	      data 0xff4d5aa601010000...
SPELMATE.H:   ASCII text, with CRLF line terminators
SPELMATE.H$:  EDI LZSS packed
	      "spelmate.h",
	      data 0xff2f2a207370656c...

I hope my diff file can be applied in a future version of the file utility.

With best wishes
Jörg Jenderek
- --
Jörg Jenderek



-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY3btxQAKCRCv8rHJQhrU
1lo/AJoC6tcfma1nfbLIo0HRzLgDqUk5qACfZ9ElsRcq2lu4mTRvcFdGrj6MTOQ=
=oBQp
-----END PGP SIGNATURE-----
-------------- next part --------------
--- file-5.43/magic/Magdir/archive.old	2022-09-13 20:05:39.000000000 +0200
+++ file-5.43/magic/Magdir/archive	2022-11-18 02:31:07.096884400 +0100
@@ -753,6 +753,88 @@
 !:ext	??$
 >>8	ulelong	>0		\b, original size: %u bytes
 
+# Summary:	lzss compressed/EDI Pack
+# From:		Joerg Jenderek
+# URL:		http://fileformats.archiveteam.org/wiki/EDI_Install_packed_file
+# Note:		called "EDI Install LZS compressed data" by TrID and verified by
+#		command like `deark -l -m edi_pack -d2 BOOK01A.IC$` as "EDI Pack LZSS1"
+0	string					EDILZSS
+>7	string					1
+# look for point character before original file name extension
+>>8	search/9/b				.
+# check suffix of possible original file name
+#>>>&0		ubelong				x	SUFFIX=%8.8x
+# samples without valid character after point in original file name field like: FENNEL.LZS PLANTAIN.LZS
+>>>&0		ubyte				<0x20
+>>>>0			use				edi-lzs
+# samples with valid character after point in original file name field
+>>>&0		ubyte				>0x1F
+# check 2nd character of suffix
+#>>>>&0			ubyte	x			2ND_SUFFIX=%x
+# sample with one valid character after point followed by \0 in original file name field like: SPELMATE.H$
+>>>>&0			ubyte			=0
+>>>>>0				use			edi-pack
+>>>>&0			ubyte			>0x1F
+# check 3rd character of suffix
+#>>>>>&0				ubyte		x	3RD_SUFFIX=%x
+# no sample with 2 valid characters after point followed by \0 in original file name field
+>>>>>&0				ubyte		=0
+>>>>>>0					use		edi-pack
+# samples with valid 3rd character after point in original file name field
+>>>>>&0				ubyte		>0x1F
+# sample with 3 valid character after point followed by \0 in original file name field like: BOOK01A.IC$ CTL3D.DL$
+>>>>>>&0				ubyte	=0
+>>>>>>>0					use	edi-pack
+# sample with 3 valid character after point followed by no \0 in original file name field like: HERBTEXT.LZS
+>>>>>>&0				ubyte	!0
+>>>>>>>0					use	edi-lzs
+# no sample with invalid 3rd character after point in original file name field
+>>>>>&0				default		x
+>>>>>>0					use		edi-lzs
+# sample with invalid 2nd character after point in original file name field like: LACERATE.LZS SPLINTER.LZS
+>>>>&0			default			x
+>>>>>0	use						edi-lzs
+# sample without point character in original file name field like GUNSHOT.LZS
+>>8	default					x
+>>>0		use					edi-lzs
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/e/edi-lzss2.trid.xml
+# Note:		called "EDI Install Pro LZSS2 compressed data" by TrID and verified by
+#		command like `deark -l -m edi_pack -d2 4WAY.WA$` as "EDI Pack LZSS2"
+>7	string			2			EDI LZSS2 packed
+#!:mime	application/octet-stream
+!:mime	application/x-edi-pack-lzss
+# the name of a compressed file often ends in character '$' or '_'
+!:ext	??$/??_
+# original filename, NUL-terminated, padded to 13 bytes like: mci.vbx 4way.wav skymap.exe cmdialog.vbx
+>>8		string		x			"%-0.13s"
+# original file size, as a 4-byte integer.
+>>21		ulelong		x			\b, %u bytes
+# compressed data like: ff5249464606ec00 ff4d5aa601010000
+>>>25		ubequad		x			\b, data %#16.16llx...
+0	name		edi-pack
+# Note:		verified by command like `deark -l -d2 SPELMATE.H$` as "EDI Pack LZSS1"
+# original filename, NUL-terminated, padded to 13 bytes like: ctl3d.dll spelmate.h filemenu.rc owl.def index-it.exe
+# but not like \377Aloe.lzs\273 (HERBTEXT.LZS)
+>8	string		x				EDI LZSS packed "%-.13s"
+#!:mime	application/octet-stream
+!:mime	application/x-edi-pack-lzss
+# the name of a compressed file often ends in character '$' or '_'
+!:ext	??$/?$
+# compressed data like: f7000001eff02020 ff4d5aa900020000 ff2f2a207370656c
+>21	ubequad		x				\b, data %#16.16llx...
+# URL:		http://fileformats.archiveteam.org/wiki/EDI_LZSSLib
+# Note:		verified partly by command like `deark -l -m edi_pack -d2 GUNSHOT.LZS` as "EDI LZSSLib"
+0	name		edi-lzs
+# Note:		verified by command like `deark -l -d2 GUNSHOT.LZS` as "EDI LZSSLib"
+# no original filename looks like: \277BM\226.\0 \277BM.n\001 \277BM\226.\0 \277BM.g\001 \377Aloe.lzs\273
+>8	string		x				EDI LZSSLib packed
+#!:mime	application/octet-stream
+!:mime	application/x-edi-pack-lzss
+# The name of a compressed file ends with LZS suffix
+!:ext	lzs
+# compressed data like: bf424df6e10100f3 ff416c6f652e6c7a ff416c6f652e6c7a
+>8	ubequad		x				\b, data %#16.16llx...
+
 # Summary:	CAZIP compressed file
 # From:		Joerg Jenderek
 # URL:		http://fileformats.archiveteam.org/wiki/CAZIP
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.43-archive-edi.diff.sig
Type: application/octet-stream
Size: 1653 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221118/4f7eb3f2/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-edi.txt.gz
Type: application/x-gzip
Size: 1529 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221118/4f7eb3f2/attachment.bin>

