[File] [PATCH] Magdir/archive for EDI LZSS compressed file *.??_ *.??$ *.LZS

Christos Zoulas christos at zoulas.com
Fri Nov 18 15:57:13 UTC 2022


Committed, thanks!

christos

> On Nov 17, 2022, at 9:28 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> 
> Some time ago I installed an old Windows Greenstreet software package.
> The installation directory contains files whose file name extension
> ends in an underscore.
> 
> When I run the file command version 5.43 on such compressed
> files and the related unpacked files, I get output like:
> 
> 4WAY.WA$:     data
> 4WAY.WAW:     RIFF (little-endian) data, WAVE audio,
> 	      Microsoft PCM, 8 bit, mono 11025 Hz
> BOOK01A.IC$:  data
> BOOK01A.ICO:  MS Windows icon resource - 1 icon, 32x32, 16 colors
> CTL3D.DL$:    data
> CTL3D.DLL:    MS-DOS executable, NE for MS Windows 3.x (DLL or font)
> GUNSHOT.LZS:  data
> GUNSHOT.bmp:  PC bitmap, Windows 3.x format, 335 x 364 x 8,
> 	      image size 122304, resolution 3543 x 3543 px/m,
> 	      cbSize 123382, bits offset 1078
> HERBTEXT.LZS: data
> HERBTEXT.txt: ASCII text, with very long lines (369)
> LACERATE.LZS: data
> LACERATE.bmp: PC bitmap, Windows 3.x format, 261 x 351 x 8,
> 	      image size 92664, resolution 2756 x 2756 px/m,
> 	      cbSize 93742, bits offset 1078
> PLANTAIN.LZS: data
> SKYMAP.EXE:   MS-DOS executable, NE for MS Windows 3.x (EXE)
> SKYMAP.EX_:   data
> SPELMATE.H:   C source, ASCII text, with CRLF line terminators
> SPELMATE.H$:  data
> 
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). It identifies
> some examples with a dollar sign or underscore as the last character,
> like 4WAY.WA$ or SKYMAP.EX_, as "EDI Install Pro LZSS2 compressed
> data" by edi-lzss2.trid.xml. The other compressed examples are
> described as "EDI Install LZS compressed data" by
> ediinstall-lzss1.trid.xml (see appended trid-v-edi.txt.gz).
> 
> With the help of the TrID output I found pages on the File Formats
> Archive Team web site. That information is expressed by comment lines like:
> # URL:		http://fileformats.archiveteam.org/wiki/
> #		EDI_Install_packed_file
> #		EDI_LZSSLib
> # Reference:	http://mark0.net/download/triddefs_xml.7z
> #		/defs/e/ediinstall-lzss1.trid.xml
> #		/defs/e/edi-lzss2.trid.xml
> 
> The compressed data format is similar or identical to Okumura's LZSS.
> So I added the new lines inside Magdir/archive after the existing LZSS
> compressed archive section.
> 
> Following the documentation site, I added magic lines like:
> 0	string		EDILZSS
> >7	string		2
> !:mime	application/x-edi-pack-lzss
> !:ext	??$/??_
> >>8	string		x		"%-0.13s"
> >>21	ulelong		x		\b, %u bytes
> >>>25	ubequad		x		\b, data %#llx...
> After the 8-byte signature EDILZSS2, the original NUL-terminated
> filename (like 4way.wav or skymap.exe), padded to 13 bytes, is stored.
> Afterwards the original file size is stored as a 4-byte integer. That
> is followed by the compressed data. Instead of the generic mime type
> application/octet-stream I show a user-defined one. The name of a
> compressed file often ends in the character '$' or '_'.
> 
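The LZSS2 header layout described above can be sketched in Python. This is
only an illustration under stated assumptions: the function name
parse_edilzss2_header is hypothetical, the size is read as little-endian to
match the ulelong test in the magic lines, and the sample bytes are assembled
from the file output quoted in this message, not read from a real file.

```python
import struct


def parse_edilzss2_header(buf: bytes):
    """Return (name, original_size, payload) or None if not EDILZSS2.

    Layout as described in the message:
      offset  0, 8 bytes:  signature "EDILZSS2"
      offset  8, 13 bytes: NUL-terminated, NUL-padded original file name
      offset 21, 4 bytes:  original file size (assumed little-endian)
      offset 25:           compressed data
    """
    if len(buf) < 25 or buf[:8] != b"EDILZSS2":
        return None
    # Keep only the bytes before the first NUL of the 13-byte name field.
    name = buf[8:21].split(b"\0", 1)[0].decode("ascii", errors="replace")
    (size,) = struct.unpack_from("<I", buf, 21)
    return name, size, buf[25:]


if __name__ == "__main__":
    # Header rebuilt from the 4WAY.WA$ output quoted in the message.
    sample = (
        b"EDILZSS2"
        + b"4way.wav".ljust(13, b"\0")
        + struct.pack("<I", 60430)
        + bytes.fromhex("ff5249464606ec00")
    )
    print(parse_edilzss2_header(sample))
```

The sketch only reads the header; it does not decompress the LZSS payload.
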
> Then there exists a '1' variant. There the start magic is the 8-byte
> signature EDILZSS1, and the file size field is missing. I had to
> put the displaying part inside the subroutine edi-pack. That looks like:
> 0	name		edi-pack
> >8	string		x		EDI LZSS packed "%-.13s"
> !:mime	application/x-edi-pack-lzss
> !:ext	??$/?$
> >21	ubequad		x		\b, data %#16.16llx...
> That variant is described as "EDI Pack LZSS1" by the software
> deark. That can be verified by running a command like:
> 	deark -l -d2 SPELMATE.H$
> 
> Unfortunately there exists a third variant. There the original file
> name field is missing, and in the examples I inspected the suffix
> LZS was used. That variant is described as "EDI LZSSLib" by the
> software deark. That can be verified by running a command like:
> 	deark -l -d2 GUNSHOT.LZS
> Unfortunately I was not able to express this as a regular
> expression, because then the sample HERBTEXT.LZS is misidentified. So I
> put the displaying part in the subroutine edi-lzs. This looks like:
> 0	name		edi-lzs
> >8	string		x		EDI LZSSLib packed
> !:mime	application/x-edi-pack-lzss
> !:ext	lzs
> >8	ubequad		x		\b, data %#16.16llx...
> 
> Instead of a regular expression I use a bunch of test lines. They
> look like:
> 0	string					EDILZSS
> >7	string					1
> >>8	search/9/b				.
> >>>&0		ubyte				<0x20
> >>>>0			use				edi-lzs
> >>>&0		ubyte				>0x1F
> >>>>&0			ubyte			=0
> >>>>>0				use			edi-pack
> >>>>&0			ubyte			>0x1F
> >>>>>&0				ubyte		=0
> >>>>>>0					use	edi-pack
> >>>>>&0				ubyte		>0x1F
> >>>>>>&0				ubyte	=0
> >>>>>>>0					use	edi-pack
> >>>>>>&0				ubyte	!0
> >>>>>>>0					use	edi-lzs
> >>>>>&0				default		x
> >>>>>>0					use	edi-lzs
> >>>>&0			default			x
> >>>>>0	use						edi-lzs
> >>8	default					x
> >>>0		use					edi-lzs
> So I look for the dot character before the original file name
> extension in the possible 13-byte name field. If I find no dot, it
> must be the LZS variant. If I find a dot, I inspect the characters of
> the possible suffix part. If a value is NUL, it is the file name
> terminator and it is the pack variant. If the value is "low" (below
> 0x20), it is not a valid file name character, so this must be the LZS
> variant. If the value is "high" (above 0x1F), I must inspect the next
> character by the same procedure. This is repeated until the maximal
> length of the file name suffix (that is 3) is reached.
> 
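The decision procedure above can be restated as a short Python sketch. It is
a reading of the quoted test lines and not a replacement for them. The
function name classify_edilzss1 and the "unknown" return value are
hypothetical, and the search window (a dot at offsets 8 through 16) assumes
that search/9 tries nine start positions beginning at offset 8.

```python
def classify_edilzss1(buf: bytes) -> str:
    """Return "edi-pack", "edi-lzs", or "unknown" for a candidate file."""
    if buf[:8] != b"EDILZSS1":
        return "unknown"

    # Look for the dot of an 8.3 style name; it may sit at offsets 8..16.
    dot = buf.find(b".", 8, 17)
    if dot < 0:
        return "edi-lzs"  # no dot, so no file name field

    pos = dot + 1
    # The first suffix character must be a "high" byte (0x20 or above).
    if pos >= len(buf) or buf[pos] < 0x20:
        return "edi-lzs"

    # Up to two more suffix characters, then a NUL terminator is required.
    for i in range(1, 4):
        if pos + i >= len(buf):
            return "edi-lzs"
        value = buf[pos + i]
        if value == 0:
            return "edi-pack"  # NUL terminates a plausible file name
        if value < 0x20 or i == 3:
            return "edi-lzs"  # control byte, or suffix longer than 3
    return "edi-lzs"


if __name__ == "__main__":
    # Headers rebuilt from the output quoted in the message.
    pack = b"EDILZSS1" + b"book01a.ico\0\0" + bytes.fromhex("f7000001eff02020")
    print(classify_edilzss1(pack))
```

Only the first 8 data bytes of each sample appear in the quoted output, so
inputs built from them for the LZS variant need hypothetical padding.
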
> After applying the above-mentioned modifications with the patch
> file-5.43-archive-edi.diff and using Magdir/msdos,images,riff,
> all inspected EDI LZSS compressed files are now described. This
> now looks like:
> 
> 4WAY.WA$:     EDI install LZSS2 packed
> 	      "4way.wav",
> 	      60430 bytes,
> 	      data 0xff5249464606ec00...
> 4WAY.WAW:     RIFF (little-endian) data, WAVE audio,
> 	      Microsoft PCM, 8 bit, mono 11025 Hz
> BOOK01A.IC$:  EDI LZSS packed
> 	      "book01a.ico",
> 	      data 0xf7000001eff02020...
> BOOK01A.ICO:  MS Windows icon resource - 1 icon, 32x32, 16 colors
> CTL3D.DL$:    EDI LZSS packed
> 	      "ctl3d.dll",
> 	      data 0xff4d5aa900020000...
> CTL3D.DLL:    MS-DOS executable, NE for MS Windows 3.x (DLL or font)
> GUNSHOT.LZS:  EDI LZSSLib packed
> 	      data 0xbf424df6e10100f3...
> GUNSHOT.bmp:  PC bitmap, Windows 3.x format, 335 x 364 x 8,
> 	      image size 122304, resolution 3543 x 3543 px/m,
> 	      cbSize 123382, bits offset 1078
> HERBTEXT.LZS: EDI LZSSLib packed
> 	      data 0xff416c6f652e6c7a...
> HERBTEXT.txt: ASCII text, with very long lines (369)
> LACERATE.LZS: EDI LZSSLib packed
> 	      data 0xbf424d2e6e0100f3...
> LACERATE.bmp: PC bitmap, Windows 3.x format, 261 x 351 x 8,
> 	      image size 92664, resolution 2756 x 2756 px/m,
> 	      cbSize 93742, bits offset 1078
> PLANTAIN.LZS: EDI LZSSLib packed
> 	      data 0xbf424d962e0100f3...
> SKYMAP.EXE:   MS-DOS executable, NE for MS Windows 3.x (EXE)
> SKYMAP.EX_:   EDI install LZSS2 packed
> 	      "skymap.exe", 576032 bytes,
> 	      data 0xff4d5aa601010000...
> SPELMATE.H:   ASCII text, with CRLF line terminators
> SPELMATE.H$:  EDI LZSS packed
> 	      "spelmate.h",
> 	      data 0xff2f2a207370656c...
> 
> I hope my diff file can be applied in a future version of the file utility.
> 
> With best wishes
> Jörg Jenderek
> - --
> Jörg Jenderek
> 
