[File] [PATCH] Magdir/archive Zoo archive +details+ extensions

Christos Zoulas christos at zoulas.com
Mon Apr 17 16:41:06 UTC 2023


Committed, thanks!

christos

> On Apr 3, 2023, at 8:33 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> 
> A few days ago I searched for some documentation. What I was looking
> for was packed inside an archive with the file suffix ZOO.
> 
> When running the file command version 5.44 on such samples, I get
> output that looks reasonable at first glance:
> 
> GRASPDOC.ZOO:                   Zoo archive data, v2.00,
> 				modify: v2.0+
> LHARC_1_30.ZOO:                 Zoo archive data, v2.00,
> 				modify: v2.0+, extract: v1.0+
> M2POSX02.ZOO:                   Zoo archive data, v2.10,
> 				modify: v2.0+, extract: v2.1+
> MyZoo2-hP.bak:                  Zoo archive data, v2.10,
> 				modify: v2.0+, extract: v2.1+
> MyZoo3.zoo:                     Zoo archive data, v2.10,
> 				modify: v2.0+, extract: v1.0+
> UUCODE.ZOO:                     Zoo archive data, v1.50,
> 				modify: v1.4+
> WHRCGA.ZOO:                     Zoo archive data, v2.00,
> 				modify: v2.0+
> playback.zoo:                   Zoo archive data, v2.10,
> 				modify: v2.0+, extract: v2.1+
> unzip51b.zoo:                   Zoo archive data, v2.10,
> 				modify: v2.0+, extract: v2.1+
> x-fmt-269-signature-id-621.zoo: Zoo archive data
> 
> With the option --extension, only ??? is displayed.
> 
> For comparison, I ran the file format identification utility TrID
> (see https://mark0.net/soft-trid-e.html). When running the TrID
> command on such ZOO examples, they are described as "ZOO compressed
> archive" by ark-zoo.trid.xml or, with lower priority, as "ZOO
> compressed archive (strict)" by ark-zoo-strict.trid.xml. Here only
> the ZOO suffix is considered "good" (see the appended
> output/trid-v-zoo.txt.gz).
> 
> For comparison, I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). It recognizes
> only some of the archives, which are described as "ZOO Compressed
> Archive" by PUID x-fmt/269. Here no MIME type is shown and only the
> ZOO suffix is considered "good", whereas BAK is marked with
> EXTENSION_MISMATCH true (see the appended output/droid-zoo.csv.gz).
> 
> With the help of these tools I found pages about the ZOO file format.
> They also list samples to download and unpacking software like deark.
> This is recorded in Magdir/archive by comment lines like:
> # URL:		https://en.wikipedia.org/wiki/Zoo_(file_format)
> #		http://fileformats.archiveteam.org/wiki/Zoo
> # Reference:	http://mark0.net/download/triddefs_xml.7z
> #		defs/a/ark-zoo-strict.trid.xml
> #		http://distcache.freebsd.org/ports-distfiles/
> #		zoo-2.10pl1.tar.gz/zoo.h
> 
> Detection is done by the following starting lines in Magdir/archive:
> 20	lelong		0xfdc4a7dc	Zoo archive data
> !:mime	application/x-zoo
>> 4	byte		>48		\b, v%c.
>>> 6	byte		>47		\b%c
>>>> 7	byte		>47		\b%c
>> 32	byte		>0		\b, modify: v%d
>>> 33	byte		x		\b.%d+
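For readers less familiar with magic(5) syntax, the following minimal Python sketch approximates what these 5.44 starting lines test. It is an illustration only, not code from file or from the patch; the function name describe_zoo_v544 is hypothetical, and libmagic's handling of signed bytes and short reads is simplified.

```python
import struct

ZOO_TAG = 0xFDC4A7DC  # 4-byte archive magic, stored little-endian at offset 20


def describe_zoo_v544(data: bytes):
    """Rough equivalent of the file 5.44 starting lines (sketch only)."""
    if len(data) < 24 or struct.unpack_from("<I", data, 20)[0] != ZOO_TAG:
        return None
    out = "Zoo archive data"
    # Offsets 4, 6 and 7 hold the ASCII version digits of "ZOO x.xx Archive."
    if data[4] > 48:
        out += ", v%c." % data[4]
        if data[6] > 47:
            out += chr(data[6])
            if data[7] > 47:
                out += chr(data[7])
    # Offsets 32/33: version needed to modify the archive
    if len(data) > 32 and data[32] > 0:
        out += ", modify: v%d" % data[32]
        if len(data) > 33:
            out += ".%d+" % data[33]
    return out
```

On a 34-byte header beginning with "ZOO 2.10 Archive." and holding 2 and 0 at offsets 32/33, it yields "Zoo archive data, v2.10, modify: v2.0+", the same shape as the 5.44 output above.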
> 
> The sample x-fmt-269-signature-id-621.zoo is not a real ZOO archive.
> It contains only the first 26 bytes of an archive and is used by the
> DROID tool to recognise ZOO archives. DROID also looks for the byte
> sequence FC83 at the end, which is wrong. The reason can be found in
> the man page zoo(1), which contains a sentence like:
> Packing removes any garbage data appended to an archive because of
> Xmodem file transfer.
> All tools first use the same recognition method, looking for the
> 4-byte magic at offset 20. So I skip the DROID sample by also checking
> for a valid major version needed to modify the archive (offset 32).
> The man page also states that archive_name.bak is a backup of the
> archive. It is created by the program itself: if packing occurs, the
> original unpacked archive is always left behind with the extension
> .bak. This is the default behaviour unless the E modifier is used,
> which causes zoo not to save a backup copy of the original archive
> after packing.
> ZOO files typically start with a 20-byte header that begins with the
> text "ZOO ?.?? Archive." followed by the bytes 0x1a 0x0 0x0, where
> the question marks stand for version digits (like 1.50, 2.00 or
> 2.10). This text is only informational and can be anything, as
> described in TrID ark-zoo.trid.xml. So I also show information about
> archives with an unusual header text, although I found no such
> samples myself. The start now looks like:
> 20	lelong		0xfdc4a7dc
>> 32	byte		>0		Zoo archive data
> !:mime	application/x-zoo
> !:ext	zoo/bak
>>> 4	byte		>48		\b, v%c.
>>>> 6	byte		>47		\b%c
>>>>> 7	byte		>47		\b%c
>>> 8	string		!\040Archive.\032 \b, at 8
>>>> 8	string		x		text "%0.10s"
>>> 32	byte		>0		\b, modify: v%d
>>>> 33	byte		x		\b.%d+
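A rough Python equivalent of the revised start, again only a sketch with a hypothetical function name: the extra test on offset 32 is what drops the truncated DROID signature file, since a 26-byte file has no byte 32, and a header text other than " Archive.\x1a" at offset 8 is reported.

```python
import struct

ZOO_TAG = 0xFDC4A7DC  # archive magic, little-endian at offset 20


def describe_zoo_start(data: bytes):
    """Sketch of the revised starting test: besides the tag at offset 20,
    the major version needed to modify the archive (offset 32) must be > 0."""
    if len(data) < 24 or struct.unpack_from("<I", data, 20)[0] != ZOO_TAG:
        return None
    # A truncated 26-byte signature file has no byte 32 and is rejected here.
    if len(data) <= 32 or data[32] == 0:
        return None
    out = "Zoo archive data"
    # Version digits from the informational "ZOO x.xx Archive." text
    if data[4] > 48:
        out += ", v%c." % data[4]
        if data[6] > 47:
            out += chr(data[6])
            if data[7] > 47:
                out += chr(data[7])
    # The text is only informational; report it when it is not the usual one.
    if data[8:18] != b" Archive.\x1a":
        text = data[8:18].split(b"\x00", 1)[0].decode("latin-1")
        out += ', at 8 text "%s"' % text
    out += ", modify: v%d" % data[32]
    if len(data) > 33:
        out += ".%d+" % data[33]
    return out
```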
> 
> With zoo 2.00, additional fields were added to the archive header:
> the archive-level versioning byte vdata and an optional archive
> comment, which may contain Carriage Return and Line Feed. This
> information is shown by lines like:
>>> 32	byte		>1
>>>> 34		ubyte	!1		\b, header type %u
>>>> 35		lelong	>0		\b, at %d
>>>>> 39			uleshort x	%u bytes comment
> #>>>>(35.l)		ubequad	x	COMMENT=%16.16llx
>>>>> (35.l) 		ubyte	<040
>>>>>> (35.l+1) 		ubyte	<040
>>>>>>> (35.l+2)		string	x	%s
>>>>>>>> &0		ubyte	<040
>>>>>>>>> &0		ubyte	<040
>>>>>>>>>> &0		string	>037	%s
>>>> 41		ubyte	!1		\b, vdata %#x
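The header fields these lines inspect can be read with a small Python sketch. The offsets are taken from the magic lines above (type at 34, comment position at 35, comment length at 39, vdata at 41); the function name and the returned dictionary are hypothetical, and the comment is returned as raw bytes because it may contain CR/LF.

```python
import struct


def zoo2_header_extras(data: bytes):
    """Sketch: read the archive-header fields added with zoo 2.00."""
    if len(data) < 42 or data[32] <= 1:
        return None  # only headers whose modify version is 2 or later
    hdr_type = data[34]
    # Signed 32-bit comment position, unsigned 16-bit comment length
    acmt_pos, acmt_len = struct.unpack_from("<iH", data, 35)
    vdata = data[41]
    comment = b""
    if acmt_pos > 0 and acmt_len > 0:
        comment = data[acmt_pos:acmt_pos + acmt_len]
    return {"type": hdr_type, "acmt_pos": acmt_pos, "acmt_len": acmt_len,
            "vdata": vdata, "comment": comment}
```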
> 
> When the size of the header varies, the directory entries (which
> start with the same 4-byte magic) naturally begin at different
> offsets. So the following lines are only correct in some cases and
> must be refined:
>> 42	lelong		0xfdc4a7dc	\b,
>>> 70	byte		>0		extract: v%d
>>>> 71	byte		x		\b.%d+
> 
> The pointer to the first directory entry header is stored at offset
> 24 as the 4-byte integer variable zoo_start. For consistency checking
> it is followed by zoo_minus, which holds zoo_start * -1. Using the
> stored offset, the magic jumps to that position and inspects the
> directory entry structure. This again starts with the 4-byte magic,
> so it can be checked there too if the other tests are not
> sufficient. Next comes the directory type; in my examples I only get
> the value 2. After that the packing method is stored: 0 means no
> packing, 1 means Lempel-Ziv-Welch (LZW), named lzd (see also the
> deark output), and 2 means LZ77+Huffman (LZH). This information is
> shown by lines like:
>>> 24	lelong		x		\b; at %u
> #>>28	lelong		x		\b, zoo_minus %#x
> #>>(24.l+0) ulelong	!0xfdc4a7dc	\b, zoo_tag=%8.8x
>>> (24.l+4)	ubyte	!2		type=%u
>>> (24.l+5)	ubyte		x	method=
>>>> (24.l+5)	ubyte		0	\bnot-compressed
>>>> (24.l+5)	ubyte		1	\blzd
>>>> (24.l+5)	ubyte		2	\blzh
> This information can also be verified with the command line tool
> deark, using a command like:
> 	deark -m zoo -l -d2 WHRCGA.ZOO
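Following the zoo_start pointer can also be sketched in Python as a cross-check alongside deark. The names are hypothetical; the method names mirror the magic output above, and the test that zoo_start + zoo_minus equals 0 is an assumption about how zoo_minus is stored, marked as such in the code.

```python
import struct

ZOO_TAG = 0xFDC4A7DC
METHODS = {0: "not-compressed", 1: "lzd", 2: "lzh"}


def first_entry(data: bytes):
    """Sketch: follow zoo_start (offset 24) to the first directory entry
    and report its tag, type and packing method."""
    if len(data) < 32:
        return None
    zoo_start, zoo_minus = struct.unpack_from("<Ii", data, 24)
    if len(data) < zoo_start + 6:
        return None  # pointer leads past the end of the data
    tag = struct.unpack_from("<I", data, zoo_start)[0]
    method = data[zoo_start + 5]
    return {
        "at": zoo_start,
        "tag_ok": tag == ZOO_TAG,  # the entry repeats the 4-byte magic
        # Assumption: zoo_minus holds -zoo_start, so the sum should be 0.
        "consistent": zoo_start + zoo_minus == 0,
        "type": data[zoo_start + 4],
        "method": METHODS.get(method, "unknown(%d)" % method),
    }
```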
> 
> So the old part that showed the minimum version needed to extract
> (after the modify version) is replaced and now looks like:
>>> (24.l+28)	ubyte	x		\b, extract: v%u
>>> (24.l+29)	ubyte	x		\b.%u+
> 
> The original and compressed sizes (org_size and size_now) of the
> first archived file may also be of interest. They are shown by lines
> like:
>>> (24.l+20)	ulelong		x	\b, size %u
>>> (24.l+24)	ulelong		x	(%u compressed)
> Related to this is the short (DOS) file name, like 12345678.012,
> which is shown by a line like:
>>> (24.l+38)	string	x		\b, %0.13s
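The relative offsets used above for the sizes (+20/+24), the extract version (+28/+29) and the DOS name (+38, 13 bytes) can be collected in one hypothetical Python helper, sketched here under the assumption that `at` is the entry offset obtained from zoo_start:

```python
import struct


def entry_details(data: bytes, at: int):
    """Sketch: fields of the directory entry starting at offset `at`,
    using the relative offsets from the magic lines above."""
    org_size, size_now = struct.unpack_from("<II", data, at + 20)
    major, minor = data[at + 28], data[at + 29]
    # 13-byte DOS name field (e.g. 12345678.012 plus a NUL), NUL-padded
    dos_name = data[at + 38:at + 51].split(b"\x00", 1)[0].decode("latin-1")
    return {
        "extract": "v%d.%d+" % (major, minor),
        "size": org_size,
        "compressed": size_now,
        "name": dos_name,
    }
```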
> 
> A directory entry of type 2 may contain a variable part of length
> var_dir_len. This part may hold a long file name (namlen at most 256)
> like "README.Debian" besides the DOS name README.Deb, as in the
> example MyZoo3.zoo. A directory path (dirlen at most 256) like
> "usr/share/doc/zoo" may also be stored. This information is shown by
> lines like:
>>> (24.l+4)	ubyte	=2
>>>> (24.l+51)		uleshort >0
> #>>>(24.l+51)		uleshort >0	\b, variable part length %u
> #>>>>(24.l+56)		ubyte	x	\b, namlen %u
> #>>>>(24.l+57)		ubyte	x	\b, dirlen %u
>>>>> (24.l+56)		ubyte	>0
>>>>>> (24.l+58)		string	x	"%s"
>>>>> (24.l+57)		ubyte	>0
>>>>>> (24.l+55)		ubyte	x
>>>>>>> &(&0.b+2)	string	x	in "%s"
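The relative offset `&(&0.b+2)` reads namlen at +56 and skips namlen plus 2 bytes from there, so the directory name is read at +58+namlen, directly after the long name. A Python sketch of the same walk, with hypothetical helper names and assuming either string may be NUL-terminated:

```python
import struct


def _cstr(raw: bytes) -> str:
    """Decode a NUL-terminated (or NUL-padded) byte string."""
    return raw.split(b"\x00", 1)[0].decode("latin-1")


def long_name_and_dir(data: bytes, at: int):
    """Sketch: optional long name and directory of a type-2 directory entry
    starting at offset `at`; returns (long_name, dir_name), None if absent."""
    if data[at + 4] != 2:
        return None, None
    var_dir_len = struct.unpack_from("<H", data, at + 51)[0]
    if var_dir_len == 0:
        return None, None
    namlen, dirlen = data[at + 56], data[at + 57]
    pos = at + 58                      # long name starts right after dirlen
    long_name = _cstr(data[pos:pos + namlen]) if namlen else None
    pos += namlen                      # directory name follows the long name
    dir_name = _cstr(data[pos:pos + dirlen]) if dirlen else None
    return long_name, dir_name
```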
> 
> Also related to that file are its modification date and time in DOS
> format, which are shown by lines like:
>>>> (24.l+14)		lemsdosdate x	\b, modified %s
>>>> (24.l+16)		lemsdostime x	%s
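The lemsdosdate and lemsdostime types decode the usual MS-DOS bit layout (date: bits 15-9 years since 1980, 8-5 month, 4-0 day; time: bits 15-11 hour, 10-5 minute, 4-0 seconds divided by 2). A small Python sketch of that decoding, with hypothetical function names, reading both little-endian values at +14 and +16 of a directory entry:

```python
import struct


def dos_date(value: int):
    """Decode a 16-bit MS-DOS date into (year, month, day)."""
    return 1980 + (value >> 9), (value >> 5) & 0x0F, value & 0x1F


def dos_time(value: int):
    """Decode a 16-bit MS-DOS time into (hour, minute, second)."""
    return value >> 11, (value >> 5) & 0x3F, (value & 0x1F) * 2


def entry_timestamp(data: bytes, at: int) -> str:
    """Format the date/time stored at +14/+16 of the entry at offset `at`."""
    date, time = struct.unpack_from("<HH", data, at + 14)
    y, mo, d = dos_date(date)
    h, mi, s = dos_time(time)
    return "%04d-%02d-%02d %02d:%02d:%02d" % (y, mo, d, h, mi, s)
```

For example, the values 16463 and 37051 decode to 2012-02-15 and 18:05:54.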
> 
> After applying the above modifications with the patch file
> file-5.44-archive-zoo.diff, my ZOO samples are in principle described
> as before, but now the extract version is always shown, as well as
> additional information like the name and timestamps of the first
> stored file. Also some misidentifications now vanish. The output now
> looks like:
> 
> GRASPDOC.ZOO:                   Zoo archive data, v2.00,
> 				modify: v2.0+, extract: v2.1+,
> 				at 30599 258 bytes comment
> 				Anonymous ftp site garbo.uwasa.fi
> 				128.214.87.1 moderated by Timo Salmi
> 				ts at chyde.uwasa.fi      PC directories
> 				and uploads\015\012Harri Valkama
> 				hv at chyde.uwasa.fi   PC, Mac, Unix
> 				files, and upload
> 				; at 102 method=lzh
> 				, next entry at 29866, CRC 0xd6b9
> 				, size 111652 (29693 compressed)
> 				, grasp.man,
> 				modified Sun, May 19 1986 14:57:00
> LHARC_1_30.ZOO:                 Zoo archive data, v2.00,
> 				modify: v2.0+, extract: v1.0+
> 				; at 42 method=not-compressed
> 				, next entry at 170, CRC 0x4d19
> 				, size 45 (45 compressed)
> 				, X.inf ".info"
> 				in "LHArc"
> 				, time zone 20/4,
> 				modified Sun, Dec 06 1990 10:45:58
> M2POSX02.ZOO:                   Zoo archive data, v2.10,
> 				modify: v2.0+, extract: v2.1+
> 				, vdata 0x3
> 				; at 42 method=lzh
> 				, next entry at 844, CRC 0x7ccf
> 				, size 1264 (660 compressed)
> 				, m2ppx.g
> 				in "m2posix.02/bin"
> 				, time zone -4/4,
> 				modified Sun, Feb 21 1993 05:19:14
> MyZoo2-hP.bak:                  Zoo archive data, v2.10,
> 				modify: v2.0+, extract: v2.1+
> 				, vdata 0x3;
> 				at 42 method=lzh
> 				, next entry at 504, CRC 0x3cae
> 				, size 514 (302 compressed)
> 				, README.Deb "README.Debian"
> 				in "usr/share/doc/zoo"
> 				, time zone 0/4,
> 				modified Sun, Feb 15 2012 18:05:54
> MyZoo3.zoo:                     Zoo archive data, v2.10,
> 				modify: v2.0+, extract: v1.0+
> 				, vdata 0x3
> 				; at 42 method=lzd
> 				, next entry at 594, CRC 0x3cae
> 				, size 514 (392 compressed), deleted
> 				, README.Deb "README.Debian"
> 				in "usr/share/doc/zoo"
> 				, time zone 0/4,
> 				modified Sun, Feb 15 2012 18:05:54
> UUCODE.ZOO:                     Zoo archive data, v1.50,
> 				modify: v1.4+, extract: v1.0+
> 				; at 34 method=lzd
> 				, next entry at 10351, CRC 0x91a5
> 				, size 15272 (10256 compressed)
> 				, uudecode,
> 				modified Sun, Sep 20 1989 21:57:12
> WHRCGA.ZOO:                     Zoo archive data, v2.00,
> 				modify: v2.0+, extract: v2.1+
> 				, at 61369 258 bytes comment
> 				Anonymous ftp site garbo.uwasa.fi
> 				128.214.87.1 moderated by Timo Salmi
> 				ts at chyde.uwasa.fi      PC directories
> 				and uploads\015\012Harri Valkama
> 				hv at chyde.uwasa.fi   PC, Mac, Unix
> 				files, and upload
> 				; at 68 method=lzh
> 				, next entry at 44551, CRC 0x6f35
> 				, at 0xadd9 46 bytes comment
> 				"CGA .GL file showing menu
> 				input from keyboard"
> 				, size 160073 (44366 compressed)
> 				, whrcga.gl,
> 				modified Sun, Oct 07 1989 13:49:26
> playback.zoo:                   Zoo archive data, v2.10,
> 				modify: v2.0+, extract: v2.1+
> 				, vdata 0x3
> 				; at 42 method=lzh
> 				, next entry at 33948, CRC 0x807c
> 				, size 68840 (33835 compressed)
> 				, playback.prg
> 				, time zone -107/4,
> 				modified Sun, Mar 25 1992 12:08:00
> unzip51b.zoo:                   Zoo archive data, v2.10,
> 				modify: v2.0+, extract: v2.1+
> 				, vdata 0x3
> 				; at 42 method=lzh
> 				, next entry at 13606, CRC 0x1e50
> 				, size 22054 (13485 compressed)
> 				, funzip.ttp
> 				in "./68000"
> 				, time zone 4/4,
> 				modified Sun, Feb 09 1994 15:00:34
> x-fmt-269-signature-id-621.zoo: data
> 
> Maybe some exotic variants are missed (tiny; one-file small
> archives). I hope my diff file can be applied in a future version of
> the file utility.
> 
> I use the two test types lemsdosdate and lemsdostime to interpret a
> 2-byte value as a bit-encoded date and time in DOS format (years
> relative to 1980), but these types are not mentioned in the official
> documentation magic.man. So I think they should be documented there.
> 
> With best wishes,
> Jörg Jenderek
> --
> Jörg Jenderek
> <trid-v-zoo.txt.gz><droid-zoo.csv.gz><file-5_44-archive-zoo_diff><file-5_44-archive-zoo_diff_sig>
> --
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file
