[File] [PATCH] Magdir/archive Zoo archive +details+ extensions

Jörg Jenderek joerg.jen.der.ek at gmx.net
Tue Apr 4 00:33:56 UTC 2023


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

A few days ago I was looking for some documentation. What I needed
was packed inside an archive with the file suffix ZOO.

When I run the file command version 5.44 on such samples, the output
looks reasonable at first glance:

GRASPDOC.ZOO:                   Zoo archive data, v2.00,
				modify: v2.0+
LHARC_1_30.ZOO:                 Zoo archive data, v2.00,
				modify: v2.0+, extract: v1.0+
M2POSX02.ZOO:                   Zoo archive data, v2.10,
				modify: v2.0+, extract: v2.1+
MyZoo2-hP.bak:                  Zoo archive data, v2.10,
				modify: v2.0+, extract: v2.1+
MyZoo3.zoo:                     Zoo archive data, v2.10,
				modify: v2.0+, extract: v1.0+
UUCODE.ZOO:                     Zoo archive data, v1.50,
				modify: v1.4+
WHRCGA.ZOO:                     Zoo archive data, v2.00,
				modify: v2.0+
playback.zoo:                   Zoo archive data, v2.10,
				modify: v2.0+, extract: v2.1+
unzip51b.zoo:                   Zoo archive data, v2.10,
				modify: v2.0+, extract: v2.1+
x-fmt-269-signature-id-621.zoo: Zoo archive data

With the --extension option only ??? is displayed.

For comparison I ran the file format identification utility TrID
(see https://mark0.net/soft-trid-e.html). TrID describes these ZOO
examples as "ZOO compressed archive" by ark-zoo.trid.xml or, with
lower priority, as "ZOO compressed archive (strict)" by
ark-zoo-strict.trid.xml. Only the ZOO suffix is considered "good"
(see appended output/trid-v-zoo.txt.gz).

For comparison I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/). It recognizes
only some of the archives. These are described as "ZOO Compressed
Archive" by PUID x-fmt/269. No mime type is shown, and only the ZOO
suffix is considered "good", whereas BAK is marked with
EXTENSION_MISMATCH true (see appended output/droid-zoo.csv.gz).

With the help of these tools I found pages about the ZOO file format.
They also list samples to download and unpacking software like deark.
This is expressed inside Magdir/archive by comment lines like:
# URL:		https://en.wikipedia.org/wiki/Zoo_(file_format)
#		http://fileformats.archiveteam.org/wiki/Zoo
# Reference:	http://mark0.net/download/triddefs_xml.7z
#		defs/a/ark-zoo-strict.trid.xml
#		http://distcache.freebsd.org/ports-distfiles/
#		zoo-2.10pl1.tar.gz/zoo.h

The detection is done by the starting lines inside Magdir/archive,
which look like:
 20	lelong		0xfdc4a7dc	Zoo archive data
 !:mime	application/x-zoo
 >4	byte		>48		\b, v%c.
 >>6	byte		>47		\b%c
 >>>7	byte		>47		\b%c
 >32	byte		>0		\b, modify: v%d
 >>33	byte		x		\b.%d+

The sample x-fmt-269-signature-id-621.zoo is not a real ZOO archive.
It contains just the first 26 bytes of an archive and is only used by
the DROID tool to recognise ZOO archives. The DROID tool also looks
for the byte sequence FC83 at the end, which is wrong. Why this
happens can be explained by looking at the man page zoo(1), which
contains a sentence like:
Packing removes any garbage data appended to an archive because of
Xmodem file transfer.
All tools use the same recognition method as a first step, looking
for the 4 byte magic at offset 20. So I skip the DROID sample by also
checking for a valid major version needed to manipulate the archive.
The man page also states that archive_name.bak is the backup of an
archive. It is created by the program itself: when packing occurs,
the original unpacked archive is always left behind with an extension
of .bak. This is the default behaviour unless the E modifier is used,
which causes zoo not to save a backup copy of the original archive
after packing.
ZOO files typically start with a 20 byte header starting with
"ZOO ?.?? Archive.", followed by the bytes 0x1a 0x0 0x0, where the
question marks stand for version digits (like 1.50 2.00 2.10). This
text is purely informational and can be anything. That is described
inside TrID ark-zoo.trid.xml. So I also show information about
archives with an unusual starting header, although I myself found no
such samples. So the start now looks like:
 20	lelong		0xfdc4a7dc
 >32	byte		>0		Zoo archive data
 !:mime	application/x-zoo
 !:ext	zoo/bak
 >>4	byte		>48		\b, v%c.
 >>>6	byte		>47		\b%c
 >>>>7	byte		>47		\b%c
 >>8	string		!\040Archive.\032 \b, at 8
 >>>8	string		x		text "%0.10s"
 >>32	byte		>0		\b, modify: v%d
 >>>33	byte		x		\b.%d+
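
The same first-step check can be sketched outside of magic syntax. The
following Python is only an illustration of the test described above
(magic at offset 20, non-zero major version at offset 32); the
function name parse_zoo_header and the returned dictionary are
hypothetical and not part of the patch or of zoo itself:

```python
import struct

ZOO_TAG = 0xFDC4A7DC  # 4 byte magic, stored little-endian at offset 20


def parse_zoo_header(data: bytes):
    """Return basic header fields, or None if data is not accepted.

    Hypothetical helper mirroring the magic lines: the tag at offset 20
    must match and the major version at offset 32 must be > 0.
    """
    if len(data) < 34:
        # Too short to hold the version bytes (e.g. a 26 byte fragment).
        return None
    tag, zoo_start, _zoo_minus, major, minor = struct.unpack_from(
        "<IiiBB", data, 20
    )
    if tag != ZOO_TAG or major == 0:
        return None
    return {
        "text": data[:20],                 # informational header text
        "zoo_start": zoo_start,            # offset of first directory entry
        "modify_version": (major, minor),  # shown as "modify: vX.Y+"
    }
```

With this sketch a file truncated to 26 bytes, like the DROID
signature sample, is rejected simply because it is too short to
contain the version bytes.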

With zoo 2.00 additional fields (the archive-level versioning byte
vdata and an optional archive comment with Carriage Return and Line
Feed) were added to the archive header. So show this information by
lines like:
 >>32	byte		>1
 >>>34		ubyte	!1		\b, header type %u
 >>>35		lelong	>0		\b, at %d
 >>>>39			uleshort x	%u bytes comment
 #>>>>(35.l)		ubequad	x	COMMENT=%16.16llx
 >>>>(35.l) 		ubyte	<040
 >>>>>(35.l+1) 		ubyte	<040
 >>>>>>(35.l+2)		string	x	%s
 >>>>>>>&0		ubyte	<040
 >>>>>>>>&0		ubyte	<040
 >>>>>>>>>&0		string	>037	%s
 >>>41		ubyte	!1		\b, vdata %#x

When the size of the header varies, the directory entries (each
starting with the same 4 byte magic) of course begin at different
offsets. So the following lines are true only in some cases and
must be fine-tuned:
 >42	lelong		0xfdc4a7dc	\b,
 >>70	byte		>0		extract: v%d
 >>>71	byte		x		\b.%d+

The pointer to the first directory entry header is stored at offset
24 as the 4 byte integer variable zoo_start. Right after it, at offset
28, the negated value of zoo_start is stored as zoo_minus for
consistency checking. Using the stored offset, jump to that position
and inspect the directory entry structure. It starts again with the
4 byte magic, so it can be checked again here if the test lines are
not sufficient. Next the directory entry type is stored; in my
examples I only get the value 2. Next the packing method is stored:
0 means no packing, 1 means Lempel-Ziv-Welch (LZW), named lzd (see
also deark output), and 2 means LZ77+Huffman (LZH). So this
information is shown by lines like:
 >>24	lelong		x		\b; at %u
 #>>28	lelong		x		\b, zoo_minus %#x
 #>>(24.l+0) ulelong	!0xfdc4a7dc	\b, zoo_tag=%8.8x
 >>(24.l+4)	ubyte	!2		type=%u
 >>(24.l+5)	ubyte		x	method=
 >>>(24.l+5)	ubyte		0	\bnot-compressed
 >>>(24.l+5)	ubyte		1	\blzd
 >>>(24.l+5)	ubyte		2	\blzh
This information can also be verified by running the command line
tool deark with a line like:
	deark -m zoo -l -d2 WHRCGA.ZOO
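
The indirect jump via zoo_start can also be sketched in Python. This
is only an illustration of the steps described above; the function
name first_entry_method is hypothetical, and the method names are
simply the strings printed by the magic lines:

```python
import struct

ZOO_TAG = 0xFDC4A7DC
METHODS = {0: "not-compressed", 1: "lzd", 2: "lzh"}


def first_entry_method(data: bytes):
    """Return (entry type, method name) of the first directory entry.

    Hypothetical helper: follows zoo_start (offset 24) and reads the
    tag, type and packing method at entry offsets 0, 4 and 5.
    """
    if len(data) < 28:
        return None
    (zoo_start,) = struct.unpack_from("<i", data, 24)
    if zoo_start < 0 or zoo_start + 6 > len(data):
        return None
    tag, entry_type, method = struct.unpack_from("<IBB", data, zoo_start)
    if tag != ZOO_TAG:
        # The directory entry must start with the same magic again.
        return None
    return entry_type, METHODS.get(method, "unknown(%d)" % method)
```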

So the old part showing the minimum version needed to extract after
modify is replaced, and it now looks like:
 >>(24.l+28)	ubyte	x		\b, extract: v%u
 >>(24.l+29)	ubyte	x		\b.%u+

The original and compressed size (org_size, size_now) of the first
archived file may also be interesting. So show that information by
lines like:
 >>(24.l+20)	ulelong		x	\b, size %u
 >>(24.l+24)	ulelong		x	(%u compressed)
Related to that is the short/DOS file name like 12345678.012. So
show this as well by a line like:
 >>(24.l+38)	string	x		\b, %0.13s
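
For illustration, the same three fields can be read in Python as
follows. The function name entry_sizes_and_name is hypothetical; the
offsets (+20, +24, +38) and the 13 byte name field are taken from the
magic lines above, and entry is assumed to be the value of zoo_start:

```python
import struct


def entry_sizes_and_name(data: bytes, entry: int):
    """Return (org_size, size_now, short name) of a directory entry.

    Hypothetical helper; 'entry' is the offset of the directory entry.
    """
    if entry < 0 or entry + 51 > len(data):
        return None
    org_size, size_now = struct.unpack_from("<II", data, entry + 20)
    raw_name = data[entry + 38:entry + 51]  # fname[13], NUL-terminated
    name = raw_name.split(b"\0", 1)[0].decode("ascii", "replace")
    return org_size, size_now, name
```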

A directory entry of type 2 may contain a variable part with length
var_dir_len. It may hold a long file name (namlen, at most 256) like
"README.Debian" besides the DOS name README.Deb in example
MyZoo3.zoo. A directory path (dirlen, at most 256) like
"usr/share/doc/zoo" may also be stored. So show that information by
lines like:
 >>(24.l+4)	ubyte	=2
 >>>(24.l+51)		uleshort >0
 #>>>(24.l+51)		uleshort >0	\b, variable part length %u
 #>>>>(24.l+56)		ubyte	x	\b, namlen %u
 #>>>>(24.l+57)		ubyte	x	\b, dirlen %u
 >>>>(24.l+56)		ubyte	>0
 >>>>>(24.l+58)		string	x	"%s"
 >>>>(24.l+57)		ubyte	>0
 >>>>>(24.l+55)		ubyte	x
 >>>>>>&(&0.b+2)	string	x	in "%s"

Related to that file is the modification date and time in DOS
format. So show that information by lines like:
 >>>(24.l+14)		lemsdosdate x	\b, modified %s
 >>>(24.l+16)		lemsdostime x	%s
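
For readers not familiar with the DOS format: both values are 16 bit
little-endian bit fields. A small Python sketch of the decoding (the
function name decode_dos_datetime is hypothetical, and no validation
of the resulting values is done):

```python
def decode_dos_datetime(date: int, time: int):
    """Decode 16 bit DOS date and time values into their components.

    date: bits 15-9 year since 1980, bits 8-5 month, bits 4-0 day
    time: bits 15-11 hour, bits 10-5 minute, bits 4-0 seconds / 2
    """
    year = 1980 + (date >> 9)
    month = (date >> 5) & 0x0F
    day = date & 0x1F
    hour = time >> 11
    minute = (time >> 5) & 0x3F
    second = (time & 0x1F) * 2  # stored with 2 second resolution
    return year, month, day, hour, minute, second
```

For example, the date value 16463 and the time value 37051 decode to
2012-02-15 18:05:54.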

After applying the above mentioned modifications with the patch
file-5.44-archive-zoo.diff, my ZOO samples are in principle described
as before, but now the extract version is always shown, and
additional information like the name and time stamp of the first
stored file is shown as well. Some misidentifications also vanish.
So this now looks like:

GRASPDOC.ZOO:                   Zoo archive data, v2.00,
				modify: v2.0+, extract: v2.1+,
				at 30599 258 bytes comment
				Anonymous ftp site garbo.uwasa.fi
				128.214.87.1 moderated by Timo Salmi
				ts at chyde.uwasa.fi      PC directories
				and uploads\015\012Harri Valkama
				hv at chyde.uwasa.fi   PC, Mac, Unix
				files, and upload
				; at 102 method=lzh
				, next entry at 29866, CRC 0xd6b9
				, size 111652 (29693 compressed)
				, grasp.man,
				modified Sun, May 19 1986 14:57:00
LHARC_1_30.ZOO:                 Zoo archive data, v2.00,
				modify: v2.0+, extract: v1.0+
				; at 42 method=not-compressed
				, next entry at 170, CRC 0x4d19
				, size 45 (45 compressed)
				, X.inf ".info"
				in "LHArc"
				, time zone 20/4,
				modified Sun, Dec 06 1990 10:45:58
M2POSX02.ZOO:                   Zoo archive data, v2.10,
				modify: v2.0+, extract: v2.1+
				, vdata 0x3
				; at 42 method=lzh
				, next entry at 844, CRC 0x7ccf
				, size 1264 (660 compressed)
				, m2ppx.g
				in "m2posix.02/bin"
				, time zone -4/4,
				modified Sun, Feb 21 1993 05:19:14
MyZoo2-hP.bak:                  Zoo archive data, v2.10,
				modify: v2.0+, extract: v2.1+
				, vdata 0x3;
				at 42 method=lzh
				, next entry at 504, CRC 0x3cae
				, size 514 (302 compressed)
				, README.Deb "README.Debian"
				in "usr/share/doc/zoo"
				, time zone 0/4,
				modified Sun, Feb 15 2012 18:05:54
MyZoo3.zoo:                     Zoo archive data, v2.10,
				modify: v2.0+, extract: v1.0+
				, vdata 0x3
				; at 42 method=lzd
				, next entry at 594, CRC 0x3cae
				, size 514 (392 compressed), deleted
				, README.Deb "README.Debian"
				in "usr/share/doc/zoo"
				, time zone 0/4,
				modified Sun, Feb 15 2012 18:05:54
UUCODE.ZOO:                     Zoo archive data, v1.50,
				modify: v1.4+, extract: v1.0+
				; at 34 method=lzd
				, next entry at 10351, CRC 0x91a5
				, size 15272 (10256 compressed)
				, uudecode,
				modified Sun, Sep 20 1989 21:57:12
WHRCGA.ZOO:                     Zoo archive data, v2.00,
				modify: v2.0+, extract: v2.1+
				, at 61369 258 bytes comment
				Anonymous ftp site garbo.uwasa.fi
				128.214.87.1 moderated by Timo Salmi
				ts at chyde.uwasa.fi      PC directories
				and uploads\015\012Harri Valkama
				hv at chyde.uwasa.fi   PC, Mac, Unix
				files, and upload
				; at 68 method=lzh
				, next entry at 44551, CRC 0x6f35
				, at 0xadd9 46 bytes comment
				"CGA .GL file showing menu
				input from keyboard"
				, size 160073 (44366 compressed)
				, whrcga.gl,
				modified Sun, Oct 07 1989 13:49:26
playback.zoo:                   Zoo archive data, v2.10,
				modify: v2.0+, extract: v2.1+
				, vdata 0x3
				; at 42 method=lzh
				, next entry at 33948, CRC 0x807c
				, size 68840 (33835 compressed)
				, playback.prg
				, time zone -107/4,
				modified Sun, Mar 25 1992 12:08:00
unzip51b.zoo:                   Zoo archive data, v2.10,
				modify: v2.0+, extract: v2.1+
				, vdata 0x3
				; at 42 method=lzh
				, next entry at 13606, CRC 0x1e50
				, size 22054 (13485 compressed)
				, funzip.ttp
				in "./68000"
				, time zone 4/4,
				modified Sun, Feb 09 1994 15:00:34
x-fmt-269-signature-id-621.zoo: data

Some exotic variants may be missed (tiny; one-file small archive).
I hope my diff file can be applied in a future version of the file
utility.

I use the two test types lemsdosdate and lemsdostime to interpret a
2 byte value as a bit-encoded date and time in DOS format relative to
the year 1980, but they are not mentioned in the official
documentation magic.man. So I think these 2 types should be mentioned
there.

With best wishes,
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCZCtwdAAKCRCv8rHJQhrU
1owuAJ45XDMYM1yPl1qJTRoZJmgkTRoC1wCeKWzw+wY7ihPj82N65CRsSr6Gfq4=
=SwXB
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-zoo.txt.gz
Type: application/x-gzip
Size: 561 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230404/d500e22c/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-zoo.csv.gz
Type: application/x-gzip
Size: 573 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230404/d500e22c/attachment-0001.bin>
-------------- next part --------------
--- file-5.44/magic/Magdir/archive.old	2022-12-26 19:00:47.000000000 +0100
+++ file-5.44/magic/Magdir/archive	2023-04-04 02:27:58.904652000 +0200
@@ -1758,14 +1758,114 @@
 
 # Zoo archiver
-20	lelong		0xfdc4a7dc	Zoo archive data
+# Update: Joerg Jenderek
+# URL:		https://en.wikipedia.org/wiki/Zoo_(file_format)
+#		http://fileformats.archiveteam.org/wiki/Zoo
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/a/ark-zoo-strict.trid.xml
+#		http://distcache.freebsd.org/ports-distfiles/zoo-2.10pl1.tar.gz/zoo.h 
+# Note:		called "ZOO compressed archive (strict)" by TrID and "ZOO Compressed Archive" by DROID via PUID x-fmt/269 
+#		verified by command like `deark -m zoo -l -d2 WHRCGA.ZOO`
+20	lelong		0xfdc4a7dc
+# skip DROID x-fmt-269-signature-id-621.zoo by looking for valid major version to manipulate archive
+>32	byte		>0		Zoo archive data
 !:mime	application/x-zoo
->4	byte		>48		\b, v%c.
->>6	byte		>47		\b%c
->>>7	byte		>47		\b%c
->32	byte		>0		\b, modify: v%d
->>33	byte		x		\b.%d+
->42	lelong		0xfdc4a7dc	\b,
->>70	byte		>0		extract: v%d
->>>71	byte		x		\b.%d+
+# bak is extension of backup-ed zoo
+!:ext	zoo/bak
+# version in text form like: 1.50 2.00 2.10
+>>4	byte		>48		\b, v%c.
+>>>6	byte		>47		\b%c
+>>>>7	byte		>47		\b%c
+# ZOO files typically start with "ZOO ?.?? Archive.", followed by the bytes 0x1a 0x0 0x0; not used by Zoo and they may be anything
+>>8	string		!\040Archive.\032 \b, at 8
+>>>8	string		x		text "%0.10s"
+# major_ver.minor_ver; minimum version needed to manipulate archive like: 1.0 2.0
+>>32	byte		>0		\b, modify: v%d
+>>>33	byte		x		\b.%d+
+# major_ver.minor_ver; minimum version needed to extract after modify like in old versions
+>>(24.l+28)	ubyte	x		\b, extract: v%u
+>>(24.l+29)	ubyte	x		\b.%u+
+# with zoo 2.00 additional fields have been added in the archive header
+>>32	byte		>1
+# type; type of archive header like: 1 2
+>>>34		ubyte	!1		\b, header type %u
+# acmt_pos; position of archive comment like: 6258 30599 61369 149501
+>>>35		lelong	>0		\b, at %d
+# acmt_len; length of archive comment like: 258
+>>>>39			uleshort x	%u bytes comment
+#>>>>(35.l)		ubequad	x	COMMENT=%16.16llx
+# 1st character of comment maybe is CarriageReturn (0x0d)
+>>>>(35.l) 		ubyte	<040
+# 2nd character of comment maybe is LineFeed (0x0a)
+>>>>>(35.l+1) 		ubyte	<040
+# comment string after CRLF like "Anonymous ftp site garbo.uwasa.fi 128.214.87.1 moderated by"
+>>>>>>(35.l+2)		string	x	%s
+# next character of remaining comment maybe is CarriageReturn (0x0d)
+>>>>>>>&0		ubyte	<040
+>>>>>>>>&0		ubyte	<040
+# 2nd comment part like: Timo Salmi ts at chyde.uwasa.fi      PC directories and uploads\015\012Harri Valkama hv at chyde.uwasa.fi   PC, Mac, Unix files, and upload
+>>>>>>>>>&0		string	>037	%s
+# vdata; archive-level versioning byte like: 1 3
+>>>41		ubyte	!1		\b, vdata %#x
+# zoo_start; pointer to 1st entry header 
+>>24	lelong		x		\b; at %u
+# zoo_minus; zoo_start -1 for consistency checking
+#>>28	lelong		x		\b, zoo_minus %#x
+# zoo_tag; tag for check
+#>>(24.l+0) ulelong	!0xfdc4a7dc	\b, zoo_tag=%8.8x
+# type; type of directory entry like: 1 2
+>>(24.l+4)	ubyte	!2		type=%u
+# packing_method; 0~no packing 1~normal LZW 2~lzh
+>>(24.l+5)	ubyte		x	method=
+>>>(24.l+5)	ubyte		0	\bnot-compressed
+>>>(24.l+5)	ubyte		1	\blzd
+>>>(24.l+5)	ubyte		2	\blzh
+# next; position of next directory entry
+>>(24.l+6)	ulelong		x	\b, next entry at %u
+# offset; position of file data for this entry
+#>>(24.l+10) ulelong		x	\b, data at %u
+# file_crc; CRC-16 of file data
+>>(24.l+18)	uleshort	x	\b, CRC %#4.4x
+# comment; zero if none or points to entry comment like ADD9h (WHRCGA.ZOO)
+>>(24.l+32)	lelong		>0	\b, at %#x
+# cmt_size; if not 0 for none then length of entry comment like: 46
+>>>(24.l+36)	uleshort	>0	%u bytes comment
+# entry comment itself like: "CGA .GL file showing menu input from keyboard"
+>>>>(&-6.l)	string		x	"%s"
+# org_size; original size of file
+>>(24.l+20)	ulelong		x	\b, size %u
+# size_now; compressed size of file
+>>(24.l+24)	ulelong		x	(%u compressed)
+# major_ver.minor_ver; minimum version needed to extract already done
+# deleted; will be 1 if deleted, 0 if not
+>>(24.l+30)	ubyte		=1	\b, deleted
+# struc; file structure if any; WHAT IS THAT?
+>>(24.l+31)	ubyte		!0	\b, structured
+# fname[13]; short/DOS file name like 12345678.012
+>>(24.l+38)	string	x		\b, %0.13s
+# for directory entry type 2 with variable part
+>>(24.l+4)	ubyte	=2
+# var_dir_len; length of variable part of dir entry
+>>>(24.l+51)		uleshort >0
+#>>>(24.l+51)		uleshort >0	\b, variable part length %u
+# namlen; length of long filename
+#>>>>(24.l+56)		ubyte	x	\b, namlen %u
+# dirlen; length of directory name
+#>>>>(24.l+57)		ubyte	x	\b, dirlen %u
+# if file length positive then show long file name
+>>>>(24.l+56)		ubyte	>0
+# lfname[256]; long file name \0-terminated
+>>>>>(24.l+58)		string	x	"%s"
+# if directory length positive then jump before file name field and then jump this addtional length plus 2 (\0-terminator + dirlen field) to following directory name
+>>>>(24.l+57)		ubyte	>0
+>>>>>(24.l+55)		ubyte	x
+# dirname[256]; directory name \0-terminated
+>>>>>>&(&0.b+2)		string	x	in "%s"
+# dir_crc; CRC of directory entry
+#>>>(24.l+54)		uleshort x	\b, entry CRC %#4.4x
+# tz; timezone where file was archived; 7Fh~unknown 4~1.00hoursWestOfUTC 12 16 20~5.00hoursWestOfUTC -107~26.75hoursEastOfUTC -4~1.00hoursEastOfUTC
+>>>(24.l+53)		byte	!0x7f	\b, time zone %d/4
+# date; last mod file date in DOS format
+>>>(24.l+14)		lemsdosdate x	\b, modified %s
+# time; last mod file time in DOS format
+>>>(24.l+16)		lemsdostime x	%s
 
 # Shell archives
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.44-archive-zoo.diff.sig
Type: application/octet-stream
Size: 2554 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230404/d500e22c/attachment.obj>