[File] [PATCH] Magdir/archive Zoo archive +details+ extensions
Christos Zoulas
christos at zoulas.com
Mon Apr 17 16:41:06 UTC 2023
Committed, thanks!
christos
> On Apr 3, 2023, at 8:33 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello,
>
> A few days ago I was searching for some documentation. What I was
> looking for was packed inside an archive with the file suffix ZOO.
>
> When running the file command version 5.44 on such samples, I get
> output that at first glance does not look bad:
>
> GRASPDOC.ZOO: Zoo archive data, v2.00,
> modify: v2.0+
> LHARC_1_30.ZOO: Zoo archive data, v2.00,
> modify: v2.0+, extract: v1.0+
> M2POSX02.ZOO: Zoo archive data, v2.10,
> modify: v2.0+, extract: v2.1+
> MyZoo2-hP.bak: Zoo archive data, v2.10,
> modify: v2.0+, extract: v2.1+
> MyZoo3.zoo: Zoo archive data, v2.10,
> modify: v2.0+, extract: v1.0+
> UUCODE.ZOO: Zoo archive data, v1.50,
> modify: v1.4+
> WHRCGA.ZOO: Zoo archive data, v2.00,
> modify: v2.0+
> playback.zoo: Zoo archive data, v2.10,
> modify: v2.0+, extract: v2.1+
> unzip51b.zoo: Zoo archive data, v2.10,
> modify: v2.0+, extract: v2.1+
> x-fmt-269-signature-id-621.zoo: Zoo archive data
>
> With the option --extension, only ??? is displayed.
>
> For comparison I ran the file format identification utility TrID
> (see https://mark0.net/soft-trid-e.html). When running the TrID
> command on such ZOO examples, they are described as "ZOO compressed
> archive" by ark-zoo.trid.xml or, with lower priority, as "ZOO
> compressed archive (strict)" by ark-zoo-strict.trid.xml. Here only
> the ZOO suffix is considered "good" (see appended
> output/trid-v-zoo.txt.gz).
>
> For comparison I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). It recognizes
> only some of the archives. These are described as "ZOO Compressed
> Archive" by PUID x-fmt/269. Here no MIME type is shown and only the
> ZOO suffix is considered "good", whereas BAK is marked with
> EXTENSION_MISMATCH true (see appended output/droid-zoo.csv.gz).
>
> With the help of these tools I found pages about the ZOO file format.
> They also list samples to download and unpacking software like deark.
> That is expressed inside Magdir/archive by comment lines like:
> # URL: https://en.wikipedia.org/wiki/Zoo_(file_format)
> # http://fileformats.archiveteam.org/wiki/Zoo
> # Reference: http://mark0.net/download/triddefs_xml.7z
> # defs/a/ark-zoo-strict.trid.xml
> # http://distcache.freebsd.org/ports-distfiles/
> # zoo-2.10pl1.tar.gz/zoo.h
>
> The detection is done by the starting lines inside Magdir/archive,
> which look like:
> 20 lelong 0xfdc4a7dc Zoo archive data
> !:mime application/x-zoo
>> 4 byte >48 \b, v%c.
>>> 6 byte >47 \b%c
>>>> 7 byte >47 \b%c
>> 32 byte >0 \b, modify: v%d
>>> 33 byte x \b.%d+
>
> The sample x-fmt-269-signature-id-621.zoo is not a real ZOO archive.
> It contains just the first 26 bytes of an archive and is only used
> by the DROID tool to recognise ZOO archives. The DROID tool also looks
> for the byte sequence FC83 at the end, which is wrong. Why this
> happens can be explained by looking at the man page zoo(1), which
> contains a sentence like:
> Packing removes any garbage data appended to an archive because of
> Xmodem file transfer.
> All tools use the same recognition method as a first step, looking
> for the 4 byte magic at offset 20. So I skip the DROID sample by also
> checking for a valid major version needed to manipulate the archive.
> The man page also says that archive_name.bak is a backup of the
> archive. It is created by the program itself: if packing occurs, the
> original unpacked archive is always left behind with an extension of
> .bak. This is the default behaviour unless the E modifier is used,
> which causes zoo not to save a backup copy of the original archive
> after packing.
> ZOO files typically start with a 20 byte text field consisting of
> "ZOO ?.?? Archive." followed by the bytes 0x1a 0x0 0x0, where the
> question marks stand for version digits (like 1.50, 2.00, 2.10). This
> text is only informational and can be anything, as described inside
> TrID ark-zoo.trid.xml. So I also show information about archives
> with an unusual starting header, although I found no such samples
> myself. So the start now looks like:
> 20 lelong 0xfdc4a7dc
>> 32 byte >0 Zoo archive data
> !:mime application/x-zoo
> !:ext zoo/bak
>>> 4 byte >48 \b, v%c.
>>>> 6 byte >47 \b%c
>>>>> 7 byte >47 \b%c
>>> 8 string !\040Archive.\032 \b, at 8
>>>> 8 string x text "%0.10s"
>>> 32 byte >0 \b, modify: v%d
>>>> 33 byte x \b.%d+
>
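
For readers who want to reproduce the header test outside of file(1),
here is a minimal Python sketch of the same check. The offsets and the
field names zoo_start and zoo_minus come from the description in this
message; the function name parse_zoo_header is only illustrative and
is not part of any existing tool.

```python
import struct

ZOO_TAG = 0xFDC4A7DC  # 4 byte magic, little endian, at offset 20


def parse_zoo_header(data: bytes):
    """Return basic archive header fields, or None if this is not a Zoo archive.

    Layout assumed from the magic lines above: 20 byte text, tag at 20,
    zoo_start at 24, zoo_minus at 28, major/minor version at 32/33.
    """
    if len(data) < 34:
        return None  # too short, e.g. the 26 byte DROID signature sample
    tag, zoo_start, zoo_minus, major, minor = struct.unpack_from("<IiiBB", data, 20)
    if tag != ZOO_TAG or major == 0:
        return None  # same idea as "20 lelong 0xfdc4a7dc" plus "32 byte >0"
    return {
        "text": data[:20],
        "zoo_start": zoo_start,
        "zoo_minus": zoo_minus,
        "modify_version": (major, minor),
    }
```

A truncated file or one with a zero major version is rejected, which
mirrors how the DROID sample ends up reported as plain "data".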
> With zoo 2.00, additional fields (the archive-level versioning byte
> vdata and an optional archive comment with Carriage Return and Line
> Feed) were added to the archive header. So show this information by
> lines like:
>>> 32 byte >1
>>>> 34 ubyte !1 \b, header type %u
>>>> 35 lelong >0 \b, at %d
>>>>> 39 uleshort x %u bytes comment
> #>>>>(35.l) ubequad x COMMENT=%16.16llx
>>>>> (35.l) ubyte <040
>>>>>> (35.l+1) ubyte <040
>>>>>>> (35.l+2) string x %s
>>>>>>>> &0 ubyte <040
>>>>>>>>> &0 ubyte <040
>>>>>>>>>> &0 string >037 %s
>>>> 41 ubyte !1 \b, vdata %#x
>
> When the size of the header varies, the directory entries (each
> starting with the same 4 byte magic) of course begin at different
> offsets. So the following lines are true only in some cases and
> must be fine-tuned:
>> 42 lelong 0xfdc4a7dc \b,
>>> 70 byte >0 extract: v%d
>>>> 71 byte x \b.%d+
>
> The pointer to the first directory entry header is stored at offset
> 24 as the 4 byte integer variable zoo_start. Afterwards, for
> consistency checking, the negative of zoo_start is stored as
> zoo_minus. With the help of the stored offset, jump to that position
> and inspect the directory entry structure. This again starts with the
> 4 byte magic, so it can be checked once more there if the test lines
> are not sufficient. Next the directory type is stored; in my examples
> I get only the value 2. Then the packing method is stored. Here 0
> means no packing, 1 means Lempel-Ziv-Welch (LZW), named lzd (see also
> the deark output), and 2 means LZ77+Huffman (LZH). So this
> information is shown by lines like:
>>> 24 lelong x \b; at %u
> #>>28 lelong x \b, zoo_minus %#x
> #>>(24.l+0) ulelong !0xfdc4a7dc \b, zoo_tag=%8.8x
>>> (24.l+4) ubyte !2 type=%u
>>> (24.l+5) ubyte x method=
>>>> (24.l+5) ubyte 0 \bnot-compressed
>>>> (24.l+5) ubyte 1 \blzd
>>>> (24.l+5) ubyte 2 \blzh
> This information can also be verified by running the command line
> tool deark with a line like:
> deark -m zoo -l -d2 WHRCGA.ZOO
>
> So the old part showing the minimum version needed to extract after
> modify is replaced and now looks like:
>>> (24.l+28) ubyte x \b, extract: v%u
>>> (24.l+29) ubyte x \b.%u+
>
> The original and compressed sizes (org_size, size_now) of the first
> archived file may also be interesting. So show that information by
> lines like:
>>> (24.l+20) ulelong x \b, size %u
>>> (24.l+24) ulelong x (%u compressed)
> Related to that is the short/DOS file name like 12345678.012. So
> show this too by a line like:
>>> (24.l+38) string x \b, %0.13s
>
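
The indirect lookups above can be cross-checked with a short Python
sketch. It follows zoo_start to the first directory entry and reads
the same fields at the same relative offsets as the magic lines (+4
type, +5 method, +20 org_size, +24 size_now, +28/+29 extract version,
+38 short name). The function name first_entry and the returned
dictionary keys are illustrative assumptions, not an existing API.

```python
import struct

ZOO_TAG = 0xFDC4A7DC
METHODS = {0: "not-compressed", 1: "lzd", 2: "lzh"}


def first_entry(data: bytes):
    """Describe the first directory entry, or return None if it is not plausible."""
    if len(data) < 28:
        return None
    (start,) = struct.unpack_from("<I", data, 24)  # zoo_start
    if start + 51 > len(data):
        return None  # fixed part of the entry would run past the end
    tag, etype, method = struct.unpack_from("<IBB", data, start)
    if tag != ZOO_TAG:
        return None  # every directory entry starts with the magic again
    org_size, size_now, major, minor = struct.unpack_from("<IIBB", data, start + 20)
    # 13 byte DOS name field, NUL padded
    fname = data[start + 38:start + 51].split(b"\0", 1)[0].decode("latin-1")
    return {
        "at": start,
        "type": etype,
        "method": METHODS.get(method, f"unknown({method})"),
        "size": org_size,
        "compressed": size_now,
        "extract_version": (major, minor),
        "name": fname,
    }
```

Comparing the result with the output of "deark -m zoo -l -d2" on the
same file is one way to verify both this sketch and the magic lines.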
> A directory entry of type 2 may contain a variable part with length
> var_dir_len. There the entry may contain a long file name (namlen,
> maximum 256) like "README.Debian" beside the DOS name README.Deb in
> example MyZoo3.zoo. A directory path (dirlen, maximum 256) like
> "usr/share/doc/zoo" may also be stored. So show that information by
> lines like:
>>> (24.l+4) ubyte =2
>>>> (24.l+51) uleshort >0
> #>>>(24.l+51) uleshort >0 \b, variable part length %u
> #>>>>(24.l+56) ubyte x \b, namlen %u
> #>>>>(24.l+57) ubyte x \b, dirlen %u
>>>>> (24.l+56) ubyte >0
>>>>>> (24.l+58) string x "%s"
>>>>> (24.l+57) ubyte >0
>>>>>> (24.l+55) ubyte x
>>>>>>> &(&0.b+2) string x in "%s"
>
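
The variable part can be read the same way. The following Python
sketch assumes, as the magic lines above suggest, that the long name
starts at entry offset +58 and that the directory path follows it
directly after namlen bytes. Names are cut at the first NUL byte, so
the sketch does not depend on whether the stored lengths include the
terminator. The function name long_names is hypothetical.

```python
import struct


def long_names(data: bytes, entry: int):
    """Return (long_name, dir_name) for the directory entry at offset `entry`.

    Either element is None when the entry does not carry that field.
    """
    if data[entry + 4] != 2:  # only type 2 entries have a variable part
        return None, None
    (var_dir_len,) = struct.unpack_from("<H", data, entry + 51)
    if var_dir_len == 0:
        return None, None
    namlen = data[entry + 56]
    dirlen = data[entry + 57]
    name_at = entry + 58
    dir_at = name_at + namlen  # assumption: path follows the long name directly

    def cstr(start: int, length: int) -> str:
        return data[start:start + length].split(b"\0", 1)[0].decode("latin-1")

    long_name = cstr(name_at, namlen) if namlen else None
    dir_name = cstr(dir_at, dirlen) if dirlen else None
    return long_name, dir_name
```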
> Related to that file are the modification date and time in DOS
> format. So show that information by lines like:
>>>> (24.l+14) lemsdosdate x \b, modified %s
>>>> (24.l+16) lemsdostime x %s
>
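
For reference, the DOS bit layout behind these two fields can be
decoded as in the Python sketch below: the date word holds the year
since 1980, month and day, and the time word holds hour, minute and
seconds in 2 second steps. The function names dos_date and dos_time
are illustrative only.

```python
def dos_date(value: int):
    """Decode a 16 bit DOS date into (year, month, day)."""
    year = 1980 + (value >> 9)    # bits 15..9: years since 1980
    month = (value >> 5) & 0x0F   # bits 8..5
    day = value & 0x1F            # bits 4..0
    return year, month, day


def dos_time(value: int):
    """Decode a 16 bit DOS time into (hour, minute, second)."""
    hour = value >> 11              # bits 15..11
    minute = (value >> 5) & 0x3F    # bits 10..5
    second = (value & 0x1F) * 2     # bits 4..0, stored in 2 second units
    return hour, minute, second
```

In a directory entry both words are little endian and sit at entry
offsets +14 (date) and +16 (time), matching the two magic lines above.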
> After applying the above-mentioned modifications with the patch
> file-5.44-archive-zoo.diff, my ZOO samples are in principle
> described as before, but now the extract version is always shown and
> additional information such as the name and time stamps of the first
> stored file is also shown. Some misidentifications also vanish. So
> this now looks like:
>
> GRASPDOC.ZOO: Zoo archive data, v2.00,
> modify: v2.0+, extract: v2.1+,
> at 30599 258 bytes comment
> Anonymous ftp site garbo.uwasa.fi
> 128.214.87.1 moderated by Timo Salmi
> ts at chyde.uwasa.fi PC directories
> and uploads\015\012Harri Valkama
> hv at chyde.uwasa.fi PC, Mac, Unix
> files, and upload
> ; at 102 method=lzh
> , next entry at 29866, CRC 0xd6b9
> , size 111652 (29693 compressed)
> , grasp.man,
> modified Sun, May 19 1986 14:57:00
> LHARC_1_30.ZOO: Zoo archive data, v2.00,
> modify: v2.0+, extract: v1.0+
> ; at 42 method=not-compressed
> , next entry at 170, CRC 0x4d19
> , size 45 (45 compressed)
> , X.inf ".info"
> in "LHArc"
> , time zone 20/4,
> modified Sun, Dec 06 1990 10:45:58
> M2POSX02.ZOO: Zoo archive data, v2.10,
> modify: v2.0+, extract: v2.1+
> , vdata 0x3
> ; at 42 method=lzh
> , next entry at 844, CRC 0x7ccf
> , size 1264 (660 compressed)
> , m2ppx.g
> in "m2posix.02/bin"
> , time zone -4/4,
> modified Sun, Feb 21 1993 05:19:14
> MyZoo2-hP.bak: Zoo archive data, v2.10,
> modify: v2.0+, extract: v2.1+
> , vdata 0x3;
> at 42 method=lzh
> , next entry at 504, CRC 0x3cae
> , size 514 (302 compressed)
> , README.Deb "README.Debian"
> in "usr/share/doc/zoo"
> , time zone 0/4,
> modified Sun, Feb 15 2012 18:05:54
> MyZoo3.zoo: Zoo archive data, v2.10,
> modify: v2.0+, extract: v1.0+
> , vdata 0x3
> ; at 42 method=lzd
> , next entry at 594, CRC 0x3cae
> , size 514 (392 compressed), deleted
> , README.Deb "README.Debian"
> in "usr/share/doc/zoo"
> , time zone 0/4,
> modified Sun, Feb 15 2012 18:05:54
> UUCODE.ZOO: Zoo archive data, v1.50,
> modify: v1.4+, extract: v1.0+
> ; at 34 method=lzd
> , next entry at 10351, CRC 0x91a5
> , size 15272 (10256 compressed)
> , uudecode,
> modified Sun, Sep 20 1989 21:57:12
> WHRCGA.ZOO: Zoo archive data, v2.00,
> modify: v2.0+, extract: v2.1+
> , at 61369 258 bytes comment
> Anonymous ftp site garbo.uwasa.fi
> 128.214.87.1 moderated by Timo Salmi
> ts at chyde.uwasa.fi PC directories
> and uploads\015\012Harri Valkama
> hv at chyde.uwasa.fi PC, Mac, Unix
> files, and upload
> ; at 68 method=lzh
> , next entry at 44551, CRC 0x6f35
> , at 0xadd9 46 bytes comment
> "CGA .GL file showing menu
> input from keyboard"
> , size 160073 (44366 compressed)
> , whrcga.gl,
> modified Sun, Oct 07 1989 13:49:26
> playback.zoo: Zoo archive data, v2.10,
> modify: v2.0+, extract: v2.1+
> , vdata 0x3
> ; at 42 method=lzh
> , next entry at 33948, CRC 0x807c
> , size 68840 (33835 compressed)
> , playback.prg
> , time zone -107/4,
> modified Sun, Mar 25 1992 12:08:00
> unzip51b.zoo: Zoo archive data, v2.10,
> modify: v2.0+, extract: v2.1+
> , vdata 0x3
> ; at 42 method=lzh
> , next entry at 13606, CRC 0x1e50
> , size 22054 (13485 compressed)
> , funzip.ttp
> in "./68000"
> , time zone 4/4,
> modified Sun, Feb 09 1994 15:00:34
> x-fmt-269-signature-id-621.zoo: data
>
> Maybe some exotic variants are missed (tiny; one-file small
> archive). I hope my diff file can be applied in a future version of
> the file utility.
>
> I use the two test types lemsdosdate and lemsdostime to interpret a 2
> byte value as a bit-encoded date and time in DOS format relative to
> the year 1980, but these are not mentioned in the official
> documentation magic.man. So I think these two types should be
> mentioned there.
>
> With best wishes,
> Jörg Jenderek
> - --
> Jörg Jenderek
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCZCtwdAAKCRCv8rHJQhrU
> 1owuAJ45XDMYM1yPl1qJTRoZJmgkTRoC1wCeKWzw+wY7ihPj82N65CRsSr6Gfq4=
> =SwXB
> -----END PGP SIGNATURE-----
> <trid-v-zoo.txt.gz><droid-zoo.csv.gz><file-5_44-archive-zoo_diff.DEFANGED-359><file-5_44-archive-zoo_diff_sig.DEFANGED-360>