[File] [PATCH] Magdir/zip ledate != DOS date + wrong "big" size
Christos Zoulas
christos at zoulas.com
Sun Oct 24 15:54:55 UTC 2021
Committed, thanks!
christos
> On Oct 23, 2021, at 10:18 AM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
>
> Hello,
>
> Some time ago I handled some Mozilla omni.ja files, which are a kind
> of ZIP archive. Some unexpected results led me to inspect ZIP
> archives more closely.
> When running the file command, version 5.41, on such archives I get
> output like:
>
> 1980-2021.zip: Zip archive data,
> at least v2.0 to extract,
> compression method=deflate
> 1980-jan-1-time0.zip: Zip archive data,
> at least v2.0 to extract,
> compression method=deflate
> 2021-sep-29-00.00.zip: Zip archive data,
> at least v2.0 to extract,
> compression method=deflate
>
> When running with the -k option I get more messages. I can get the
> second message by running the file command with the -m Magdir/zip
> option. So now I get output like:
>
> 1980-2021.zip: Zip archive data,
> made by v2.0,
> extract using at least v2.0,
> last modified Mon Jan 26 08:26:40 1970,
> uncompressed size 31,
> method=deflate
> 1980-jan-1-time0.zip: Zip archive data,
> made by v2.0,
> extract using at least v2.0,
> last modified Mon Jan 26 01:18:56 1970,
> uncompressed size 29,
> method=deflate
> 2021-sep-29-00.00.zip: Zip archive data,
> made by v2.0,
> extract using at least v2.0,
> last modified Thu Apr 3 06:30:24 2014,
> uncompressed size 32,
> method=deflate
>
> Obviously the time stamps displayed for these prepared archives are wrong!
>
> For comparison I ran the official Info-ZIP unzip tool. When run with
> the verbose option in Zipinfo mode, it reports the expected
> time stamps (1980 Jan 1 and 2021 Sep 29; see the attached unzip-date.txt).
>
> I also used the 7z decompression tool with the listing and zip type
> options. It gives the same expected time stamps (1980 Jan 1 and
> 2021 Sep 29; see the attached 7z-l.txt).
>
> Inside Magdir/zip, the information from the ZIP central directory
> record is shown by the subroutine zipcd. After the minimum version
> needed to extract is shown by the subroutine zipversion, the
> modification date is displayed by a line like:
> >>12 ledate x \b, last modified %s
> ledate interprets a 4-byte little-endian value as a Unix timestamp,
> that is, as seconds since 1 January 1970.
> But according to the documentation, ZIP archives store the date in
> DOS date/time format, which is completely different: the starting
> point is 1 January 1980, and date and time are stored as bit fields.
> So the date-displaying lines now become:
>
> >>12 uleshort x \b, last modified
> >>12 use dos-date
>
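For illustration, a minimal Python sketch (not part of the patch) of why ledate goes wrong here. It packs a DOS date/time the way the ZIP header stores it (time word first, then date word, both little-endian) and then reads the same 4 bytes back as a Unix timestamp. The helper name pack_dos_datetime is made up for this example, and reading the value as UTC is an assumption about how the time stamp was printed.

```python
import struct
from datetime import datetime, timezone


def pack_dos_datetime(year, month, day, hour=0, minute=0, second=0):
    """Return the 4 header bytes: DOS time word, then DOS date word."""
    # Time word: 5 bits hour, 6 bits minute, 5 bits second/2.
    time_word = (hour << 11) | (minute << 5) | (second // 2)
    # Date word: 7 bits year-1980, 4 bits month, 5 bits day.
    date_word = ((year - 1980) << 9) | (month << 5) | day
    return struct.pack("<HH", time_word, date_word)


# 29 Sep 2021, 00:00:00 as it would be stored in the ZIP header.
raw = pack_dos_datetime(2021, 9, 29)

# What a "seconds since 1970" reading makes of the same 4 bytes.
(as_unix,) = struct.unpack("<I", raw)
misread = datetime.fromtimestamp(as_unix, tz=timezone.utc)

print(hex(as_unix), misread)
```

Under these assumptions the 4 bytes read as 0x533D0000 seconds, which is 3 Apr 2014 06:30:24 UTC, the same wrong value shown above for 2021-sep-29-00.00.zip.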
> So I added this subroutine dos-date to Magdir/msdos. First I added
> comment lines pointing to the documentation, like:
>
> # URL: http://fileformats.archiveteam.org/wiki/MS-DOS_date/time
> # Reference: https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-dosdatetimetofiletime
> The first 2 bytes contain the time information and the last 2 bytes
> contain the date information. This can be shown in hexadecimal form
> with debugging lines like:
> >0 uleshort x RAW TIME %#4.4x
> >2 uleshort x RAW DATE %#4.4x
>
> According to the documentation, the date is encoded in bit form as
> YYYYYYYMMMMDDDDD, where the lower 5 bits D encode the day (range
> 1-31), the middle 4 bits M encode the month (range 1-12) and the
> upper 7 bits Y encode the year (add 1980 to get the real year).
> This is done by lines like:
> >2 uleshort&0x001F x %u
> >2 uleshort&0x01E0 =0x0020 jan
> >2 uleshort&0x01E0 =0x0040 feb
> >2 uleshort&0x01E0 =0x0060 mar
> >2 uleshort&0x01E0 =0x0080 apr
> >2 uleshort&0x01E0 =0x00A0 may
> >2 uleshort&0x01E0 =0x00C0 jun
> >2 uleshort&0x01E0 =0x00E0 jul
> >2 uleshort&0x01E0 =0x0100 aug
> >2 uleshort&0x01E0 =0x0120 sep
> >2 uleshort&0x01E0 =0x0140 oct
> >2 uleshort&0x01E0 =0x0160 nov
> >2 uleshort&0x01E0 =0x0180 dec
> >2 uleshort/512 x 1980+%u
> Unfortunately I was not able to display the time information in the
> same way as the date.
>
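As a cross-check of the masks used in the magic lines above, here is a small Python sketch of the same decoding, extended to the time word. The function names are hypothetical. The time layout (5 bits hour, 6 bits minute, 5 bits for seconds divided by 2) is the standard MS-DOS one; the time value in the example is arbitrary and not taken from any of the test archives.

```python
def decode_dos_date(date_word):
    """Split a 16-bit DOS date word into (year, month, day)."""
    day = date_word & 0x001F            # lower 5 bits
    month = (date_word & 0x01E0) >> 5   # middle 4 bits
    year = 1980 + (date_word >> 9)      # upper 7 bits, offset from 1980
    return year, month, day


def decode_dos_time(time_word):
    """Split a 16-bit DOS time word into (hour, minute, second)."""
    second = (time_word & 0x001F) * 2   # stored with 2-second resolution
    minute = (time_word & 0x07E0) >> 5  # middle 6 bits
    hour = time_word >> 11              # upper 5 bits
    return hour, minute, second


print(decode_dos_date(0x0021))  # 1 Jan 1980
print(decode_dos_date(0x533D))  # 29 Sep 2021
print(decode_dos_time(0x6C40))  # arbitrary example value
```

The year shift by 9 bits is the same operation as the uleshort/512 test in the magic lines.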
> The documentation states that all fields, unless otherwise noted,
> are unsigned and stored in little-endian order. The size in the
> subroutine zipcd is displayed by the line
> >>24 lelong >0 \b, uncompressed size %d
> That is definitely wrong. I checked this with, for example,
> 2015-05-05-raspbian-wheezy.zip. The size value is C3500000 in
> hexadecimal. Interpreting this as a signed value gives the wrong
> negative value -1018167296, whereas interpreting it as unsigned
> gives the correct size 3276800000.
> And things get worse. To go beyond 4 GiB, the real size is stored as
> an 8-byte integer inside a ZIP64 format record and the 4-byte size
> value is set to the maximum, 0xFFFFFFFF. So the line for the size in
> zipcd now becomes:
> >>24 ulelong !0xFFffFFff \b, uncompressed size %u
>
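The signed/unsigned difference can be reproduced with a few lines of Python. This is only a sketch: the byte string is the value C3500000 from the example above written in little-endian order, and the helper displayable_size is a made-up name that mimics the !0xFFffFFff test.

```python
import struct

# 0xC3500000 as it is stored on disk (little-endian).
raw = bytes.fromhex("000050c3")

(signed,) = struct.unpack("<i", raw)    # what lelong with %d shows
(unsigned,) = struct.unpack("<I", raw)  # what ulelong with %u shows

ZIP64_SENTINEL = 0xFFFFFFFF  # real size lives in the ZIP64 record


def displayable_size(value):
    """Return the 32-bit size, or None if it is only the ZIP64 marker."""
    if value == ZIP64_SENTINEL:
        return None
    return value


print(signed, unsigned, displayable_size(unsigned))
```

Here signed is -1018167296 and unsigned is 3276800000, matching the two values quoted above.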
> I assume the same error also occurs in Magdir/zip lines like:
> #>4 leshort >1 \b, %d disks
> #>6 leshort >1 \b, central directory disk %d
> #>8 leshort >1 \b, %d central directories on this disk
> #>10 leshort >1 \b, %d central directories
> #>12 lelong x \b, %d central directory bytes
> So I changed them to:
> #>4 uleshort !0xFFff \b, %u disks
> #>6 uleshort !0xFFff \b, central directory disk %u
> #>8 uleshort !0xFFff \b, %u central directories on this disk
> #>10 uleshort !0xFFff \b, %u central directories
> #>12 ulelong !0xFFffFFff \b, %u central directory bytes
> But I did not check this with examples.
>
> Obviously this error is not visible because zipcd is only called
> after checking for the EOCD (End Of Central Directory record) at the
> end of the archive, and this is not done for "big" files because of
> the size limitations of the file command.
>
> The first identification is done by looking for the local file
> header with the start pattern PK\3\4 inside Magdir/archive.
>
> The second message is produced by looking for the central directory
> file header with the pattern PK\1\2 in the subroutine zipcd inside
> Magdir/zip.
>
> In principle both report the same information with one exception:
> the first has no "made by" version part. So when the first test
> succeeds, the second test need not be executed.
>
> But when both are executed the wording should be the same, because
> both refer to the same information. In Magdir/zip the minimum
> version needed to extract is shown by lines like:
> >>6 leshort x \b, extract using at least
> >>6 use zipversion
> But inside Magdir/archive (version 1.151) this is phrased "at least"
> zipversion "to extract".
>
> The compression method in Magdir/zip is displayed by lines like:
> >>10 leshort x \b, method=
> >>10 use zipcompression
> whereas inside Magdir/archive (version 1.151) this is phrased
> "compression method=" zipcompression.
> I did not change these lines but added comment lines with the
> corresponding expressions from Magdir/archive.
>
> After applying the above-mentioned modifications with the patches
> file-5.41-zip-time.diff and file-5.41-msdos-time.diff, the time
> stamps of all inspected ZIP archives are displayed correctly (but in
> an ugly way). The output now looks like:
>
> 1980-2021.zip: Zip archive data,
> made by v2.0,
> extract using at least v2.0,
> last modified 1 jan 1980+0,
> uncompressed size 31,
> method=deflate
> 1980-jan-1-time0.zip: Zip archive data,
> made by v2.0,
> extract using at least v2.0,
> last modified 1 jan 1980+0,
> uncompressed size 29,
> method=deflate
> 2021-sep-29-00.00.zip: Zip archive data,
> made by v2.0,
> extract using at least v2.0,
> last modified 29 sep 1980+41,
> uncompressed size 32,
> method=deflate
>
> I hope my 2 diff files can be applied in a future version of the
> file utility.
>
> There are still some things to do.
> First, create an equivalent of dos-time in C to speed things up and
> to get a look similar to that of ledate.
> ZIP archives and derivatives are also handled by Magdir/archive,
> where the same errors for time and size probably occur. I tried to
> update this magic file, but things are complicated because I also
> want to add some ZIP variants like:
> description                                extension
> DROID profile                              droid
> Android Package                            apk
> Mozilla cross platform installer module    xpi
> LibreOffice Extension                      oxt
> Sweet Home 3D design                       sh3d
> Compressed Disk Image                      imz
> Microsoft Open XML Paper Specification     xps
> Microsoft Open XML Paper Specification     oxps
>
> But I did not succeed, and when searching the TrID database for the
> ZIP magic with the XML expression "<Bytes>504B0304</Bytes>" I found
> 375 file types.
>
> With best wishes
> Jörg Jenderek
> --
> Jörg Jenderek
>
> Attachments: file-5.41-msdos-time.diff and file-5.41-zip-time.diff
> (each with a signature file), unzip-date.txt.gz, 7z-l.txt.gz