[File] [PATCH] Magdir/zip ledate != DOS date + wrong "big" size

Christos Zoulas christos at zoulas.com
Sun Oct 24 15:54:55 UTC 2021


Committed, thanks!

christos

> On Oct 23, 2021, at 10:18 AM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> 
> some time ago I handled some Mozilla omni.ja files, which are a kind
> of ZIP archive. Some unexpected results led me to inspect ZIP archives
> more closely.
> When running file command version 5.41 on such archives I get
> output like:
> 
> 1980-2021.zip:         Zip archive data,
> 		       at least v2.0 to extract,
> 		       compression method=deflate
> 1980-jan-1-time0.zip:  Zip archive data,
> 		       at least v2.0 to extract,
> 		       compression method=deflate
> 2021-sep-29-00.00.zip: Zip archive data,
> 		       at least v2.0 to extract,
> 		       compression method=deflate
> 
> When running with the -k option I get more messages. I can get the
> second message by running the file command with the -m Magdir/zip
> option. Then I get output like:
> 
> 1980-2021.zip:         Zip archive data,
> 		       made by v2.0,
> 		       extract using at least v2.0,
> 		       last modified Mon Jan 26 08:26:40 1970,
> 		       uncompressed size 31,
> 		       method=deflate
> 1980-jan-1-time0.zip:  Zip archive data,
> 		       made by v2.0,
> 		       extract using at least v2.0,
> 		       last modified Mon Jan 26 01:18:56 1970,
> 		       uncompressed size 29,
> 		       method=deflate
> 2021-sep-29-00.00.zip: Zip archive data,
> 		       made by v2.0,
> 		       extract using at least v2.0,
> 		       last modified Thu Apr  3 06:30:24 2014,
> 		       uncompressed size 32,
> 		       method=deflate
> 
> Obviously the time stamps displayed for the prepared archives are wrong!
> 
> For comparison I ran the official Info-ZIP unzip tool. When run with
> the verbose option in Zipinfo mode it reports the expected
> time stamps (1980 Jan 1 and 2021 Sep 29; see appended unzip-date.txt).
> 
> I also used the decompression tool 7z with the listing and zip type
> options. Here I got the same expected time stamps (1980 Jan 1 and
> 2021 Sep 29; see appended 7z-l.txt).
> 
> Inside Magdir/zip the information of the Zip Central Directory record
> is shown by subroutine zipcd. After the minimum version to extract is
> shown by subroutine zipversion, the modification date is displayed by
> a line like:
> >>12	ledate		x		\b, last modified %s
> ledate interprets a 4 byte value in little endian order as seconds
> since 1 January 1970 (a Unix time stamp).
> But according to the documentation, dates in ZIP archives are stored
> in DOS time format, which is completely different. The starting point
> is 1 January 1980 and date and time are stored in a bit-packed format.
> So the date displaying lines now become:
> 
> >>12	uleshort	x		\b, last modified
> >>12	use		dos-date
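
A minimal Python sketch, not part of the patch, of why the old ledate line printed dates in 1970 and 2014. The function names are hypothetical, and it assumes the member in 2021-sep-29-00.00.zip is stored with time 00:00:00, as the file name suggests. It packs a date the DOS way and then reads the same 4 bytes back as a Unix time stamp:

```python
import struct
from datetime import datetime, timezone


def pack_dos_datetime(year, month, day, hour=0, minute=0, second=0):
    """Pack a calendar date/time into the 4 bytes used in a ZIP header."""
    dos_time = (hour << 11) | (minute << 5) | (second // 2)
    dos_date = ((year - 1980) << 9) | (month << 5) | day
    # The time word comes first, then the date word, both little endian.
    return struct.pack("<HH", dos_time, dos_date)


def misread_as_unix_seconds(raw):
    """Read the same 4 bytes as an unsigned little endian Unix time stamp."""
    (seconds,) = struct.unpack("<I", raw)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


raw = pack_dos_datetime(2021, 9, 29)
print(raw.hex())                     # 00003d53
print(misread_as_unix_seconds(raw))  # 2014-04-03 06:30:24+00:00
```

Under that assumption the misread value is the same "Apr  3 06:30:24 2014" shown in the output above.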
> 
> So I added this subroutine dos-date inside Magdir/msdos. First I added
> comment lines pointing to the documentation, like:
> 
> # URL:		http://fileformats.archiveteam.org
> #		/wiki/MS-DOS_date/time
> # Reference:	https://docs.microsoft.com/en-us/windows/win32/
> #		/api/winbase/nf-winbase-dosdatetimetofiletime
> The first 2 bytes contain the time information and the last 2 bytes
> contain the date information. This can be shown in hexadecimal form
> by debugging lines like:
> >0	uleshort	x	RAW TIME %#4.4x
> >2	uleshort	x	RAW DATE %#4.4x
> 
> According to the documentation the date is encoded in bit form like
> YYYYYYYMMMMDDDDD, where the lower 5 D bits encode the day (1-31 range),
> the middle 4 M bits encode the month (1-12 range) and the upper 7 Y
> bits are the year part (+1980 to get the real year). This is done by
> lines like:
> >2	uleshort&0x001F	x	%u
> >2	uleshort&0x01E0	=0x0020	jan
> >2	uleshort&0x01E0	=0x0040	feb
> >2	uleshort&0x01E0	=0x0060	mar
> >2	uleshort&0x01E0	=0x0080	apr
> >2	uleshort&0x01E0	=0x00A0	may
> >2	uleshort&0x01E0	=0x00C0	jun
> >2	uleshort&0x01E0	=0x00E0	jul
> >2	uleshort&0x01E0	=0x0100	aug
> >2	uleshort&0x01E0	=0x0120	sep
> >2	uleshort&0x01E0	=0x0140	oct
> >2	uleshort&0x01E0	=0x0160	nov
> >2	uleshort&0x01E0	=0x0180	dec
> >2	uleshort/512	x	1980+%u
> Unfortunately I was not able to display the time information in the
> same way as the date.
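
For illustration only, a short Python sketch of the full decoding, including the time word that the magic lines above do not display. The function names are hypothetical. The time layout (hour in the upper 5 bits, minute in the middle 6 bits, seconds divided by 2 in the lower 5 bits) comes from the MS-DOS date/time documentation referenced above:

```python
import struct

MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")


def decode_dos_datetime(raw):
    """Decode 4 bytes (time word, then date word, both little endian)."""
    dos_time, dos_date = struct.unpack("<HH", raw[:4])
    day = dos_date & 0x001F              # bits 0-4
    month = (dos_date & 0x01E0) >> 5     # bits 5-8
    year = 1980 + (dos_date >> 9)        # bits 9-15
    second = (dos_time & 0x001F) * 2     # bits 0-4, 2 second resolution
    minute = (dos_time & 0x07E0) >> 5    # bits 5-10
    hour = dos_time >> 11                # bits 11-15
    return year, month, day, hour, minute, second


def format_dos_datetime(raw):
    """Render the decoded fields in a form close to the patched output."""
    year, month, day, hour, minute, second = decode_dos_datetime(raw)
    # Month values outside 1-12 are invalid; show a placeholder for them.
    name = MONTHS[month - 1] if 1 <= month <= 12 else "???"
    return f"{day} {name} {year} {hour:02d}:{minute:02d}:{second:02d}"


print(format_dos_datetime(bytes.fromhex("00003d53")))  # 29 sep 2021 00:00:00
print(format_dos_datetime(bytes.fromhex("406c2100")))  # 1 jan 1980 13:34:00
```

The two input values are illustrative byte strings, not dumps taken from the test archives.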
> 
> The documentation states that all fields, unless otherwise noted,
> are unsigned and stored in little endian order. The size in subroutine
> zipcd is displayed by the line
> >>24	lelong		>0		\b, uncompressed size %d
> That is definitely wrong. I have checked that with, for example,
> 2015-05-05-raspbian-wheezy.zip. The size value is C3500000 in
> hexadecimal. Interpreting this as a signed value gives the wrong
> negative value -1018167296, whereas interpreting it as unsigned gives
> the correct size 3276800000.
> And things get worse. To go beyond 4 GiB, the real size is stored as
> an 8 byte integer inside a ZIP64 format record and the 4 byte size
> value is set to the upper limit 0xFFFFFFFF. So the line for the size
> in zipcd now becomes:
> >>24	ulelong		!0xFFffFFff	\b, uncompressed size %u
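
The signed versus unsigned difference can be reproduced with a few lines of Python. This is only a sketch: describe_size is a hypothetical helper that mimics the new magic line, printing nothing when the field holds the 0xFFFFFFFF placeholder.

```python
import struct

ZIP64_PLACEHOLDER = 0xFFFFFFFF

# The value 0xC3500000 as it appears on disk, in little endian byte order.
raw = bytes.fromhex("000050c3")

(as_signed,) = struct.unpack("<i", raw)    # what lelong with %d shows
(as_unsigned,) = struct.unpack("<I", raw)  # what ulelong with %u shows
print(as_signed)    # -1018167296
print(as_unsigned)  # 3276800000


def describe_size(raw):
    """Return the size message, or None when the real size is in ZIP64."""
    (size,) = struct.unpack("<I", raw)
    if size == ZIP64_PLACEHOLDER:
        return None
    return f"uncompressed size {size}"


print(describe_size(raw))                       # uncompressed size 3276800000
print(describe_size(bytes.fromhex("ffffffff")))  # None
```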
> 
> I assume the same error also occurs in lines in Magdir/zip like:
> #>4	leshort	>1	\b, %d disks
> #>6	leshort	>1	\b, central directory disk %d
> #>8	leshort	>1	\b, %d central directories on this disk
> #>10	leshort	>1	\b, %d central directories
> #>12	lelong	x	\b, %d central directory bytes
> So I changed these to:
> #>4	uleshort !0xFFff \b, %u disks
> #>6	uleshort !0xFFff \b, central directory disk %u
> #>8	uleshort !0xFFff \b, %u central directories on this disk
> #>10	uleshort !0xFFff \b, %u central directories
> #>12	ulelong	!0xFFffFFff \b, %u central directory bytes
> But I did not check this with examples.
> 
> Obviously this error is not visible because zipcd is only called
> after checking for the EOCD (End Of Central Directory record) at the
> end of the archive, and this is not done for "big" files because of
> size limitations of the file command.
> 
> The first identification is done by looking for a local file header
> with the start pattern PK\3\4 inside Magdir/archive.
> 
> The second message is produced by looking for a Central directory file
> header with the pattern PK\1\2 in subroutine zipcd inside Magdir/zip.
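
As a rough illustration of the three record types involved, the following Python sketch builds a tiny archive in memory with the standard zipfile module and checks the signature at each position. classify_record is a hypothetical helper. The EOCD signature PK\5\6 and the position of the central directory offset (16 bytes into the EOCD) are taken from the ZIP format documentation, not from the magic files:

```python
import io
import struct
import zipfile

SIGNATURES = {
    b"PK\x03\x04": "local file header",
    b"PK\x01\x02": "central directory file header",
    b"PK\x05\x06": "end of central directory record",
}


def classify_record(data, offset=0):
    """Name the ZIP record whose 4 byte signature starts at offset."""
    return SIGNATURES.get(bytes(data[offset:offset + 4]))


# Build a one-member archive without an archive comment.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello")
data = buf.getvalue()

# The archive starts with a local file header.
print(classify_record(data))

# The EOCD sits at the end; search backwards for its signature.
eocd = data.rfind(b"PK\x05\x06")
print(classify_record(data, eocd))

# The EOCD stores the central directory offset as an unsigned 32-bit value.
(cd_offset,) = struct.unpack_from("<I", data, eocd + 16)
print(classify_record(data, cd_offset))
```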
> 
> In principle both report the same information with one exception: the
> first has no "made by" version part. So when the first test succeeds,
> the second test need not be executed.
> 
> But when both are executed the wording should be the same,
> because both refer to the same information. In Magdir/zip the minimum
> version to extract is shown by lines like:
> >>6	leshort		x		\b, extract using at least
> >>6	use		zipversion
> But inside Magdir/archive (version 1.151) this is called "at least"
> zipversion "to extract".
> 
> The compression method in Magdir/zip is displayed by lines like:
> >>10	leshort		x		\b, method=
> >>10	use		zipcompression
> whereas inside Magdir/archive (version 1.151) this is called
> "compression method="zipcompression.
> I did not change these lines but added comment lines with the
> corresponding expressions from Magdir/archive.
> 
> After applying the above-mentioned modifications with the patches
> file-5.41-zip-time.diff and file-5.41-msdos-time.diff, the time stamps
> are displayed correctly (but in an ugly form) for all inspected ZIP
> archives. The output now looks like:
> 
> 1980-2021.zip:         Zip archive data,
> 		       made by v2.0,
> 		       extract using at least v2.0,
> 		       last modified 1 jan 1980+0,
> 		       uncompressed size 31,
> 		       method=deflate
> 1980-jan-1-time0.zip:  Zip archive data,
> 		       made by v2.0,
> 		       extract using at least v2.0,
> 		       last modified 1 jan 1980+0,
> 		       uncompressed size 29,
> 		       method=deflate
> 2021-sep-29-00.00.zip: Zip archive data,
> 		       made by v2.0,
> 		       extract using at least v2.0,
> 		       last modified 29 sep 1980+41,
> 		       uncompressed size 32,
> 		       method=deflate
> 
> I hope my 2 diff files can be applied in a future version of the file
> utility.
> 
> There are some things still to do.
> First, to create an equivalent of dos-time in C to speed things up
> and to get a look similar to that of ledate.
> Zip archives and derivatives are also handled by Magdir/archive. The
> same errors for time and size are probably present there. I tried to
> update this magic file, but things are complicated because I also
> want to add some zip variants like:
> description				extension
> DROID profile				droid
> Android Package				apk
> Mozilla cross platform installer module	xpi
> LibreOffice Extension			oxt
> Sweet Home 3D design			sh3d
> Compressed Disk Image			imz
> Microsoft Open XML Paper Specification	xps
> Microsoft Open XML Paper Specification	oxps
> 
> But I did not succeed, and when searching the TrID database for the
> ZIP magic with the XML expression "<Bytes>504B0304</Bytes>" I found
> 375 file types.
> 
> With best wishes
> Jörg Jenderek
> Attachments: file-5_41-msdos-time_diff, file-5_41-msdos-time_diff_sig,
> file-5_41-zip-time_diff, file-5_41-zip-time_diff_sig,
> unzip-date.txt.gz, 7z-l.txt.gz
> --
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file
