[File] [PATCH] Magdir/zip ledate != DOS date + wrong "big" size
Jörg Jenderek
joerg.jen.der.ek at gmx.net
Sat Oct 23 14:18:54 UTC 2021
Hello,
some time ago I handled some Mozilla omni.ja files, which are a kind of
ZIP archive. Some unexpected results led me to inspect ZIP archives
more closely.
When running file command version 5.41 on such archives I get
output like:
1980-2021.zip:         Zip archive data,
                       at least v2.0 to extract,
                       compression method=deflate
1980-jan-1-time0.zip:  Zip archive data,
                       at least v2.0 to extract,
                       compression method=deflate
2021-sep-29-00.00.zip: Zip archive data,
                       at least v2.0 to extract,
                       compression method=deflate
When running with the -k option I get more messages. I can also get the
second message by running the file command with the -m Magdir/zip
option. The output then looks like:
1980-2021.zip:         Zip archive data,
                       made by v2.0,
                       extract using at least v2.0,
                       last modified Mon Jan 26 08:26:40 1970,
                       uncompressed size 31,
                       method=deflate
1980-jan-1-time0.zip:  Zip archive data,
                       made by v2.0,
                       extract using at least v2.0,
                       last modified Mon Jan 26 01:18:56 1970,
                       uncompressed size 29,
                       method=deflate
2021-sep-29-00.00.zip: Zip archive data,
                       made by v2.0,
                       extract using at least v2.0,
                       last modified Thu Apr 3 06:30:24 2014,
                       uncompressed size 32,
                       method=deflate
Obviously the displayed time stamps for the prepared archives are wrong!
For comparison I ran the official Info-ZIP unzip tool. Run with the
verbose option in Zipinfo mode, it reports the expected time stamps
(1980 Jan 1 and 2021 Sep 29; see appended unzip-date.txt).
I also used the decompression tool 7z with the listing and zip type
options. There I got the same expected time stamps (1980 Jan 1 and
2021 Sep 29; see appended 7z-l.txt).
Inside Magdir/zip the information of the Zip Central Directory record
is shown by subroutine zipcd. After the minimum version needed to
extract is shown by subroutine zipversion, the modification date is
displayed by a line like:
>>12 ledate x \b, last modified %s
ledate interprets a 4-byte value in little-endian order as seconds
since 1 January 1970 (UTC; the local time variant would be leldate).
But according to the documentation, dates in ZIP archives are stored in
DOS date/time format, which is completely different. Its starting point
is 1 January 1980, and date and time are stored as bit fields. So the
date-displaying lines now become:
>>12 uleshort x \b, last modified
>>12 use dos-date
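To illustrate the mismatch, here is a minimal Python sketch (not part of
the patch; the helper name dos_pack is made up for illustration). It
packs 2021 Sep 29 00:00 into the 4-byte DOS form and then reads the same
32 bits as Unix seconds, assuming the value is treated as UTC. Under
that assumption it reproduces the 2014 time stamp reported for
2021-sep-29-00.00.zip:

```python
from datetime import datetime, timezone

def dos_pack(year, month, day, hour=0, minute=0, second=0):
    """Pack a calendar date into the 32-bit DOS date/time value (hypothetical helper)."""
    dos_date = ((year - 1980) << 9) | (month << 5) | day
    dos_time = (hour << 11) | (minute << 5) | (second // 2)
    # On disk the time word comes first, so in a little-endian
    # 32-bit read the date word ends up in the upper 16 bits.
    return (dos_date << 16) | dos_time

raw = dos_pack(2021, 9, 29)
# Same bits, but read as seconds since 1 January 1970 (UTC assumed).
misread = datetime.fromtimestamp(raw, tz=timezone.utc)
print(hex(raw))                                  # 0x533d0000
print(misread.strftime("%Y-%m-%d %H:%M:%S"))     # 2014-04-03 06:30:24
```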
So I added this subroutine dos-date to Magdir/msdos. First I added
comment lines pointing to the documentation:
# URL: http://fileformats.archiveteam.org/wiki/MS-DOS_date/time
# Reference: https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-dosdatetimetofiletime
The first 2 bytes contain the time information and the last 2 bytes
contain the date information. These can be shown in hexadecimal form
by debugging lines like:
>0 uleshort x RAW TIME %#4.4x
>2 uleshort x RAW DATE %#4.4x
According to the documentation the date is bit-encoded as
YYYYYYYMMMMDDDDD, where the lower 5 D bits encode the day (range 1-31),
the middle 4 M bits encode the month (range 1-12) and the upper 7 Y
bits are the year part (add 1980 to get the real year). This is done by
lines like:
>2 uleshort&0x001F x %u
>2 uleshort&0x01E0 =0x0020 jan
>2 uleshort&0x01E0 =0x0040 feb
>2 uleshort&0x01E0 =0x0060 mar
>2 uleshort&0x01E0 =0x0080 apr
>2 uleshort&0x01E0 =0x00A0 may
>2 uleshort&0x01E0 =0x00C0 jun
>2 uleshort&0x01E0 =0x00E0 jul
>2 uleshort&0x01E0 =0x0100 aug
>2 uleshort&0x01E0 =0x0120 sep
>2 uleshort&0x01E0 =0x0140 oct
>2 uleshort&0x01E0 =0x0160 nov
>2 uleshort&0x01E0 =0x0180 dec
>2 uleshort/512 x 1980+%u
Unfortunately I was not able to display the time information in the
same way as the date.
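For comparison outside of the magic syntax, the full decoding including
the time part can be sketched in Python. This is only an illustration
based on the bit layout given in the references above; the function
name decode_dos_datetime is hypothetical and nothing like it exists in
the file sources:

```python
import struct

def decode_dos_datetime(buf):
    """Decode 4 bytes: time word, then date word, both little-endian."""
    dos_time, dos_date = struct.unpack("<HH", buf[:4])
    day = dos_date & 0x1F               # bits 0-4
    month = (dos_date >> 5) & 0x0F      # bits 5-8
    year = 1980 + (dos_date >> 9)       # bits 9-15
    second = (dos_time & 0x1F) * 2      # bits 0-4, stored in 2 second steps
    minute = (dos_time >> 5) & 0x3F     # bits 5-10
    hour = dos_time >> 11               # bits 11-15
    return year, month, day, hour, minute, second

# 00 00 3D 53: time word 0x0000, date word 0x533D
print(decode_dos_datetime(bytes.fromhex("00003d53")))  # (2021, 9, 29, 0, 0, 0)
# 00 00 21 00: time word 0x0000, date word 0x0021
print(decode_dos_datetime(bytes.fromhex("00002100")))  # (1980, 1, 1, 0, 0, 0)
```

The month and minute fields need both a shift and a mask. The
commented-out attempt in the patch suggests that this combination is
the part that is hard to express in a single magic line.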
The documentation says that all fields, unless otherwise noted, are
unsigned and stored in little-endian order. The size in subroutine
zipcd is displayed by the line
>>24 lelong >0 \b, uncompressed size %d
That is definitely wrong. I checked this with, for example,
2015-05-05-raspbian-wheezy.zip. The size value is C3500000 in
hexadecimal. Interpreting this as a signed value gives the wrong
negative value -1018167296, whereas interpreting it as unsigned gives
the correct size 3276800000.
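The two readings of that value can be checked with a few lines of
Python (a sketch for illustration only, using the standard struct
module rather than anything from file itself):

```python
import struct

raw = bytes.fromhex("000050c3")         # 0xC3500000 as stored, little-endian
(signed,) = struct.unpack("<i", raw)    # signed 32-bit read
(unsigned,) = struct.unpack("<I", raw)  # unsigned 32-bit read
print(signed)    # -1018167296
print(unsigned)  # 3276800000
```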
And things get worse. To go beyond 4 GiB, the real size is stored as
an 8-byte integer inside a ZIP64 format record and the 4-byte size
value is set to the upper limit 0xFFFFFFFF. So the line for the size
in zipcd now becomes:
>>24 ulelong !0xFFffFFff \b, uncompressed size %u
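The same rule can be written as a small Python sketch. The names
ZIP64_MARKER and size_from_cd_header are made up for this example, and
the test buffers are synthetic, not taken from a real archive:

```python
import struct

ZIP64_MARKER = 0xFFFFFFFF  # 4-byte field saturated, real size is in the ZIP64 record

def size_from_cd_header(header):
    """Return the unsigned size at offset 24, or None for the ZIP64 marker."""
    (size,) = struct.unpack_from("<I", header, 24)
    return None if size == ZIP64_MARKER else size

small = bytes(24) + struct.pack("<I", 31)
big = bytes(24) + struct.pack("<I", 0xFFFFFFFF)
print(size_from_cd_header(small))  # 31
print(size_from_cd_header(big))    # None
```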
I assume the same error also occurs in lines of Magdir/zip like:
#>4 leshort >1 \b, %d disks
#>6 leshort >1 \b, central directory disk %d
#>8 leshort >1 \b, %d central directories on this disk
#>10 leshort >1 \b, %d central directories
#>12 lelong x \b, %d central directory bytes
So I changed these to:
#>4 uleshort !0xFFff \b, %u disks
#>6 uleshort !0xFFff \b, central directory disk %u
#>8 uleshort !0xFFff \b, %u central directories on this disk
#>10 uleshort !0xFFff \b, %u central directories
#>12 ulelong !0xFFffFFff \b, %u central directory bytes
But I did not check this with examples.
Obviously this error is not visible because zipcd is only called
after checking for the EOCD (End Of Central Directory record) at the
end of the archive, and this is not done for "big" files because of
size limitations of the file command.
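For readers who want to look at these EOCD fields directly, here is a
hedged Python sketch. It builds a tiny archive in memory with the
standard zipfile module and unpacks the last 22 bytes, which is where
the record sits when there is no archive comment. The field names are
my own labels:

```python
import io
import struct
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"hello")
data = buf.getvalue()

# Without an archive comment the EOCD record is exactly the last 22 bytes.
eocd = data[-22:]
(sig, disk_no, cd_disk, entries_here, entries_total,
 cd_size, cd_offset, comment_len) = struct.unpack("<4s4H2IH", eocd)

print(sig == b"PK\x05\x06", entries_total, comment_len)  # True 1 0
print(data[cd_offset:cd_offset + 4])                      # b'PK\x01\x02'
```

With a non-empty archive comment the record moves away from offset -22,
which is the case the "GRR" comment in the patch points at.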
The first identification is done by looking for the local file header
with start pattern PK\3\4 inside Magdir/archive.
The second message is produced by looking for the Central directory
file header with pattern PK\1\2 in subroutine zipcd inside Magdir/zip.
In principle both report the same information with one exception: the
first has no "made by" version part. So when the first test succeeds,
the second test need not be executed.
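The overlap between the two headers can be seen in a short Python
sketch. It is only an illustration built on the standard zipfile
module; the variable names are my own, and the field offsets are the
ones I understand from the ZIP documentation:

```python
import io
import struct
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    info = zipfile.ZipInfo("a.txt", date_time=(2021, 9, 29, 0, 0, 0))
    zf.writestr(info, b"hello")
data = buf.getvalue()

# Local file header: PK\3\4, version needed at 4, method at 8, time at 10, date at 12
l_sig, l_need, _, l_method, l_time, l_date = struct.unpack_from("<4s5H", data, 0)

# Central directory header: PK\1\2, "made by" at 4, then the same fields 2 bytes later
(cd_offset,) = struct.unpack_from("<I", data, len(data) - 22 + 16)
c_sig, c_made, c_need, _, c_method, c_time, c_date = struct.unpack_from(
    "<4s6H", data, cd_offset)

print(l_sig, c_sig)
print((l_need, l_method, l_time, l_date) == (c_need, c_method, c_time, c_date))
print(hex(c_date))  # 0x533d
```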
But when both are executed the wording should be the same, because
both refer to the same information. In Magdir/zip the minimum
version to extract is shown by lines like:
>>6 leshort x \b, extract using at least
>>6 use zipversion
But inside Magdir/archive (version 1.151) this is called "at least"
zipversion "to extract".
The compression method in Magdir/zip is displayed by lines like:
>>10 leshort x \b, method=
>>10 use zipcompression
whereas inside Magdir/archive (version 1.151) this is called
"compression method=" zipcompression.
I did not change these lines, but added comment lines quoting the
corresponding wording of Magdir/archive.
After applying the above mentioned modifications with the patches
file-5.41-zip-time.diff and file-5.41-msdos-time.diff, the time stamps
are displayed correctly (but ugly) for all inspected ZIP archives.
This now looks like:
1980-2021.zip:         Zip archive data,
                       made by v2.0,
                       extract using at least v2.0,
                       last modified 1 jan 1980+0,
                       uncompressed size 31,
                       method=deflate
1980-jan-1-time0.zip:  Zip archive data,
                       made by v2.0,
                       extract using at least v2.0,
                       last modified 1 jan 1980+0,
                       uncompressed size 29,
                       method=deflate
2021-sep-29-00.00.zip: Zip archive data,
                       made by v2.0,
                       extract using at least v2.0,
                       last modified 29 sep 1980+41,
                       uncompressed size 32,
                       method=deflate
I hope my 2 diff files can be applied in a future version of the file
utility.
There are some things still to do.
First, create an equivalent for dos-time in C to speed things up
and to get a look similar to that of function ledate.
Zip archives and derivatives are also handled by Magdir/archive. There
the same errors for time and size probably show up. I tried to
update this magic file, but things are complicated because I also
want to add some zip variants like:
description                                  extension
DROID profile                                droid
Android Package                              apk
Mozilla cross platform installer module      xpi
LibreOffice Extension                        oxt
Sweet Home 3D design                         sh3d
Compressed Disk Image                        imz
Microsoft Open XML Paper Specification       xps
Microsoft Open XML Paper Specification       oxps
But I did not succeed, and when searching the TrID database for ZIP
magic with the XML expression "<Bytes>504B0304</Bytes>" I found 375
file types.
With best wishes
Jörg Jenderek
--
Jörg Jenderek
-------------- next part --------------
--- file-5.41/magic/Magdir/msdos.old 2021-10-18 14:20:03 +0000
+++ file-5.41/magic/Magdir/msdos 2021-10-23 13:08:25 +0000
@@ -1765,3 +1765,37 @@
# NB: The BACKUP.nnn files consist of the files backed up,
# concatenated.
+
+# From: Joerg Jenderek
+# URL: http://fileformats.archiveteam.org/wiki/MS-DOS_date/time
+# Reference: https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-dosdatetimetofiletime
+# Note: DOS date+time format is different from formats such as Unix epoch
+# bit encoded; uses year values relative to 1980 and 2 second precision
+0 name dos-date
+# HHHHHMMMMMMSSSSS bit encoded Hour (0-23) Minute (0-59) SecondPart (*2)
+#>0 uleshort x RAW TIME %#4.4x
+# hour part
+#>0 uleshort/2048 x hour %u
+# YYYYYYYMMMMDDDDD bit encoded YearPart (+1980) Month (1-12) Day (1-31)
+#>2 uleshort x RAW DATE %#4.4x
+# day part
+>2 uleshort&0x001F x %u
+#>2 uleshort/16 x MONTH PART %#x
+# GRR: not working
+#>2 uleshort/16 &0x000F MONTH %u
+#>2 uleshort&0x01E0 x MONTH PART %#4.4x
+>2 uleshort&0x01E0 =0x0020 jan
+>2 uleshort&0x01E0 =0x0040 feb
+>2 uleshort&0x01E0 =0x0060 mar
+>2 uleshort&0x01E0 =0x0080 apr
+>2 uleshort&0x01E0 =0x00A0 may
+>2 uleshort&0x01E0 =0x00C0 jun
+>2 uleshort&0x01E0 =0x00E0 jul
+>2 uleshort&0x01E0 =0x0100 aug
+>2 uleshort&0x01E0 =0x0120 sep
+>2 uleshort&0x01E0 =0x0140 oct
+>2 uleshort&0x01E0 =0x0160 nov
+>2 uleshort&0x01E0 =0x0180 dec
+# year part
+>2 uleshort/512 x 1980+%u
+#
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.41-msdos-time.diff.sig
Type: application/octet-stream
Size: 866 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20211023/d6f45b97/attachment.obj>
-------------- next part --------------
--- file-5.41/magic/Magdir/zip.old 2021-05-12 16:30:24 +0000
+++ file-5.41/magic/Magdir/zip 2021-10-23 13:20:39 +0000
@@ -5,2 +5,3 @@
# just an example until negative offsets are supported better
+# Note: All fields unless otherwise noted are unsigned!
@@ -10,2 +11,3 @@
!:mime application/zip
+# no "made by" in local file header with PK\3\4 magic
>>4 leshort x \b, made by
@@ -13,6 +15,11 @@
>>4 use ziphost
+# inside ./archive 1.151 called "at least" zipversion "to extract"
>>6 leshort x \b, extract using at least
>>6 use zipversion
->>12 ledate x \b, last modified %s
->>24 lelong >0 \b, uncompressed size %d
+# This is DOS date like: ledate 21:00:48 19 Dec 2001 != DOS 00:00 1 Jan 2010 ~ 0000213C
+>>12 ulelong x \b, last modified
+>>12 use dos-date
+# uncompressed size of 1st entry; FFffFFff means real value stored in ZIP64 record
+>>24 ulelong !0xFFffFFff \b, uncompressed size %u
+# inside ./archive 1.151 called "compression method="zipcompression
>>10 leshort x \b, method=
@@ -104,9 +111,15 @@
# Zip End Of Central Directory record
+# GRR: wrong for ZIP with comment archive
-22 string PK\005\006
-#>4 leshort >1 \b, %d disks
-#>6 leshort >1 \b, central directory disk %d
-#>8 leshort >1 \b, %d central directories on this disk
-#>10 leshort >1 \b, %d central directories
-#>12 lelong x \b, %d central directory bytes
+#>4 uleshort !0xFFff \b, %u disks
+#>6 uleshort !0xFFff \b, central directory disk %u
+#>8 uleshort !0xFFff \b, %u central directories on this disk
+#>10 uleshort !0xFFff \b, %u central directories
+#>12 ulelong !0xFFffFFff \b, %u central directory bytes
+# offset of central directory
+#>16 ulelong x \b, central directory offset %#x
>(16.l) use zipcd
+# archive comment length n
+#>>20 uleshort >0 \b, comment length %u
+# archive comment
>>20 pstring/l >0 \b, %s
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.41-zip-time.diff.sig
Type: application/octet-stream
Size: 987 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20211023/d6f45b97/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: unzip-date.txt.gz
Type: application/x-gzip
Size: 1063 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20211023/d6f45b97/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 7z-l.txt.gz
Type: application/x-gzip
Size: 442 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20211023/d6f45b97/attachment-0001.bin>