[File] [PATCH] Magdir/zip ledate != DOS date + wrong "big" size

Jörg Jenderek joerg.jen.der.ek at gmx.net
Sat Oct 23 14:18:54 UTC 2021


Hello,

Some time ago I handled some Mozilla omni.ja files, which are a kind of
ZIP archive. Some unexpected results led me to inspect ZIP archives
more closely.
When running the file command version 5.41 on such archives, I get
output like:

1980-2021.zip:         Zip archive data,
		       at least v2.0 to extract,
		       compression method=deflate
1980-jan-1-time0.zip:  Zip archive data,
		       at least v2.0 to extract,
		       compression method=deflate
2021-sep-29-00.00.zip: Zip archive data,
		       at least v2.0 to extract,
		       compression method=deflate

When running with the -k option I get more messages. I can get just the
second message by running the file command with the -m Magdir/zip option.
Then I get output like:

1980-2021.zip:         Zip archive data,
		       made by v2.0,
		       extract using at least v2.0,
		       last modified Mon Jan 26 08:26:40 1970,
		       uncompressed size 31,
		       method=deflate
1980-jan-1-time0.zip:  Zip archive data,
		       made by v2.0,
		       extract using at least v2.0,
		       last modified Mon Jan 26 01:18:56 1970,
		       uncompressed size 29,
		       method=deflate
2021-sep-29-00.00.zip: Zip archive data,
		       made by v2.0,
		       extract using at least v2.0,
		       last modified Thu Apr  3 06:30:24 2014,
		       uncompressed size 32,
		       method=deflate

Obviously the displayed timestamps for these prepared archives are wrong!

For comparison I ran the official Info-ZIP unzip tool. In ZipInfo mode
with the verbose option it reports the expected timestamps (1980 Jan 1
and 2021 Sep 29; see the attached unzip-date.txt).

I also used the decompression tool 7z with the list command and the zip
type option. It reports the same expected timestamps (1980 Jan 1 and
2021 Sep 29; see the attached 7z-l.txt).

Inside Magdir/zip, the fields of the ZIP central directory record are
shown by subroutine zipcd. After the minimum version needed to extract
is shown by subroutine zipversion, the modification date is displayed
by the line:
  >>12	ledate		x		\b, last modified %s
ledate interprets the 4-byte value in little-endian order as a Unix
timestamp, i.e. seconds since 1 January 1970 00:00 UTC.
But according to the documentation, ZIP archives store the date in
MS-DOS date/time format, which is completely different: the starting
point is 1 January 1980, and date and time are packed into bit fields.
So the date display lines now become:

  >>12	uleshort	x		\b, last modified
  >>12	use		dos-date
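To make the mismatch concrete, here is a minimal C sketch (an illustration only, not code from file) that reads the 4 bytes at offset 12 of the central directory header the way ledate does, and splits the same value into its two DOS halves. The helper names read_le32, ledate_view, dos_time_half and dos_date_half are hypothetical, and gmtime() is assumed to approximate how ledate renders the value in UTC.

```c
#include <stdint.h>
#include <time.h>

/* Assemble 4 bytes in little-endian order, as ledate and ulelong read them. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* The ledate view: seconds since 1970-01-01 00:00:00 UTC.
   Returns 0 if gmtime() cannot convert the value. */
static int ledate_view(uint32_t v, struct tm *out)
{
    time_t t = (time_t)v;
    struct tm *res = gmtime(&t);

    if (res == NULL)
        return 0;
    *out = *res;
    return 1;
}

/* The ZIP view of the same 32 bits: the low half is the DOS time
   (bytes at +0/+1), the high half is the DOS date (bytes at +2/+3). */
static uint16_t dos_time_half(uint32_t v) { return (uint16_t)(v & 0xFFFFu); }
static uint16_t dos_date_half(uint32_t v) { return (uint16_t)(v >> 16); }
```

For the bytes 00 00 3D 53 the value is 0x533D0000. The ledate view turns it into 2014-04-03 06:30:24 UTC, the bogus date shown above for 2021-sep-29-00.00.zip, while the high half 0x533D is the real DOS date. Likewise the bytes 00 00 21 3C from the example in the zip patch comment give 2001-12-19 21:00:48 UTC.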

So I added this subroutine dos-date to Magdir/msdos. First I added
comment lines pointing to the documentation:

# URL:		http://fileformats.archiveteam.org/wiki/MS-DOS_date/time
# Reference:	https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-dosdatetimetofiletime
The first 2 bytes contain the time and the last 2 bytes contain the
date. Both can be shown in hexadecimal by debugging lines like:
  >0	uleshort	x	RAW TIME %#4.4x
  >2	uleshort	x	RAW DATE %#4.4x

According to the documentation, the date is encoded in the bit form
YYYYYYYMMMMDDDDD, where the lower 5 bits D encode the day (1-31), the
middle 4 bits M encode the month (1-12) and the upper 7 bits Y hold the
year offset (add 1980 to get the real year). This is done by lines like:
  >2	uleshort&0x001F	x	%u
  >2	uleshort&0x01E0	=0x0020	jan
  >2	uleshort&0x01E0	=0x0040	feb
  >2	uleshort&0x01E0	=0x0060	mar
  >2	uleshort&0x01E0	=0x0080	apr
  >2	uleshort&0x01E0	=0x00A0	may
  >2	uleshort&0x01E0	=0x00C0	jun
  >2	uleshort&0x01E0	=0x00E0	jul
  >2	uleshort&0x01E0	=0x0100	aug
  >2	uleshort&0x01E0	=0x0120	sep
  >2	uleshort&0x01E0	=0x0140	oct
  >2	uleshort&0x01E0	=0x0160	nov
  >2	uleshort&0x01E0	=0x0180	dec
  >2	uleshort/512	x	1980+%u
Unfortunately I was not able to display the time in the same way as
the date.
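A C equivalent of dos-date could decode the time word as well, which the magic lines above do not display. The following is a sketch under the layout described above (time word first, then date word, both little-endian); the names struct dos_datetime and dos_datetime_decode are hypothetical. The masks mirror the magic lines: the day is d & 0x001F, the month is (d & 0x01E0) >> 5 so that 0x0120 maps to 9 (sep), the year is d / 512 + 1980, and the commented-out hour test uleshort/2048 corresponds to t >> 11.

```c
/* Decoded MS-DOS date and time. */
struct dos_datetime {
    unsigned year, month, day;     /* 1980-2107, 1-12, 1-31 */
    unsigned hour, minute, second; /* 0-23, 0-59, even 0-58 */
};

/* raw points at 4 bytes laid out as in the dos-date subroutine:
   uleshort time at +0, uleshort date at +2. Returns 1 if every field
   is in range, 0 otherwise (e.g. month 0 for an all-zero timestamp). */
static int dos_datetime_decode(const unsigned char *raw, struct dos_datetime *dt)
{
    unsigned t = raw[0] | (unsigned)raw[1] << 8; /* HHHHHMMMMMMSSSSS */
    unsigned d = raw[2] | (unsigned)raw[3] << 8; /* YYYYYYYMMMMDDDDD */

    dt->day    = d & 0x001F;           /* magic: uleshort&0x001F */
    dt->month  = (d & 0x01E0) >> 5;    /* magic: 0x0120 >> 5 == 9, "sep" */
    dt->year   = 1980 + (d >> 9);      /* magic: uleshort/512, then +1980 */
    dt->hour   = t >> 11;              /* magic: uleshort/2048 */
    dt->minute = (t & 0x07E0) >> 5;
    dt->second = (t & 0x001F) * 2;     /* stored in 2-second units */

    return dt->month >= 1 && dt->month <= 12 && dt->day >= 1 &&
           dt->hour <= 23 && dt->minute <= 59 && dt->second <= 58;
}
```

For the raw bytes 00 00 3D 53 this yields 2021-09-29 00:00:00. The range check only rejects impossible field values such as month 0; it does not check the day against the length of the month.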

The documentation says that all fields, unless otherwise noted, are
unsigned and stored in little-endian order. The size in subroutine
zipcd is displayed by the line
  >>24	lelong		>0		\b, uncompressed size %d
That is definitely wrong. I checked this with, for example,
2015-05-05-raspbian-wheezy.zip. There the size value is C3500000 in
hexadecimal. Interpreting it as a signed value gives the wrong negative
value -1018167296, whereas interpreting it as unsigned gives the
correct size 3276800000.
And it gets worse: to go beyond 4 GiB, the real size is stored as an
8-byte integer in the ZIP64 extended information extra field, and the
4-byte size value is set to its maximum, 0xFFFFFFFF. So the size line
in zipcd now becomes:
  >>24	ulelong		!0xFFffFFff	\b, uncompressed size %u
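The signed/unsigned difference and the ZIP64 marker can be illustrated in C. This is a minimal sketch assuming cdh points at a complete 46-byte central directory file header (signature PK\1\2); cd_le32, as_signed32 and cd_uncompressed_size are hypothetical names.

```c
#include <stdint.h>

/* Read a little-endian 32-bit field without any sign interpretation. */
static uint32_t cd_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* What a signed lelong makes of the same bits (two's complement),
   computed explicitly instead of relying on an implementation-defined cast. */
static int64_t as_signed32(uint32_t v)
{
    return v >= 0x80000000u ? (int64_t)v - 0x100000000LL : (int64_t)v;
}

/* Uncompressed size from a central directory header, offset 24.
   Returns 0 for the ZIP64 marker 0xFFFFFFFF: the real 8-byte size is
   then in the ZIP64 extra field, which this sketch does not parse. */
static int cd_uncompressed_size(const unsigned char *cdh, uint32_t *size)
{
    uint32_t v = cd_le32(cdh + 24);

    if (v == 0xFFFFFFFFu)
        return 0;
    *size = v;
    return 1;
}
```

For the raspbian example, the bytes 00 00 50 C3 at offset 24 give 3276800000 when read unsigned and -1018167296 when read as a signed 32-bit value.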

I assume the same error also occurs in Magdir/zip lines like:
#>4	leshort	>1	\b, %d disks
#>6	leshort	>1	\b, central directory disk %d
#>8	leshort	>1	\b, %d central directories on this disk
#>10	leshort	>1	\b, %d central directories
#>12	lelong	x	\b, %d central directory bytes
So I changed them to:
#>4	uleshort !0xFFff \b, %u disks
#>6	uleshort !0xFFff \b, central directory disk %u
#>8	uleshort !0xFFff \b, %u central directories on this disk
#>10	uleshort !0xFFff \b, %u central directories
#>12	ulelong	!0xFFffFFff \b, %u central directory bytes
But I did not verify this with example files.
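A sketch of reading the End Of Central Directory fields as unsigned values in C, assuming p points at the 22-byte fixed part of the record; the struct and function names are hypothetical. The ZIP64 check reflects the rule in PKWARE's APPNOTE.TXT that a field too small for its value is set to 0xFFFF or 0xFFFFFFFF and the real value is stored in the ZIP64 end of central directory record. Note that the counts at offsets 8 and 10 are numbers of central directory entries, not of central directories.

```c
#include <stdint.h>

/* Fixed fields of the End Of Central Directory record, all unsigned. */
struct eocd_fields {
    uint16_t disk;          /* +4  number of this disk */
    uint16_t cd_disk;       /* +6  disk where the central directory starts */
    uint16_t entries_disk;  /* +8  central directory entries on this disk */
    uint16_t entries_total; /* +10 central directory entries in total */
    uint32_t cd_size;       /* +12 size of the central directory in bytes */
    uint32_t cd_offset;     /* +16 offset of the central directory */
    uint16_t comment_len;   /* +20 length of the archive comment */
};

static uint16_t eocd_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

static uint32_t eocd_le32(const unsigned char *p)
{
    return (uint32_t)eocd_le16(p) | (uint32_t)eocd_le16(p + 2) << 16;
}

/* p must point at 22 readable bytes; returns 0 unless they start with PK\5\6. */
static int eocd_parse(const unsigned char *p, struct eocd_fields *e)
{
    if (p[0] != 'P' || p[1] != 'K' || p[2] != 5 || p[3] != 6)
        return 0;
    e->disk          = eocd_le16(p + 4);
    e->cd_disk       = eocd_le16(p + 6);
    e->entries_disk  = eocd_le16(p + 8);
    e->entries_total = eocd_le16(p + 10);
    e->cd_size       = eocd_le32(p + 12);
    e->cd_offset     = eocd_le32(p + 16);
    e->comment_len   = eocd_le16(p + 20);
    return 1;
}

/* A field at its maximum value means the real value is in the ZIP64 records. */
static int eocd_needs_zip64(const struct eocd_fields *e)
{
    return e->disk == 0xFFFFu || e->cd_disk == 0xFFFFu ||
           e->entries_disk == 0xFFFFu || e->entries_total == 0xFFFFu ||
           e->cd_size == 0xFFFFFFFFu || e->cd_offset == 0xFFFFFFFFu;
}
```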

Apparently this error has gone unnoticed because zipcd is only called
after the EOCD (End Of Central Directory record) has been found at the
end of the archive, and that check is not done for "big" files because
of size limitations of the file command.
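The fixed -22 offset only finds the EOCD when the archive comment is empty, as the GRR note in the zip patch says. A backward search handles comments. This is a minimal C sketch assuming the tail of the file is already in memory; find_eocd is a hypothetical name. Because the comment can be at most 65535 bytes, the record must start within the last 22 + 65535 bytes. Requiring the comment length to reach exactly the end of the buffer is a simplifying assumption that rejects archives with trailing data.

```c
#include <stddef.h>

/* Search backwards for the EOCD signature PK\5\6 in buf (the end of the
   archive). Returns the record's offset in buf, or -1 if none is found. */
static long find_eocd(const unsigned char *buf, size_t len)
{
    const size_t fixed = 22, max_comment = 65535;
    size_t lowest, i;

    if (len < fixed)
        return -1;
    /* The record starts at most fixed + max_comment bytes before the end. */
    lowest = len > fixed + max_comment ? len - (fixed + max_comment) : 0;

    for (i = len - fixed + 1; i-- > lowest; ) {
        if (buf[i] == 'P' && buf[i + 1] == 'K' &&
            buf[i + 2] == 5 && buf[i + 3] == 6) {
            size_t comment_len = (size_t)buf[i + 20] | (size_t)buf[i + 21] << 8;

            /* Assumption: the comment runs exactly to the end of the data;
               this also helps reject a PK\5\6 sequence inside a comment. */
            if (i + fixed + comment_len == len)
                return (long)i;
        }
    }
    return -1;
}
```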

The first identification is done in Magdir/archive by looking for a
local file header with the signature PK\3\4.

The second message is produced by subroutine zipcd in Magdir/zip,
which looks at the central directory file header with the signature
PK\1\2.

In principle both report the same information, with one exception:
the first has no "made by" version part. So when the first test
succeeds, the second test need not be executed.
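The two headers carry the same fields; the central directory header simply inserts the 2-byte "made by" version at offset 4, which shifts every following field by 2 bytes (for example the method is at offset 8 in PK\3\4 and at 10 in PK\1\2, the DOS time/date at 10 and 12 versus 12 and 14). A minimal C sketch of reading either header with one function; zip_entry_info and read_entry_header are hypothetical names, and the caller is assumed to supply at least 30 bytes for a local header and 46 for a central directory header.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields shared by the local file header (PK\3\4) and the central
   directory file header (PK\1\2). */
struct zip_entry_info {
    int has_made_by;            /* only the central directory has it */
    uint16_t made_by;
    uint16_t version_needed;
    uint16_t method;
    uint16_t dos_time;
    uint16_t dos_date;
    uint32_t uncompressed_size;
};

static uint16_t hdr_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

static uint32_t hdr_le32(const unsigned char *p)
{
    return (uint32_t)hdr_le16(p) | (uint32_t)hdr_le16(p + 2) << 16;
}

/* Returns 0 if p starts with neither signature. */
static int read_entry_header(const unsigned char *p, struct zip_entry_info *e)
{
    size_t shift;

    if (p[0] != 'P' || p[1] != 'K')
        return 0;
    if (p[2] == 3 && p[3] == 4) {          /* local file header */
        shift = 0;
        e->has_made_by = 0;
        e->made_by = 0;
    } else if (p[2] == 1 && p[3] == 2) {   /* central directory header */
        shift = 2;                         /* "made by" sits at offset 4 */
        e->has_made_by = 1;
        e->made_by = hdr_le16(p + 4);
    } else {
        return 0;
    }
    e->version_needed    = hdr_le16(p + 4 + shift);
    e->method            = hdr_le16(p + 8 + shift);
    e->dos_time          = hdr_le16(p + 10 + shift);
    e->dos_date          = hdr_le16(p + 12 + shift);
    e->uncompressed_size = hdr_le32(p + 22 + shift);
    return 1;
}
```

With one shared reader, both messages could also share one wording, since apart from has_made_by the extracted values are identical.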

But when both are executed, the wording should be the same, because
both refer to the same information. In Magdir/zip the minimum version
to extract is shown by lines like:
  >>6	leshort		x		\b, extract using at least
  >>6	use		zipversion
whereas inside Magdir/archive (version 1.151) it is phrased as
"at least" zipversion "to extract".

The compression method in Magdir/zip is displayed by lines like:
  >>10	leshort		x		\b, method=
  >>10	use		zipcompression
whereas inside Magdir/archive (version 1.151) it is phrased as
"compression method="zipcompression.
I did not change these lines, but added comments showing the
corresponding phrasing used in Magdir/archive.

After applying the above modifications with the patches
file-5.41-zip-time.diff and file-5.41-msdos-time.diff, the timestamps
of all inspected ZIP archives are displayed correctly (but ugly).
The output now looks like:

1980-2021.zip:         Zip archive data,
		       made by v2.0,
		       extract using at least v2.0,
		       last modified 1 jan 1980+0,
		       uncompressed size 31,
		       method=deflate
1980-jan-1-time0.zip:  Zip archive data,
		       made by v2.0,
		       extract using at least v2.0,
		       last modified 1 jan 1980+0,
		       uncompressed size 29,
		       method=deflate
2021-sep-29-00.00.zip: Zip archive data,
		       made by v2.0,
		       extract using at least v2.0,
		       last modified 29 sep 1980+41,
		       uncompressed size 32,
		       method=deflate

I hope my 2 diff files can be applied in a future version of the file utility.

There are some things left to do.
First, create a C equivalent for the DOS date/time format, to speed
things up and to get output that looks like that of ledate.
ZIP archives and derivatives are also handled by Magdir/archive, which
probably has the same time and size errors. I tried to update that
magic file, but things are complicated because I also want to add some
ZIP variants like:
description				extension
DROID profile				droid
Android Package				apk
Mozilla cross platform installer module	xpi
LibreOffice Extension			oxt
Sweet Home 3D design			sh3d
Compressed Disk Image			imz
Microsoft Open XML Paper Specification	xps
Microsoft Open XML Paper Specification	oxps

But I did not succeed, and when searching the TrID database for the
ZIP magic with the XML expression "<Bytes>504B0304</Bytes>" I found
375 file types.

With best wishes
Jörg Jenderek
--
Jörg Jenderek
-------------- next part --------------
--- file-5.41/magic/Magdir/msdos.old	2021-10-18 14:20:03 +0000
+++ file-5.41/magic/Magdir/msdos	2021-10-23 13:08:25 +0000
@@ -1765,3 +1765,37 @@
 
 # NB: The BACKUP.nnn files consist of the files backed up,
 # concatenated.
+
+# From:		Joerg Jenderek
+# URL:		http://fileformats.archiveteam.org/wiki/MS-DOS_date/time
+# Reference:	https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-dosdatetimetofiletime
+# Note:		DOS date+time format is different from formats such as Unix epoch
+#		bit encoded; uses year values relative to 1980 and 2 second precision
+0	name		dos-date
+# HHHHHMMMMMMSSSSS bit encoded Hour (0-23) Minute (0-59) SecondPart (*2)
+#>0	uleshort	x	RAW TIME %#4.4x
+# hour part
+#>0	uleshort/2048	x	hour %u
+# YYYYYMMMMDDDDD bit encoded YearPart (+1980) Month (1-12) Day (1-31)
+#>2	uleshort	x	RAW DATE %#4.4x
+# day part
+>2	uleshort&0x001F	x	%u
+#>2	uleshort/16	x	MONTH PART %#x
+# GRR: not working
+#>2	uleshort/16	&0x000F	MONTH %u
+#>2	uleshort&0x01E0	x	MONTH PART %#4.4x
+>2	uleshort&0x01E0	=0x0020	jan
+>2	uleshort&0x01E0	=0x0040	feb
+>2	uleshort&0x01E0	=0x0060	mar
+>2	uleshort&0x01E0	=0x0080	apr
+>2	uleshort&0x01E0	=0x00A0	may
+>2	uleshort&0x01E0	=0x00C0	jun
+>2	uleshort&0x01E0	=0x00E0	jul
+>2	uleshort&0x01E0	=0x0100	aug
+>2	uleshort&0x01E0	=0x0120	sep
+>2	uleshort&0x01E0	=0x0140	oct
+>2	uleshort&0x01E0	=0x0160	nov
+>2	uleshort&0x01E0	=0x0180	dec
+# year part
+>2	uleshort/512	x	1980+%u
+#
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.41-msdos-time.diff.sig
Type: application/octet-stream
Size: 866 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20211023/d6f45b97/attachment.obj>
-------------- next part --------------
--- file-5.41/magic/Magdir/zip.old	2021-05-12 16:30:24 +0000
+++ file-5.41/magic/Magdir/zip	2021-10-23 13:20:39 +0000
@@ -5,2 +5,3 @@
 # just an example until negative offsets are supported better
+# Note:	All fields unless otherwise noted are unsigned!
 
@@ -10,2 +11,3 @@
 !:mime	application/zip
+# no "made by" in local file header with PK\3\4 magic
 >>4	leshort		x		\b, made by
@@ -13,6 +15,11 @@
 >>4	use		ziphost
+# inside ./archive 1.151 called "at least" zipversion "to extract"
 >>6	leshort		x		\b, extract using at least
 >>6	use		zipversion
->>12	ledate		x		\b, last modified %s
->>24	lelong		>0		\b, uncompressed size %d
+# This is DOS date like: ledate 21:00:48 19 Dec 2001 != DOS 00:00 1 Jan 2010 ~ 0000213C
+>>12	ulelong		x		\b, last modified
+>>12	use		dos-date
+# uncompressed size of 1st entry; FFffFFff means real value stored in ZIP64 record
+>>24	ulelong		!0xFFffFFff	\b, uncompressed size %u
+# inside ./archive 1.151 called "compression method="zipcompression
 >>10	leshort		x		\b, method=
@@ -104,9 +111,15 @@
 # Zip End Of Central Directory record
+# GRR: wrong for ZIP with comment archive
 -22	string		PK\005\006
-#>4	leshort		>1		\b, %d disks
-#>6	leshort		>1		\b, central directory disk %d
-#>8	leshort		>1		\b, %d central directories on this disk
-#>10	leshort		>1		\b, %d central directories
-#>12	lelong		x		\b, %d central directory bytes
+#>4	uleshort	!0xFFff		\b, %u disks
+#>6	uleshort	!0xFFff		\b, central directory disk %u
+#>8	uleshort	!0xFFff		\b, %u central directories on this disk
+#>10	uleshort	!0xFFff		\b, %u central directories
+#>12	ulelong		!0xFFffFFff	\b, %u central directory bytes
+# offset of central directory
+#>16	ulelong		x		\b, central directory offset %#x
 >(16.l)	use		zipcd
+# archive comment length n
+#>>20	uleshort	>0		\b, comment length %u
+# archive comment
 >>20	pstring/l	>0		\b, %s
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.41-zip-time.diff.sig
Type: application/octet-stream
Size: 987 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20211023/d6f45b97/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: unzip-date.txt.gz
Type: application/x-gzip
Size: 1063 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20211023/d6f45b97/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 7z-l.txt.gz
Type: application/x-gzip
Size: 442 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20211023/d6f45b97/attachment-0001.bin>