[File] [PATCH] of Magdir/archive for xar archive; update+extensions *.xar *.xip *.pkg

Jörg Jenderek joerg.jen.der.ek at gmx.net
Fri Mar 29 22:21:45 UTC 2019


Hello,

A few days ago I ran the file command version 5.36 on eXtensible
ARchives. I got output like:

dslocal-backup.xar:      xar archive version 1, SHA-1 checksum
FullBundleUpdate.pkg:    xar archive version 1, SHA-1 checksum
none-none-bzip2.xar:     xar archive version 1, no checksum
sha1-sha1-bzip28.xar:    xar archive version 1, SHA-1 checksum
sha1-sha1-gzip5.xar:     xar archive version 1, SHA-1 checksum
sha1-sha1-none.xar:      xar archive version 1, SHA-1 checksum
sha256-sha256-gzip7.xar: xar archive version 1,
sha256-sha512-bzip2.xar: xar archive version 1,
sha512-sha512-gzip.xar:  xar archive version 1,
xar-1.1.xar:             xar archive version 1, SHA-1 checksum
xar-1.5.2.xar:           xar archive version 1, SHA-1 checksum
Xcode_10.2_beta_4.xip:   xar archive version 1, SHA-1 checksum

What is wrong? Apparently the newer checksum algorithms are not
recognized; only older methods like "SHA-1" are. Furthermore, with the
--extension option only ??? is displayed.

So I started to change Magdir/archive. Most of the information can be
found on the new Wikipedia page about the xar archiver, so I added a
comment line like
 # URL: https://en.wikipedia.org/wiki/Xar_(archiver)

Furthermore, all fields are stored in big-endian order, and many are
unsigned. So the lines about the table of contents (TOC) must look like:
 >8	ubequad	x		compressed TOC: %llu,
 >16	ubequad	x		uncompressed TOC: %llu,
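
For readers who want to check these fields outside of file, the fixed
part of the header can be read with one big-endian unpack. This is only
a Python sketch, not part of the patch. The function name and the
dictionary keys are made up here, and the layout (magic at 0, header
size at 4, version at 6, the two TOC lengths at 8 and 16, checksum
algorithm at 24) is assumed from the magic lines in this mail and the
Wikipedia page.

```python
import struct

# Assumed layout of the fixed 28-byte xar header, all fields big endian:
# 4s magic, H header size, H version, Q compressed TOC length,
# Q uncompressed TOC length, I checksum algorithm.
XAR_HEADER = struct.Struct(">4sHHQQI")


def parse_xar_header(data):
    """Return the fixed header fields as a dict (hypothetical helper)."""
    if len(data) < XAR_HEADER.size:
        raise ValueError("too short for a xar header")
    magic, size, version, toc_c, toc_u, cksum = XAR_HEADER.unpack_from(data, 0)
    if magic != b"xar!":
        raise ValueError("not a xar archive")
    return {
        "header_size": size,
        "version": version,
        "toc_compressed": toc_c,
        "toc_uncompressed": toc_u,
        "cksum_alg": cksum,
    }


# Synthetic header for illustration only, not taken from a real archive.
sample = XAR_HEADER.pack(b"xar!", 28, 1, 385, 1024, 0)
fields = parse_xar_header(sample)
print(fields)
```

The ">" prefix in the format string selects big-endian order without
padding, which is why the structure comes out at exactly 28 bytes.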

The header size of an archive is stored at position 4. In older
versions this was always 28, so there was no need to show this standard
value and it was mentioned only in a comment line:
 #>4	beshort	x		header size %d
According to Kyle J. McKay's wiki, in newer versions padding bytes
(as in the example xar-1.1.xar) or the checksum algorithm name can
also appear after that position in the header. This is probably
interesting for users, because older versions of the xar library do
not correctly handle a header size other than 28. So this information
is now displayed by a line like
 >4	ubeshort >28		\b, header size %u

Apparently the file name extension "xar" is used for eXtensible
ARchives. The xar format is also used by some Mac OS X installers for
packages; then the file name extension "pkg" is used, as in the example
FullBundleUpdate.pkg. Apple also introduced a variant with an
additional signature. Such signed archives, like Xcode_10.2_beta_4.xip,
have the "xip" extension. So these 3 extensions are shown by the
additional line:
 !:ext	xar/pkg/xip

The version of the xar format is stored at position 6. This value is 1
at the moment and has not changed in recent years. So do not bother
users with uninteresting information, and display the version only
when it is not the standard one, by the line:
 >6	ubeshort >1		version %u,

The checksum algorithm used is stored at position 24. According to the
Wikipedia page, the values 3 and 4 are used for the newer checksum
methods. This is now expressed by the additional lines
 >24	belong	3		SHA-256 checksum
 >24	belong	4		SHA-512 checksum
So examples like sha256-sha512-bzip2.xar and sha512-sha512-gzip.xar
are described more precisely. To also recognize possible other checksum
algorithms, add the additional line:
 >24	belong	>4		unknown 0x%x checksum
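
The mapping that the checksum lines express can be written down as a
small table. This Python sketch is only an illustration of the intended
output, with a made-up function name. Values 0 to 2 are the ones the
existing magic entry already knows, 3 and 4 follow the Wikipedia page,
and the input is assumed to be the unsigned value read at position 24.

```python
# Checksum algorithm values and the text the magic lines print for them.
CKSUM_NAMES = {
    0: "no checksum",
    1: "SHA-1 checksum",
    2: "MD5 checksum",
    3: "SHA-256 checksum",   # new in this patch
    4: "SHA-512 checksum",   # new in this patch
}


def describe_checksum(cksum_alg):
    """Mirror the '>24 belong ...' lines (hypothetical helper)."""
    name = CKSUM_NAMES.get(cksum_alg)
    if name is None:
        # Same format as the fallback line: unknown 0x%x checksum
        return "unknown 0x%x checksum" % cksum_alg
    return name


for value in (1, 3, 4, 7):
    print(value, describe_checksum(value))
```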

Some interesting information can be obtained from the bytes inside the
heap section. By jumping over the header and the TOC (table of
contents) we arrive at the heap section and can inspect the data there
by calling file again, with lines like:
 >>>&(4.S)	ubyte	x
 >>>>&(8.Q)	ubyte	x
 >>>>>&-1	indirect x	\b, contains
When we look at the XML TOC, we see that the checksum is stored at the
beginning of the heap. So by jumping further forward, depending on the
checksum type, the pointer arrives at the data. For SHA-1 these are 20
additional bytes. So for SHA-1 (18 = 20 - 1 for the .S expression - 1
for the .Q expression) this looks like
 >24	belong	1
 >>18		ubyte	x
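
The arithmetic behind the value 18 can be checked with a small model of
how the relative offsets add up. This Python sketch assumes, as the
explanation above does, that each ubyte test leaves the pointer one
byte behind the offset it tested. The function name is made up, and the
TOC sizes are those reported for sha1-sha1-bzip28.xar and
none-none-bzip2.xar in this mail.

```python
def heap_data_offset_via_magic(first_jump, header_size, toc_size, final_adjust):
    """Model the pointer movement of the four nested magic lines."""
    pos = first_jump + 1          # >>18 ubyte x: test at 18, pointer ends at 19
    pos = pos + header_size + 1   # >>>&(4.S) ubyte x: skip header, read 1 byte
    pos = pos + toc_size + 1      # >>>>&(8.Q) ubyte x: skip TOC, read 1 byte
    return pos + final_adjust     # >>>>>&-1 indirect x


# SHA-1 case: the data should start 20 bytes after header and TOC.
sha1_offset = heap_data_offset_via_magic(18, 28, 498, -1)
print(sha1_offset, 28 + 498 + 20)

# No-checksum case from the patch: jump 0, final adjustment -3.
none_offset = heap_data_offset_via_magic(0, 28, 385, -3)
print(none_offset, 28 + 385)
```

Both pairs of printed numbers should agree, which is just the
"20 minus 2" rule from the patch comments seen from the other side.
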
Now the indirect expression reports the compression method, like
bzip2, that was chosen with the xar --compression option when the
archive was built. Older versions of the xar utility support only gzip
and bzip2, which I have tested. According to the xar(1) man page, newer
Apple Mac OS X versions like Mojave can also use lzma. According to the
open source fork 1.6.1, xz is supported as well.
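
The same lookup can be imitated outside of file to cross-check the
results. The Python sketch below is only an illustration under several
assumptions: the function names are made up, the digest lengths are the
ones the patch uses (MD5 is left out, as in the patch), only the bzip2
and zlib signatures are tested, and the sample archives are synthetic
with a placeholder instead of a real TOC.

```python
import bz2
import struct
import zlib

# Bytes of TOC checksum at the start of the heap, per checksum algorithm.
DIGEST_LEN = {0: 0, 1: 20, 3: 32, 4: 64}


def sniff_heap_compression(archive):
    """Guess the compression of the first heap data (hypothetical helper)."""
    if len(archive) < 28:
        return None
    magic, header_size, _version, toc_len, _toc_ulen, cksum_alg = \
        struct.unpack_from(">4sHHQQI", archive, 0)
    if magic != b"xar!" or cksum_alg not in DIGEST_LEN:
        return None
    # Skip the header, the compressed TOC and the stored checksum.
    start = header_size + toc_len + DIGEST_LEN[cksum_alg]
    head = archive[start:start + 4]
    if head[:3] == b"BZh":
        return "bzip2"
    # zlib streams start with CMF 0x78 and a header divisible by 31.
    if len(head) >= 2 and head[0] == 0x78 and (head[0] * 256 + head[1]) % 31 == 0:
        return "zlib"
    return None


def make_sample(payload, cksum_alg):
    """Build a synthetic archive: header, fake TOC, zeroed checksum, payload."""
    toc = zlib.compress(b"<xar/>")  # placeholder, not a real TOC
    header = struct.pack(">4sHHQQI", b"xar!", 28, 1, len(toc), 6, cksum_alg)
    return header + toc + bytes(DIGEST_LEN[cksum_alg]) + payload


print(sniff_heap_compression(make_sample(bz2.compress(b"hello"), 1)))
print(sniff_heap_compression(make_sample(zlib.compress(b"hello"), 3)))
```

Like the magic lines, this only works when file data follows the
checksum directly; stored (uncompressed) members give no match.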

For pkg and xip the pointer of the indirect expression aims at the
signature. So if magic lines for something like X509Certificate
existed, it should be possible to distinguish such file name extensions
from plain xar variants. This is a TODO.

After applying the above-mentioned modifications with the patch
file-5.36-archive-xar.diff, all inspected examples are now described
more precisely, like:

dslocal-backup.xar:      xar archive
	compressed TOC: 20986, SHA-1 checksum
	, contains zlib compressed data
FullBundleUpdate.pkg:    xar archive
	compressed TOC: 3188, SHA-1 checksum
none-none-bzip2.xar:     xar archive
	compressed TOC: 385, no checksum
	, contains bzip2 compressed data, block size = 900k
sha1-sha1-bzip28.xar:    xar archive
	compressed TOC: 498, SHA-1 checksum
	, contains bzip2 compressed data, block size = 800k
sha1-sha1-gzip5.xar:     xar archive
	compressed TOC: 499, SHA-1 checksum
	, contains zlib compressed data
sha1-sha1-none.xar:      xar archive
	compressed TOC: 476, SHA-1 checksum
sha256-sha256-gzip7.xar: xar archive
	compressed TOC: 538, SHA-256 checksum
	, contains zlib compressed data
sha256-sha512-bzip2.xar: xar archive
	compressed TOC: 620, SHA-256 checksum
	, contains bzip2 compressed data, block size = 900k
sha512-sha512-gzip.xar:  xar archive
	compressed TOC: 614, SHA-512 checksum
	, contains zlib compressed data
xar-1.1.xar:             xar archive, header size 32
	compressed TOC: 5726, SHA-1 checksum
	, contains zlib compressed data
xar-1.5.2.xar:           xar archive
	compressed TOC: 6932, SHA-1 checksum
	, contains zlib compressed data
Xcode_10.2_beta_4.xip:   xar archive
	compressed TOC: 2948, SHA-1 checksum

I hope my diff file can be applied in a future version of the file utility.

With best wishes
Jörg Jenderek

-------------- next part --------------
--- file-5.36/magic/Magdir/archive.old	2019-02-20 15:07:44 +0000
+++ file-5.36/magic/Magdir/archive	2019-03-29 17:09:49 +0000
@@ -1369,10 +1369,21 @@
 # xar (eXtensible ARchiver) archive
+# URL: https://en.wikipedia.org/wiki/Xar_(archiver)
 # xar archive format: http://code.google.com/p/xar/
 # From: "David Remahl" <dremahl at apple.com>
+# Update: Joerg Jenderek
+# TODO: lzma compression; X509Data for pkg and xip
+# Note: verified by `xar --dump-header -f FullBundleUpdate.xar` or
+# `7z t -txar Xcode_10.2_beta_4.xip`
 0	string	xar!		xar archive
 !:mime	application/x-xar
-#>4	beshort	x		header size %d
->6	beshort	x		version %d,
-#>8	quad	x		compressed TOC: %d,
-#>16	quad	x		uncompressed TOC: %d,
+# pkg for Mac OSX installer package like FullBundleUpdate.pkg
+# xip for signed Apple software like Xcode_10.2_beta_4.xip
+!:ext	xar/pkg/xip
+# always 28 in older archives
+>4	ubeshort >28		\b, header size %u
+# currently there exists only version 1 since about 2014
+>6	ubeshort >1		version %u,
+>8	ubequad	x		compressed TOC: %llu,
+#>16	ubequad	x		uncompressed TOC: %llu,
+# cksum_alg 0-2 in older and also 3-4 in newer
 >24	belong	0		no checksum
@@ -1380,2 +1391,41 @@
 >24	belong	2		MD5 checksum
+>24	belong	3		SHA-256 checksum
+>24	belong	4		SHA-512 checksum
+>24	belong	>4		unknown 0x%x checksum
+#>24	belong	>4		checksum
+#			For no compression jump 0 bytes
+>24	belong	0
+>>0		ubyte	x
+# jump more bytes forward by header size
+>>>&(4.S)	ubyte	x
+# jump more bytes forward by compressed table of contents size
+#>>>>&(8.Q)	ubequad	x	\b, heap data 0x%llx
+>>>>&(8.Q)	ubyte	x
+# look for data by ./compress after message with 1 space at end
+>>>>>&-3	indirect x	\b, contains 
+#			For SHA-1 jump 20 minus 2 bytes
+>24	belong	1
+>>18		ubyte	x
+# jump more bytes forward by header size
+>>>&(4.S)	ubyte	x
+# jump more bytes forward by compressed table of contents size
+>>>>&(8.Q)	ubyte	x
+# data compressed by gzip, bzip, lzma or none
+>>>>>&-1	indirect x	\b, contains 
+#			For SHA-256 jump 32 minus 2 bytes
+>24	belong	3
+>>30		ubyte	x
+# jump more bytes forward by header size
+>>>&(4.S)	ubyte	x
+# jump more bytes forward by compressed table of contents size
+>>>>&(8.Q)	ubyte	x
+>>>>>&-1	indirect x	\b, contains 
+#			For SHA-512 jump 64 minus 2 bytes
+>24	belong	4
+>>62		ubyte	x
+# jump more bytes forward by header size
+>>>&(4.S)	ubyte	x
+# jump more bytes forward by compressed table of contents size
+>>>>&(8.Q)	ubyte	x
+>>>>>&-1	indirect x	\b, contains 
 

