[File] [PATCH] of Magdir/archive for Debian package (+extension *.deb *.udeb)

Jörg Jenderek joerg.jen.der.ek at gmx.net
Sun Mar 3 16:17:00 UTC 2019


Hello,

some weeks ago I sent a patch for "current ar archive". This package
format is also used for Debian packages, so I dug deeper. I searched my
systems with the find utility, and the net, for hundreds of similar
packages. I also generated test files to verify the magic lines. When I
run the file command version 5.36 on such examples I get output like:

ckermit_302-5.3_armhf.1of4.deb:       part of multipart Debian package
	(format 2.1)
ckermit_302-5.3_armhf.deb:            Debian binary package (format 2.0)
debian-UNKNOWN-x.y.deb:               (format x.y)
e2fsprogs-udeb_1.44.4-2_sparc64.udeb: Debian binary package (format 2.0)
openjdk-6-source_6b11-9.1_all.deb:    Debian binary package (format 2.0)
xfslibs-dev_3.1.7+b1_mips.deb:        Debian binary package (format 2.0)
xfslibs-test-lzma.deb:                Debian binary package (format 2.0)

Inside Magdir/archive the first test line catches all Debian packages by
 0	string		=!<arch>\ndebian
Afterwards two subclass variants are distinguished by 2 additional lines
 >8	string		debian-split	part of multipart Debian package
 >8	string		debian-binary	Debian binary package
But a last-resort branch is missing for rare cases where another
Debian package class occurs, as in the simulated example
debian-UNKNOWN-x.y.deb, or where a Debian package is damaged at offset
14. In that case only a message like "(format x.y)" is shown.
Instead there is a dead line which is never executed
 >8	string		!debian
The first line has already matched "debian" at offset 8, so this test
can never succeed. So I deleted the dead line.

The keyword "debian" at offset 8 is already recognized by the first
magic line, so this phrase can be removed from the second-level test
lines. The old describing lines now become
 >14	string		-split	part of multipart Debian package
 >14	string		-binary	Debian binary package
The third branch is now shown by a new line like
 >14	default		x	Unknown Debian package
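
To illustrate the intended branching outside of libmagic, here is a
minimal Python sketch of the same three second-level tests. The function
name classify_deb is hypothetical and only for illustration; it is not
part of the patch or of the file utility.

```python
def classify_deb(header: bytes) -> str:
    """Mimic the three second-level branches of the patched magic entry."""
    # First-level test: ar global magic followed by a member name
    # that starts with "debian" (8 + 6 = 14 bytes).
    if not header.startswith(b"!<arch>\ndebian"):
        return "not a Debian package"
    # Offset 14 is where the rest of the first member name begins.
    rest = header[14:]
    if rest.startswith(b"-split"):
        return "part of multipart Debian package"
    if rest.startswith(b"-binary"):
        return "Debian binary package"
    # Counterpart of the new ">14 default x" line.
    return "Unknown Debian package"
```

Feeding it the first bytes of debian-UNKNOWN-x.y.deb would take the last
branch, which is the case the old magic lines left without a description.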

Most information about the Debian package file format is given on the
Wikipedia page. So I added a URL line like
 # URL: https://en.wikipedia.org/wiki/Deb_(file_format)

After the mime type line I added a line for the file name extension. In
most cases this is "deb", but for stripped-down Debian packages "udeb"
is used, as in the example e2fsprogs-udeb_1.44.4-2_sparc64.udeb. This is
now expressed by an additional line like
 !:ext	deb/udeb

The Debian format version was expressed by the line
 >68	string		>\0		(format %s)
For most Debian cases this is 2.0, or 2.1 for split packages.
Now I also show additional available package information.
For the standard case this is done by a branch starting with
 >68	string		=2.0\n
The most interesting available information is the compression method
used. Why? Sometimes you are using an old distribution where only
gzip compression is available. Or you are using an embedded system where
the compression tools are provided by BusyBox, and the needed
compression is not implemented in BusyBox.
So look at the second ar member name. This is the control archive with
a name like control.tar.gz or control.tar.xz, which shows the
compression used. This is now done by a line like
 >>72	string		>\0		\b, with %.14s
After the compressed binary content of the second archive member comes
the third ar member. So look for this data archive name, with names like
data.tar.{gz,xz,bz2,lzma}. OK, this is not fast, and for some packages
it only works if FILE_BYTES_MAX in src/file.h is raised. It is done by
additional lines like
 >>0	search/0x93e4f	data.tar.	\b, data compression
 >>>&0	string		x		%.4s
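
For comparison, a script can find the same member names without a
byte search by walking the ar member headers. The following Python
sketch is an alternative to the search line above, not what the patch
does. It assumes the common ar layout (8-byte global magic, 60-byte
member headers, data padded to an even length); the function names are
hypothetical.

```python
import io

AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # fixed-size ar member header


def ar_member_names(stream):
    """Yield (name, size) for each member of an ar archive.

    The stream must be a seekable binary stream.
    """
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    while True:
        header = stream.read(HEADER_SIZE)
        if len(header) < HEADER_SIZE:
            return  # end of archive (or truncated trailing data)
        if header[58:60] != b"`\n":
            raise ValueError("bad ar member header")
        # Name is 16 bytes, space padded; some tools append "/".
        name = header[0:16].decode("ascii").rstrip(" /")
        # Size is a 10-byte decimal field at offset 48 of the header.
        size = int(header[48:58].decode("ascii").strip())
        yield name, size
        # Skip the member data, which is padded to an even length.
        stream.seek(size + (size & 1), io.SEEK_CUR)


def deb_compression(stream):
    """Return the suffixes of the control.tar and data.tar members."""
    result = {}
    for name, _size in ar_member_names(stream):
        for prefix in ("control.tar", "data.tar"):
            if name.startswith(prefix):
                result[prefix] = name[len(prefix):].lstrip(".") or "none"
    return result
```

Because the first member holds only the 4 bytes "2.0\n", the second
header starts at 8 + 60 + 4 = 72, which is why the magic line can test
the control archive name at the fixed offset 72.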

Split Debian packages with version 2.1 are now handled by a branch
starting with the additional line
 >68	string		=2.1\n
Following the source file of the dpkg-split tool, the meta information
is now shown, starting with the NL-terminated ASCII package name like
"ckermit" by an additional line like
 >>&0	string		x		\b, %s
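
The chain of NL-terminated fields can also be read in a few lines of
Python. This is only a sketch: the field order is taken from the
comments in the attached diff, and the function name and dictionary keys
are hypothetical.

```python
def parse_split_info(member: bytes) -> dict:
    """Parse the NL-terminated fields of a debian-split member (format 2.1)."""
    field_names = [
        "format", "package", "version", "md5",
        "total_size", "part_size", "part", "architecture",
    ]
    lines = member.decode("ascii").split("\n")
    # Empty values are dropped, so a missing architecture line
    # (written only by newer dpkg) simply leaves that key out.
    return {key: value for key, value in zip(field_names, lines) if value}
```

Called with the content of the debian-split member of
ckermit_302-5.3_armhf.1of4.deb, the "package" entry would be "ckermit"
and the "part" entry "1/4".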

After applying the above-mentioned modifications with the patch
file-5.36-archive-deb.diff, all inspected examples are now
described like:

ckermit_302-5.3_armhf.1of4.deb:       part of multipart Debian package
	(format 2.1), ckermit 302-5.3
	, MD5 28bedba63fc2c33e6c5845d412d1ca8e
	, unsplitted size 1655898, part lenght 510976, part 1/4, armhf
ckermit_302-5.3_armhf.deb:            Debian binary package (format 2.0)
	, with control.tar.gz, data compression xz
debian-UNKNOWN-x.y.deb:               Unknown Debian package
	(format x.y)
e2fsprogs-udeb_1.44.4-2_sparc64.udeb: Debian binary package (format 2.0)
	, with control.tar.xz, data compression xz
openjdk-6-source_6b11-9.1_all.deb:    Debian binary package (format 2.0)
	, with control.tar.gz, data compression bz2
xfslibs-dev_3.1.7+b1_mips.deb:        Debian binary package (format 2.0)
	, with control.tar.gz, data compression gz
xfslibs-test-lzma.deb:                Debian binary package (format 2.0)
	, with control.tar.gz, data compression lzma

I hope my diff file can be applied in a future version of the file utility.

Furthermore I looked at the file identifying utility TrID (see
http://mark0.net/soft-trid-e.html). With the verbose option "-v" the
related URL is also printed. I like that feature.
Maybe this could be implemented in the file command. Similar to "--mime",
an option like "--url" could show the URL information for inspected
samples, when the magic file contains a new magic line like
 !:url	https://en.wikipedia.org/wiki/Deb_(file_format)

Thanks
Jörg Jenderek
-- 
Jörg Jenderek

-------------- next part --------------
--- file-5.36/magic/Magdir/archive.old	2019-02-20 15:07:44 +0000
+++ file-5.36/magic/Magdir/archive	2019-03-03 01:48:05 +0000
@@ -224,25 +224,52 @@
 # Debian package; it's in the portable archive format, and needs to go
 # before the entry for regular portable archives, as it's recognized as
 # a portable archive whose first member has a name beginning with
 # "debian".
 #
+# Update: Joerg Jenderek
+# URL: https://en.wikipedia.org/wiki/Deb_(file_format)
 0	string		=!<arch>\ndebian
->8	string		debian-split	part of multipart Debian package
+# https://manpages.debian.org/testing/dpkg/dpkg-split.1.en.html
+>14	string		-split	part of multipart Debian package
 !:mime	application/vnd.debian.binary-package
->8	string		debian-binary	Debian binary package
+# udeb is used for stripped down deb file
+!:ext	deb/udeb
+>14	string		-binary	Debian binary package
 !:mime	application/vnd.debian.binary-package
->8	string		!debian
+!:ext	deb/udeb
+# This should not happen
+>14	default		x	Unknown Debian package
+# NL terminated version; for most Debian cases this is 2.0 or 2.1 for splitted
 >68	string		>\0		(format %s)
-# These next two lines do not work, because a bzip2 Debian archive
-# still uses gzip for the control.tar (first in the archive).  Only
-# data.tar varies, and the location of its filename varies too.
-# file/libmagic does not current have support for ascii-string based
-# (offsets) as of 2005-09-15.
-#>81	string		bz2		\b, uses bzip2 compression
-#>84	string		gz		\b, uses gzip compression
-#>136	ledate		x		created: %s
+#>68	string		!2.0\n
+#>>68	string		x		(format %.3s)
+>68	string		=2.0\n
+# 2nd archive name=control archive name like control.tar.gz or control.tar.xz
+>>72	string		>\0		\b, with %.14s
+# look for 3rd archive name=data archive name like data.tar.{gz,xz,bz2,lzma}
+>>0	search/0x93e4f	data.tar.	\b, data compression
+# the above line only works if FILE_BYTES_MAX in ../../src/file.h is raised
+# for example like libreoffice-dev-doc_1%3a5.2.7-1+rpi1+deb9u3_all.deb
+>>>&0	string		x		%.4s
+# splitted debian package case
+>68	string		=2.1\n
+# dpkg-1.18.25/dpkg-split/info.c
+# NL terminated ASCII package name like ckermit
+>>&0	string		x		\b, %s
+# NL terminated package version like 302-5.3
+>>>&1	string		x		%s
+# NL terminated MD5 checksum
+>>>>&1	string		x		\b, MD5 %s
+# NL terminated original package length
+>>>>>&1	string		x		\b, unsplitted size %s
+# NL terminated part length
+>>>>>>&1	string	x		\b, part lenght %s
+# NL terminated package part like n/m
+>>>>>>>&1	string	x		\b, part %s
+# NL terminated package architecture like armhf since dpkg 1.16.1 or later
+>>>>>>>>&1	string	x		\b, %s
 
 #
 # MIPS archive; they're in the portable archive format, and need to go
 # before the entry for regular portable archives, as it's recognized as
 # a portable archive whose first member has a name beginning with

