[File] [PATCH] of Magdir/archive for Debian package (+extension *.deb *.udeb)

Christos Zoulas christos at zoulas.com
Sun Mar 3 17:11:56 UTC 2019


On Mar 3,  5:17pm, joerg.jen.der.ek at gmx.net (Jörg Jenderek) wrote:
-- Subject: [File] [PATCH] of Magdir/archive for Debian package (+extension *
| 
| Hello,
| 
| some weeks ago I sent a patch for "current ar archive". This package
| format is also used for Debian packages, so I dug deeper. I searched my
| systems with the find utility, and the net, for hundreds of similar
| packages. I also generated test files to verify the magic lines. When I
| run the file command version 5.36 on such examples I get output like:
| 
| ckermit_302-5.3_armhf.1of4.deb:       part of multipart Debian package
| 	(format 2.1)
| ckermit_302-5.3_armhf.deb:            Debian binary package (format 2.0)
| debian-UNKNOWN-x.y.deb:               (format x.y)
| e2fsprogs-udeb_1.44.4-2_sparc64.udeb: Debian binary package (format 2.0)
| openjdk-6-source_6b11-9.1_all.deb:    Debian binary package (format 2.0)
| xfslibs-dev_3.1.7+b1_mips.deb:        Debian binary package (format 2.0)
| xfslibs-test-lzma.deb:                Debian binary package (format 2.0)
| 
| Inside Magdir/archive the first test line catches all Debian packages by
|  0	string		=!<arch>\ndebian
| Afterwards two subclass variants are distinguished by two additional lines
|  >8	string		debian-split	part of multipart Debian package
|  >8	string		debian-binary	Debian binary package
| But a last-resort branch is missing for rare cases, when another
| Debian package class occurs, as in the simulated example
| debian-UNKNOWN-x.y.deb, or when a Debian package is damaged at offset 14.
| In that case only a message like "(format x.y)" is shown.
| Instead there is a dead line which is never executed, because the first
| line has already matched "debian" at offset 8
|  >8	string		!debian
| So I deleted the dead line.
| 
| The keyword "debian" at offset 8 is already recognized by the first magic
| line, so this phrase can be removed from the second-level test lines.
| The old describing lines now become
|  >14	string		-split	part of multipart Debian package
|  >14	string		-binary	Debian binary package
| The third branch is now handled by a new line like
|  >14	default		x	Unknown Debian package
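The three-way branch above can be sketched outside of magic syntax. This is only an illustration of the offsets involved, not code from the patch; the function name classify_deb is made up for the example.

```python
from typing import Optional

AR_MAGIC = b"!<arch>\n"  # 8-byte global header of an ar archive


def classify_deb(head: bytes) -> Optional[str]:
    """Mirror the branches: "-split", "-binary", or the default case.

    classify_deb is a hypothetical helper; `head` holds the first bytes
    of the file being inspected.
    """
    # First magic line: "!<arch>\n" immediately followed by "debian".
    if not head.startswith(AR_MAGIC + b"debian"):
        return None
    # "debian" occupies offsets 8..13, so the second-level tests start at 14.
    rest = head[14:]
    if rest.startswith(b"-split"):
        return "part of multipart Debian package"
    if rest.startswith(b"-binary"):
        return "Debian binary package"
    # Equivalent of the "default" line: nothing else matched at offset 14.
    return "Unknown Debian package"
```

A name such as "debian-UNKNOWN" falls through to the last return, which is the case the old magic lines left without a description.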
| 
| Most information about the Debian package file format is given on the
| Wikipedia page, so I added a URL line like
|  # URL: https://en.wikipedia.org/wiki/Deb_(file_format)
| 
| After the MIME type line I added a line for the file name extension. In
| most cases this is "deb", but for stripped-down Debian packages "udeb" is
| used, as in the example e2fsprogs-udeb_1.44.4-2_sparc64.udeb. This is now
| expressed by an additional line like
|  !:ext	deb/udeb
| 
| The Debian format version was expressed by the line
|  68	string		>\0		(format %s)
| For most Debian packages this is 2.0, or 2.1 for split packages.
| Now I also show additional available package information.
| For the standard case this is done by a branch starting with
|  >68	string		=2.0\n
| The most interesting information available is the compression
| method used. Why? Sometimes you are using an old distribution where only
| gzip compression is available. Or you are using an embedded system where
| the compression tools are provided by BusyBox, and the needed compression
| is not implemented inside BusyBox.
| So look at the second ar member name. This is the control archive, with a
| name like control.tar.gz or control.tar.xz showing the compression used.
| This is now done by a line like
|  >>72	string		>\0		\b, with %.14s
| After the compressed binary content of the second archive member comes
| the third ar member. So look for this data archive name, with names like
| data.tar.{gz,xz,bz2,lzma}. OK, this is not fast, but it works, if
| FILE_BYTES_MAX in src/file.h is raised, with additional lines like
|  >>0	search/0x93e4f	data.tar.	\b, data compression
|  >>>&0	string		x		%.4s
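For comparison, the same information can be read by walking the ar member headers instead of searching for "data.tar.". The sketch below assumes the common ar layout (8-byte global header, then 60-byte member headers with a 16-byte name field, a 10-byte decimal size field at offset 48, and data padded to an even length). The names ar_member_names and deb_compression are made up for the example and are not part of file(1) or dpkg.

```python
import io

AR_MAGIC = b"!<arch>\n"   # global header, 8 bytes
HEADER_LEN = 60           # fixed size of each ar member header


def ar_member_names(stream):
    """Yield (name, size) for each member of an ar archive (hypothetical helper)."""
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    while True:
        header = stream.read(HEADER_LEN)
        if len(header) < HEADER_LEN:
            return  # end of archive
        if header[58:60] != b"`\n":
            raise ValueError("bad ar member header")
        # Name field is space padded; some ar variants append "/".
        name = header[0:16].decode("ascii").rstrip(" /")
        size = int(header[48:58].decode("ascii"))
        yield name, size
        # Skip the member data; it is padded to an even length.
        stream.seek(size + (size & 1), io.SEEK_CUR)


def deb_compression(stream):
    """Map "control.tar" and "data.tar" to their suffix, e.g. "gz" or "xz"."""
    found = {}
    for name, _size in ar_member_names(stream):
        for prefix in ("control.tar", "data.tar"):
            if name.startswith(prefix):
                found[prefix] = name[len(prefix):].lstrip(".") or "none"
    return found
```

With this layout the content of the first member begins at offset 8 + 60 = 68 and, for a 4-byte "2.0\n", the second member header begins at offset 72, which is where the magic lines above look for the format version and the control archive name.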
| 
| Split Debian packages with version 2.1 are now handled by a branch
| starting with the additional line
|  >68	string		=2.1\n
| Following the source file of the dpkg-split tool, the meta information is
| now shown, starting with the NL-terminated ASCII package name like
| "ckermit" by an additional line like
|  >>&0	string		x		\b, %s
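As a rough illustration of the meta information in a split part, the sketch below parses the newline-separated content of the "debian-split" member. The field order (format, package name, version, MD5, total size, part size, part number/total, architecture) is an assumption based on the sample output in this mail and on the deb-split format description; parse_split_info and the key names are made up for the example.

```python
def parse_split_info(content: bytes) -> dict:
    """Parse the text of a "debian-split" ar member (hypothetical helper).

    Assumed field order, one value per line:
    format, package, version, md5, total_size, part_size, part, arch.
    """
    keys = ("format", "package", "version", "md5",
            "total_size", "part_size", "part", "arch")
    lines = content.decode("ascii").splitlines()
    # zip stops at the shorter sequence, so a missing trailing field
    # (for example no architecture line) is simply absent from the result.
    info = dict(zip(keys, lines))
    if "part" in info:
        # "1/4" becomes (1, 4): current part number and total number of parts.
        info["part"] = tuple(int(x) for x in info["part"].split("/"))
    return info
```

The values used in the test mirror the ckermit example shown in this mail; they are not read from a real file.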
| 
| After applying the above-mentioned modifications with the patch
| file-5.36-archive-deb.diff, all inspected examples are now
| described like:
| 
| ckermit_302-5.3_armhf.1of4.deb:       part of multipart Debian package
| 	(format 2.1), ckermit 302-5.3
| 	, MD5 28bedba63fc2c33e6c5845d412d1ca8e
| 	, unsplitted size 1655898, part lenght 510976, part 1/4, armhf
| ckermit_302-5.3_armhf.deb:            Debian binary package (format 2.0)
| 	, with control.tar.gz, data compression xz
| debian-UNKNOWN-x.y.deb:               Unknown Debian package
| 	(format x.y)
| e2fsprogs-udeb_1.44.4-2_sparc64.udeb: Debian binary package (format 2.0)
| 	, with control.tar.xz, data compression xz
| openjdk-6-source_6b11-9.1_all.deb:    Debian binary package (format 2.0)
| 	, with control.tar.gz, data compression bz2
| xfslibs-dev_3.1.7+b1_mips.deb:        Debian binary package (format 2.0)
| 	, with control.tar.gz, data compression gz
| xfslibs-test-lzma.deb:                Debian binary package (format 2.0)
| 	, with control.tar.gz, data compression lzma
| 
| I hope my diff file can be applied in a future version of the file utility.
| 
| Furthermore I looked at the file identifying utility TrID (see
| http://mark0.net/soft-trid-e.html). With the verbose option "-v" the
| related URL is also printed. I like that feature.
| Maybe this could be implemented in the file command. Similar to "--mime",
| an option like "--url" could show the URL information for inspected
| samples, when the magic file contains a new magic line like
|  !:url	https://en.wikipedia.org/wiki/Deb_(file_format)

That's great! Committed, thanks...

christos

