[File] [PATCH] of Magdir/archive for Debian package (+extension *.deb *.udeb)
Christos Zoulas
christos at zoulas.com
Sun Mar 3 17:11:56 UTC 2019
On Mar 3, 5:17pm, joerg.jen.der.ek at gmx.net (Jörg Jenderek) wrote:
-- Subject: [File] [PATCH] of Magdir/archive for Debian package (+extension *
|
| Hello,
|
| Some weeks ago I sent a patch for "current ar archive". This archive
| format is also used for Debian packages, so I dug deeper. I searched my
| systems with the find utility, and the net, for hundreds of similar
| packages. I also generated test files to verify the magic lines. When I
| run the file command version 5.36 on such examples I get output like:
|
| ckermit_302-5.3_armhf.1of4.deb: part of multipart Debian package
| (format 2.1)
| ckermit_302-5.3_armhf.deb: Debian binary package (format 2.0)
| debian-UNKNOWN-x.y.deb: (format x.y)
| e2fsprogs-udeb_1.44.4-2_sparc64.udeb: Debian binary package (format 2.0)
| openjdk-6-source_6b11-9.1_all.deb: Debian binary package (format 2.0)
| xfslibs-dev_3.1.7+b1_mips.deb: Debian binary package (format 2.0)
| xfslibs-test-lzma.deb: Debian binary package (format 2.0)
|
| Inside Magdir/archive the first test line catches all Debian packages by
| 0 string =!<arch>\ndebian
| Afterwards two subclass variants are handled by two additional lines
| >8 string debian-split part of multipart Debian package
| >8 string debian-binary Debian binary package
| But a last-resort branch is missing for the rare cases where another
| Debian package class occurs, as in the simulated example
| debian-UNKNOWN-x.y.deb, or where a Debian package is damaged at offset
| 14. In that case only a message like "(format x.y)" is shown.
| Instead there is a dead line which is never executed
| >8 string !debian
| So I deleted the dead line.
|
| The keyword "debian" at offset 8 is already recognized by the first
| magic line, so it can be removed from the second-level test lines.
| The old describing lines now become
| >14 string -split part of multipart Debian package
| >14 string -binary Debian binary package
| The third branch is now shown by a new line like
| >14 default x Unknown Debian package
|
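The three-way branch described above can be sketched outside of magic syntax. This is only an illustration in Python, not code from the patch; the function name `classify_deb` and the "not a Debian package" fallback string are made up here, while the offsets and descriptions are the ones quoted in the message.

```python
def classify_deb(head: bytes) -> str:
    """Mimic the magic branches: test offset 0, then branch at offset 14."""
    # First-level test: ar magic at offset 0, member name starting at offset 8.
    if not head.startswith(b"!<arch>\ndebian"):
        return "not a Debian package"  # hypothetical fallback, not in the patch
    rest = head[14:]  # "debian" itself occupies offsets 8..13
    if rest.startswith(b"-split"):
        return "part of multipart Debian package"
    if rest.startswith(b"-binary"):
        return "Debian binary package"
    # Plays the role of the ">14 default x" line.
    return "Unknown Debian package"


if __name__ == "__main__":
    print(classify_deb(b"!<arch>\ndebian-binary   "))
```

Because the default case sits last, it is reached only when neither "-split" nor "-binary" matched, which is the behaviour the removed ">8 string !debian" line never delivered.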
| Most information about the Debian package file format is given on the
| Wikipedia page, so I added a URL line like
| # URL: https://en.wikipedia.org/wiki/Deb_(file_format)
|
| After the MIME type line I added a line for the file name extension.
| In most cases this is "deb", but for stripped-down Debian packages
| "udeb" is used, as in the example e2fsprogs-udeb_1.44.4-2_sparc64.udeb.
| This is now expressed by an additional line like
| !:ext deb/udeb
|
| The Debian format version was expressed by the line
| 68 string >\0 (format %s)
| In most cases this is 2.0, or 2.1 for split packages.
| Now I also show the additional package information that is available.
| For the standard case this is done by a branch starting with
| >68 string =2.0\n
| The most interesting information available is the compression method
| used. Why? Sometimes you are using an old distribution where only gzip
| compression is available. Or you are using an embedded system where the
| compression tools are provided by BusyBox, and the needed compression is
| not implemented in BusyBox.
| So look at the name of the second ar member. This is the control
| archive, with a name like control.tar.gz or control.tar.xz, which shows
| the compression used. This is now done by a line like
| >>72 string >\0 \b, with %.14s
| After the compressed binary content of the second archive member comes
| the third ar member. So look for this data archive name, with names like
| data.tar.{gz,xz,bz2,lzma}. OK, this is no performance winner, but it
| works, provided FILE_BYTES_MAX in src/file.h is raised, by additional
| lines like
| >>0 search/0x93e4f data.tar. \b, data compression
| >>>&0 string x %.4s
|
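To make the offsets 68 and 72 and the member names easier to follow, here is a small Python sketch of the ar layout: an 8-byte global magic, then for each member a 60-byte header followed by its data, padded to an even length. Unlike the magic lines, which search for the string "data.tar.", this sketch walks the member headers. The helper names `ar_member`, `ar_members` and `deb_compression` are invented for illustration, and the test archive is synthetic.

```python
def ar_member(name: str, data: bytes) -> bytes:
    """Build one ar member: 60-byte header plus data, padded to even length."""
    header = (f"{name:<16}{0:<12}{0:<6}{0:<6}{'100644':<8}"
              f"{len(data):<10}`\n").encode("ascii")
    return header + data + (b"\n" if len(data) % 2 else b"")


def ar_members(blob: bytes):
    """Yield (name, size) for every member of an ar archive."""
    if not blob.startswith(b"!<arch>\n"):
        raise ValueError("not an ar archive")
    pos = 8
    while pos + 60 <= len(blob):
        header = blob[pos:pos + 60]
        if header[58:60] != b"`\n":
            raise ValueError("bad ar member header")
        name = header[:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii"))
        yield name, size
        pos += 60 + size + (size & 1)  # skip data and the padding byte


def deb_compression(blob: bytes):
    """Return (control member name, data compression suffix or None)."""
    names = [name for name, _ in ar_members(blob)]
    control = next((n for n in names if n.startswith("control.tar")), None)
    data = next((n for n in names if n.startswith("data.tar.")), None)
    return control, (data[len("data.tar."):] if data else None)


if __name__ == "__main__":
    # Synthetic stand-in for a real .deb; member contents are dummies.
    deb = (b"!<arch>\n" + ar_member("debian-binary", b"2.0\n")
           + ar_member("control.tar.gz", b"abc")
           + ar_member("data.tar.xz", b"wxyz"))
    print(deb[68:72], deb[72:86], deb_compression(deb))
```

Walking the headers needs only the member sizes, whereas the search in the magic lines has to scan past the whole control member, which is why the byte limit matters there.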
| Split Debian packages with version 2.1 are now handled by a branch
| starting with the additional line
| >68 string =2.1\n
| The meta information is now shown, following the source file of the
| dpkg-split tool. It starts with the NL-terminated ASCII package name,
| like "ckermit", shown by an additional line like
| >>&0 string x \b, %s
|
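The content of the debian-split member is a series of NL-terminated ASCII lines, so it can be read field by field. The following Python sketch assumes the field order that the patched output further down suggests (format, package name, version, MD5, total size, part length, part number/count, architecture); the function name `parse_split_info` and the dictionary keys are invented for illustration and are not taken from dpkg-split.

```python
def parse_split_info(text: str) -> dict:
    """Parse the debian-split member of a format 2.1 package part."""
    # Assumed field order, inferred from the output shown in this message.
    fields = ["format", "package", "version", "md5",
              "total_size", "part_size", "part", "arch"]
    # zip() stops early, so a missing trailing field (e.g. arch) is tolerated.
    info = dict(zip(fields, text.splitlines()))
    for key in ("total_size", "part_size"):
        if key in info:
            info[key] = int(info[key])
    if "part" in info:
        number, _, count = info["part"].partition("/")
        info["part"] = (int(number), int(count))
    return info


if __name__ == "__main__":
    sample = ("2.1\nckermit\n302-5.3\n28bedba63fc2c33e6c5845d412d1ca8e\n"
              "1655898\n510976\n1/4\narmhf\n")
    print(parse_split_info(sample))
```

The sample values are the ones reported for ckermit_302-5.3_armhf.1of4.deb in this message.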
| After applying the above modifications with the patch
| file-5.36-archive-deb.diff, all inspected examples are now
| described like:
|
| ckermit_302-5.3_armhf.1of4.deb: part of multipart Debian package
| (format 2.1), ckermit 302-5.3
| , MD5 28bedba63fc2c33e6c5845d412d1ca8e
| , unsplitted size 1655898, part lenght 510976, part 1/4, armhf
| ckermit_302-5.3_armhf.deb: Debian binary package (format 2.0)
| , with control.tar.gz, data compression xz
| debian-UNKNOWN-x.y.deb: Unknown Debian package
| (format x.y)
| e2fsprogs-udeb_1.44.4-2_sparc64.udeb: Debian binary package (format 2.0)
| , with control.tar.xz, data compression xz
| openjdk-6-source_6b11-9.1_all.deb: Debian binary package (format 2.0)
| , with control.tar.gz, data compression bz2
| xfslibs-dev_3.1.7+b1_mips.deb: Debian binary package (format 2.0)
| , with control.tar.gz, data compression gz
| xfslibs-test-lzma.deb: Debian binary package (format 2.0)
| , with control.tar.gz, data compression lzma
|
| I hope my diff file can be applied in a future version of the file utility.
|
| Furthermore I looked at the file identifying utility TrID (see
| http://mark0.net/soft-trid-e.html). With the verbose option "-v" it
| also prints the related URL. I like that feature.
| Maybe this could be implemented in the file command. Similar to
| "--mime", an option like "--url" could show the URL information for
| inspected samples when the magic file contains a new magic like
| !:url https://en.wikipedia.org/wiki/Deb_(file_format)
That's great! Committed, thanks...
christos