[File] [PATCH] Magdir/diff bsdiff(1) patch file; missing mime type+extension

Jörg Jenderek (GMX) joerg.jen.der.ek at gmx.net
Sat Jan 20 21:11:29 UTC 2024


Hello,

a few days ago I had to handle some patch files. Unfortunately there are
about a dozen different variants, and some of them are not recognized. In
this session I will handle "bsdiff" samples, which are binary and not text.
The samples were created by the bsdiff utility.

When running file command version 5.45 on such samples I get
output like:

fmt-439-signature-id-672.bsdiff: bsdiff(1) patch file
lmhosts.bsdiff:                  bsdiff(1) patch file
lsmod-xbox.bsdiff:               bsdiff(1) patch file
test.bsdiff:                     bsdiff(1) patch file

Furthermore, only the generic mime type application/octet-stream is
shown with the -i option. With the --extension option only the 3 byte
sequence ??? is shown.

For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). It also recognizes the
samples. There they are described with highest priority as "bsdiff patch
(v4)" with mime type application/x-bsdiff and suffix BSDIFF by
bsdiff.trid.xml (see appended trid-v-bsdiff.txt.gz).

For comparison I also ran the file format identification
utility DROID (see https://sourceforge.net/projects/droid/).
There the recognized samples are described as "BSDIFF" with version 4.0
by PUID fmt/439. No mime type is listed (see appended
droid-bsdiff.csv.gz).

On Linux, according to the shared MIME-info database, such samples are
called "Binary differences between files". There application/x-bsdiff is
used as mime type. This makes sense because the samples are binary files
and not text files as with many other difference formats. The samples are
recognized just by looking for the 8 byte sequence "BSDIFF40" at the
beginning. "BSDIFN40" is also considered a valid start magic there, but I
do not have such samples and cannot create them. That information can be
seen in the source freedesktop.org.xml.in, found for example on
gitlab.freedesktop.org.

With the help of these tools I found pages about the BSDIFF file format.
That is expressed inside Magdir/diff by comment lines like:
# URL: 	http://www.daemonology.net/bsdiff/
# Ref.:	https://github.com/cperciva/bsdiff/blob/master/bsdiff-ra/FORMAT
#	http://mark0.net/download/triddefs_xml.7z/defs/b/bsdiff.trid.xml

The samples are currently detected by a line inside Magdir/diff which looks like:
0	string/b	BSDIFF40	bsdiff(1) patch file

First I looked at what other tools check. They also check for the string
BZh at offset 32 and for the string 1AY&SY at offset 36 (that is the
hexadecimal sequence 314159265359, the circle number pi in BCD notation).
These patterns are characteristic of bzip2 compressed data, which is
described by Magdir/compress. After the 32 byte header comes the
compressed data, which at the moment is compressed with bzip2. So I show
information about that part by adding a line like:
 >>0x20	indirect	x		\b, at 0x20
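
As a rough illustration of the checks described above (not part of the
patch), the two signatures can be tested in a few lines of Python. The
helper name has_bzip2_at_0x20 and the synthetic sample are assumptions
made for this sketch; only the offsets and byte patterns come from the
text above.

```python
import bz2

BSDIFF_MAGIC = b"BSDIFF40"
BZIP2_OFFSET = 0x20  # first byte after the 32 byte header
BZIP2_BLOCK_MAGIC = bytes.fromhex("314159265359")  # ASCII "1AY&SY"


def has_bzip2_at_0x20(data: bytes) -> bool:
    """Hypothetical helper: BSDIFF40 prefix plus bzip2 signature at 0x20."""
    if not data.startswith(BSDIFF_MAGIC):
        return False
    stream = data[BZIP2_OFFSET:]
    level = stream[3:4]  # bzip2 block size digit, '1'..'9'
    return (
        stream[:3] == b"BZh"                    # offset 32 in the file
        and level.isdigit()
        and level != b"0"
        and stream[4:10] == BZIP2_BLOCK_MAGIC   # offset 36 in the file
    )


# Synthetic input only: 8 byte magic, 24 zero bytes, then a bzip2 stream.
sample = BSDIFF_MAGIC + bytes(24) + bz2.compress(b"control data")
print(has_bzip2_at_0x20(sample))
```

The indirect line in the magic file goes further than this sketch, because
it hands the data at 0x20 to every known magic and not only to bzip2.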

According to the documentation, the lengths of the different patch parts
are stored in the 32 byte header. These can be shown by lines like:
 >>8	lequad		x		\b, new length %lld
 >>16	lelong		x		\b, new segment length %d
 >>20	lelong		!0		\b, compressed header length %d
 >>24	lequad		x		\b, data length %lld
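
The same four fields can be read outside of file(1) with a short Python
sketch. The offsets and widths are copied from the magic lines above; the
field names simply repeat the labels used there and are not checked
against the FORMAT document here. BsdiffHeader and parse_header are
hypothetical names chosen for illustration.

```python
import struct
from typing import NamedTuple


class BsdiffHeader(NamedTuple):
    """Field names mirror the labels of the magic lines (an assumption)."""
    new_length: int                # offset 8,  lequad
    new_segment_length: int        # offset 16, lelong
    compressed_header_length: int  # offset 20, lelong
    data_length: int               # offset 24, lequad


def parse_header(data: bytes) -> BsdiffHeader:
    """Unpack the 24 bytes that follow the 8 byte BSDIFF40 magic."""
    if len(data) < 32 or data[:8] != b"BSDIFF40":
        raise ValueError("not a BSDIFF40 header")
    # "<" selects little-endian without padding: q = 8 bytes, i = 4 bytes.
    return BsdiffHeader(*struct.unpack_from("<qiiq", data, 8))


# Synthetic header, built only to exercise the parser.
example = b"BSDIFF40" + struct.pack("<qiiq", 1000, 64, 0, 512)
print(parse_header(example))
```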

The sample fmt-439-signature-id-672.bsdiff is not a real patch. It is
used by the DROID tool as a pattern template to recognize bsdiff patches.
For this sample all length fields are zero. So I skip this sample because
of its invalid new file segment length. The magic now starts like:
0	string/b	BSDIFF40
 >16	long		!0		bsdiff(1) patch file
!:mime	application/x-bsdiff
!:ext	bsdiff
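
The effect of this rule can be sketched in Python as follows. The function
name matches_magic_entry and both byte strings are made up for this
illustration; the real DROID template is only assumed to carry zeros in
all length fields, as described above.

```python
def matches_magic_entry(data: bytes) -> bool:
    """Mimic the two magic lines: BSDIFF40 at 0 and a non-zero long at 16."""
    if len(data) < 20 or not data.startswith(b"BSDIFF40"):
        return False
    # A test against zero gives the same result for either byte order,
    # so the native-endian "long" needs no special handling here.
    return any(data[16:20])


# Stand-in for the DROID template: magic followed by zeroed length fields.
droid_like = b"BSDIFF40" + bytes(24)
# Stand-in for a real patch: some non-zero value in the field at offset 16.
patch_like = b"BSDIFF40" + bytes(8) + (64).to_bytes(4, "little") + bytes(12)

print(matches_magic_entry(droid_like), matches_magic_entry(patch_like))
```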

After applying the above mentioned modifications with the patch
file-5.45-diff-bsdiff.diff and using Magdir/compress, my samples
are still recognized, but more details are shown and the invalid
DROID sample is skipped. The output now looks like:

fmt-439-signature-id-672.bsdiff: data
lmhosts.bsdiff:                  bsdiff(1) patch file
				 , at 0x20 bzip2 compressed data
				 , block size = 900k
lsmod-xbox.bsdiff:               bsdiff(1) patch file
				 , at 0x20 bzip2 compressed data
				 , block size = 900k
test.bsdiff:                     bsdiff(1) patch file
				 , at 0x20 bzip2 compressed data
				 , block size = 900k

I hope my diff file can be applied in a future version of the file
utility.

There are still other patch formats which are sometimes not
recognized or not described completely. I will try to handle these in a
future session.

With best wishes,
Jörg Jenderek
--
Jörg Jenderek
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-bsdiff.txt.gz
Type: application/x-gzip
Size: 512 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20240120/35a0f844/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-bsdiff.csv.gz
Type: application/x-gzip
Size: 353 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20240120/35a0f844/attachment-0001.bin>
-------------- next part --------------
--- file-5.45/magic/Magdir/diff.old	2021-02-23 01:49:24.000000000 +0100
+++ file-5.45/magic/Magdir/diff	2024-01-20 21:31:48.305902400 +0100
@@ -18,6 +18,26 @@
 
 # bsdiff:  file(1) magic for bsdiff(1) output
-0	string/b	BSDIFF40	bsdiff(1) patch file
-
+# Update:	Joerg Jenderek
+# URL: 		http://www.daemonology.net/bsdiff/
+# Reference:	https://github.com/cperciva/bsdiff/blob/master/bsdiff-ra/FORMAT
+#		http://mark0.net/download/triddefs_xml.7z/defs/b/bsdiff.trid.xml
+# Note:		called "bsdiff patch" by TrID and "BSDIFF" version 4.0 by DROID via PUID fmt/439 and
+#		"Binary differences between files" by shared MIME-info database from freedesktop.org
+0	string/b	BSDIFF40
+# skip DROID fmt-439-signature-id-672.bsdiff with invalid new file segment length 0
+>16	long		!0		bsdiff(1) patch file
+#!:mime	application/octet-stream
+!:mime	application/x-bsdiff
+!:ext	bsdiff
+# new file length
+#>>8	lequad		x		\b, new length %lld
+# new file segment length
+#>>16	lelong		x		\b, new segment length %d
+# compressed header block length
+#>>20	lelong		!0		\b, compressed header length %d
+# patch data block length
+#>>24	lequad		x		\b, data length %lld
+# look for bzip data by ./compress after message with 1 space at end
+>>0x20	indirect	x		\b, at 0x20 
 
 # unified diff
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.45-diff-bsdiff.diff.sig
Type: application/octet-stream
Size: 861 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20240120/35a0f844/attachment.obj>

