[File] [PATCH] Magdir/diff, console Courgette binary diff misidentified as Nintendo Gameboy Music/Audio

Jörg Jenderek (GMX) joerg.jen.der.ek at gmx.net
Wed Feb 7 17:30:05 UTC 2024


Hello,

A few days ago I had to handle some patch files. The file name suffix
PATCH is often used for these, so as a check I looked for samples with
that suffix on my systems. In the context of Chromium-based web browsers
like Opera I found samples which are currently not recognized. This
variant, called "Courgette", is used to store updates for such web
browsers. Unfortunately the BSDIFF suffix is also used for such update
patches, and that suffix is also used by the BSD binary patching tool
from Colin Percival. The Google developers used that format in the past
but then switched to the Courgette tool. It took me two days to
understand that the Courgette file format is totally different from the
bsdiff format described by Magdir/diff. I searched the header for
signatures of known compression methods (like BZ0 for BZIP and so on)
but found nothing, because it uses its own algorithm. The header format
is also different, which is not visible at first glance because both
formats contain length information in the header.
So I mention these facts in a comment line.

So I ran the file command, version 5.45, on such patch examples and on
"related" Nintendo audio files with the GBS suffix. The patch samples
are "recognized" but described wrongly. I get output like:

bugs bunny crazy castle 3.gbs: Nintendo Gameboy Music/Audio Data (
                                  "Bugs Bunny Crazy Castle 3" by
			       ?, copyright
			       1999 Kemco
			       , version 1, 19 tracks
chrome64-1-2.v1.bsdiff:        Nintendo Gameboy Music/Audio Data (
                                  "\200\3402\352\232\312\011" by
			       \035\231\005\037\011, copyright
			       \024)\022\237\003\241\003)
			       , version 68, 73 tracks
hero hero-kun.gbs:             Nintendo Gameboy Music/Audio Data (
                                  "Hero Hero Kun" by
			       ?, copyright
			       2001 Imagineer/KT.Kodansha/P&B)
			       , version 1, 108 tracks
jurazzic park.gbs:             Nintendo Gameboy Music/Audio Data (
                                  "Jurazzic Park" by
			       1993 Ocean)
			       , version 1, 10 tracks
nightmode.gbs:                 Nintendo Gameboy Music/Audio Data (
			       "Nightmode" by
			       Laxity, copyright
			       version 1, 1 tracks
opera_browser.dll.sig.patch:   Nintendo Gameboy Music/Audio Data (
                                  "\013\352\232\312\011" by
			       \312\206sVkp\275U\017E9q\026\311
			       , copyright
			       \030f\033T'\371H\3209\253\256\32)
			       , version 68, 73 tracks
the blues brothers.gbs:        Nintendo Gameboy Music/Audio Data (
                                  "Blues Brothers" by
			       <?>, copyright
			       1991 Titus)
			       , version 1, 6 tracks

With the --extension option only the 3 byte sequence ??? is shown, and
with the -i option application/octet-stream is shown.

For comparison I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/).
The samples are not recognized.

For comparison I also ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). It identifies all GBS
examples with high priority as "GameBoy Sound System dump" with MIME
type application/octet-stream via gbs.trid.xml. The patch samples are
described as "Courgette Binary Diff output" with MIME type
application/octet-stream via bsdiff-chrome.trid.xml (see the appended
trid-v-bsdiff.txt.gz).

This tool lists the file name extensions used and, with the -v option,
the related URL pointing to a web site with file format information.
This information is expressed by comment lines inside Magdir/console like:
# URL:		http://fileformats.archiveteam.org/wiki/Game_Boy_Sound
#		http://en.wikipedia.org/wiki/Game_Boy_Sound_System
# Reference:	http://mark0.net/download/triddefs_xml.7z
#		defs/g/gbs.trid.xml
On the mentioned sites you can also find some samples to download.
The GBS samples are recognized by lines inside Magdir/console which
follow "Gameboy Sound System (.gbs).txt". These look like:
  0	string		GBS	Nintendo Gameboy Music/Audio Data
  >16	string		>\0	("%.32s" by
  >48	string		>\0	%.32s, copyright
  >80	string		>\0	%.32s),
  >3	byte		x	version %u,
  >4	byte		x	%u tracks

Apparently the 3 byte GBS string at the beginning is not unique enough.
It also matches the GBSDIF42 magic of Courgette bsdiff and the 4 byte
GBST magic of Grand Theft Auto 2 Style data (*.sty via TrID
sty-gta2.trid.xml). The 32 byte string fields are right null-filled, but
there is no terminating \0 if all bytes are used. If a field is unknown,
it should be set to a single question mark. So for real GBS samples I
get ASCII for the title string (like "Blues Brothers" or "Bugs Bunny
Crazy Castle 3"), the author string (like <?> or 1993 Ocean) and the
copyright string (like "1991 Titus", "2001 Imagineer/KT.Kodansha/P&B" or
"2000 Newline, Ubisoft, D. Eclip."), whereas for the patch samples I get
octal garbage here.
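
The collision described above can be reproduced with a small Python
sketch. It is only an illustration and not part of the patch: the byte
strings contain nothing but the magic bytes named in this mail, and the
helper names matches_old_magic and gbs_field are made up. It shows that
all three magics share the 3 byte prefix GBS, and how a 32 byte right
null-filled field without a guaranteed terminating \0 can be decoded.

```python
# Header starts for illustration; only the magic bytes come from this mail.
SAMPLES = {
    "gbs": b"GBS\x01",          # Game Boy Sound, version 1
    "courgette": b"GBSDIF42",   # Courgette binary diff
    "gta2 style": b"GBST",      # Grand Theft Auto 2 Style data
}


def matches_old_magic(data: bytes) -> bool:
    """Old test: only the 3 byte string GBS at offset 0."""
    return data.startswith(b"GBS")


def gbs_field(raw: bytes) -> str:
    """Decode a 32 byte field that is right null-filled.

    There is no terminating NUL when all 32 bytes are used, so the
    field is cut to 32 bytes first and trailing NULs are stripped.
    """
    return raw[:32].rstrip(b"\x00").decode("latin-1")


for name, head in SAMPLES.items():
    print(name, matches_old_magic(head))

title = b"Blues Brothers".ljust(32, b"\x00")
print(repr(gbs_field(title)))
```

All three samples pass the old 3 byte test, which is why a longer test
is needed.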

To distinguish GBS from other formats I also look at the remaining
fields at the beginning. This is done by additional lines like:
  >5	ubyte		!1	\b, %u first
  >6	uleshort	x	\b, load address %#4.4x
  >8	uleshort	x	\b, init address %#4.4x
  >10	uleshort	x	\b, play address %#4.4x
  >12	uleshort	x	\b, stack pointer %#4.4x
  >14	ubyte		!0	\b, timer modulo %#x
  >15	ubyte		!0	\b, timer control %#x
  #>0x70	ubequad		x	\b, data %#16.16llx...

This information can be verified by running the command line tool
gbsinfo from the Gameboy sound player package gbsplay, like:
	LANG=C gbsinfo /usr/share/doc/gbsplay/examples/nightmode.gbs

The load/init/play addresses have a lower bound of 400h and an upper
bound of 7fffh. For the patch samples I get invalid "high" values here,
but I do not know whether this always occurs or is just a lucky
circumstance, so I could not use it as a reliable additional test.
The version value is 1 according to the documentation. Since the
Nintendo Gameboy is an old console, an evolution with higher version
numbers is unrealistic. So I check for a valid "low" version number (as
is done by TrID) and thereby skip misidentified non-audio samples. The
starting lines now look like:
  0	string		GBS\001	Nintendo Gameboy Music/Audio Data
  !:mime		audio/x-nintendo-gbs
  !:ext	gbs
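
As a cross-check outside of file(1), the tightened test and the address
bounds can be sketched in Python. This is only an illustration under the
assumptions stated in this mail: the offsets and the little-endian 16 bit
reading follow the magic lines, the bounds 400h-7fffh come from the
format description, and the function names are made up. The test header
is built from the values that file reports for nightmode.gbs.

```python
import struct


def is_gbs_v1(data: bytes) -> bool:
    """New test: GBS followed by version byte 1 at offset 3."""
    return data[:4] == b"GBS\x01"


def gbs_addresses(data: bytes) -> dict:
    """Read load/init/play addresses and stack pointer (offsets 6 to 13)."""
    load, init, play, stack = struct.unpack_from("<4H", data, 6)
    return {"load": load, "init": init, "play": play, "stack": stack}


def addresses_plausible(addr: dict) -> bool:
    """Bounds from the format text; not used as a hard test in the patch."""
    return all(0x400 <= addr[k] <= 0x7FFF for k in ("load", "init", "play"))


# Version 1, 1 track, first song 1, then the four 16 bit values
# reported for nightmode.gbs in this mail.
head = b"GBS\x01\x01\x01" + struct.pack("<4H", 0x3000, 0x3800, 0x3290, 0xFFF4)

print(is_gbs_v1(head), addresses_plausible(gbs_addresses(head)))
print(is_gbs_v1(b"GBSDIF42"), is_gbs_v1(b"GBST"))
```

The 4 byte test alone already rejects the GBSDIF42 and GBST magics, so
the address bounds are only shown here as an optional plausibility check.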

The xgbsplay program uses the MIME type audio/prs.gbs and the terminal
program gbsplay uses audio/gbs. But these types are not officially
registered at IANA, so I chose a user-defined type,
audio/x-nintendo-gbs.

Luckily I found a page about Courgette on the Chromium web server. This
information is now expressed by comment lines inside Magdir/diff like:
# URL:		https://www.chromium.org/developers/design-documents#
# Reference:	https://github.com/adobe/chromium/blob/master/courgette/
#		third_party/bsdiff.h
#		http://mark0.net/download/triddefs_xml.7z
#		defs/b/bsdiff-chrome.trid.xml

Following these sources I found more samples in testdata, and the
starting bytes become understandable when looking at the third-party
header file bsdiff.h.

According to the header file, the web browser patch samples are
recognized by lines after the bsdiff(1) patch file section inside
Magdir/diff. These look like:
  0	string/b	GBSDIF42	Courgette binary diff output
  !:mime	application/x-patch
  !:ext	patch/bsdiff
  #>8	ubelong		x		\b, source length %u
  >12	ubelong		x		\b, crc %#8.8x
  #>16	ubelong		x		\b, result length %u
  #>20	ubelong		x		\b, control length %u
  #>24	ubelong		x		\b, patch length %u
  #>28	ubelong		x		\b, extra length %u

The samples are not text files, so a MIME type like text/plain or
text/x-patch is wrong for them. The application/x-bsdiff type used
for binary BSD bsdiff is also not correct because the file format is
totally different. These samples could get the generic
application/octet-stream type used for binary files; instead I chose a
user-defined type of my own (application/x-patch).
For my samples I get "high" values (GB) for the length fields, which I
do not understand, so I do not display these values.
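
For inspecting such a header by hand, the fields named in the magic
lines above can be unpacked with a short Python sketch. The offsets and
the big-endian 32 bit reading are simply copied from those lines
(ubelong at offsets 8 to 28). Since the length values read this way look
implausible for my samples, this layout should be treated as an
assumption and not as a description of the real Courgette format. The
function name and the test header are made up; only the CRC value is
taken from the output for chrome64-1-2.v1.bsdiff.

```python
import struct

# Field names as used in the comments of the Magdir/diff patch.
FIELDS = ("slen", "scrc32", "dlen", "cblen", "difflen", "extralen")


def read_courgette_header(data: bytes) -> dict:
    """Unpack the six 32 bit values that follow the GBSDIF42 magic."""
    if data[:8] != b"GBSDIF42":
        raise ValueError("not a Courgette binary diff")
    values = struct.unpack_from(">6I", data, 8)
    return dict(zip(FIELDS, values))


# Made-up header: all lengths are invented, only the CRC is a reported value.
demo = b"GBSDIF42" + struct.pack(">6I", 100, 0xFDA0A406, 120, 10, 20, 30)
header = read_courgette_header(demo)
print(f"crc {header['scrc32']:#010x}")
```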

After applying the above modifications with the patches
file-diff-patch.diff and file-5.45-console-gbs.diff, all my inspected
examples are described correctly without misidentification. The output
now looks like:

bugs bunny crazy castle 3.gbs: Nintendo Gameboy Music/Audio Data (
      	   	 	       "Bugs Bunny Crazy Castle 3" by
			       ?, copyright
			       1999 Kemco)
			       , 19 tracks
			       , 2 first
			       , load address 0x3fc0
			       , init address 0x3fc4
			       , play address 0x6224
			       , stack pointer 0xdfff
chrome64-1-2.v1.bsdiff:        Courgette binary diff output
			       , crc 0xfda0a406
hero hero-kun.gbs:             Nintendo Gameboy Music/Audio Data (
      			       "Hero Hero Kun" by
			       ?, copyright
			       2001 Imagineer/KT.Kodansha/P&B)
			       , 108 tracks
			       , 79 first
			       , load address 0x17f0
			       , init address 0x17f0
			       , play address 0x19a2
			       , stack pointer 0xcfff
jurazzic park.gbs:             Nintendo Gameboy Music/Audio Data (
	 		       "Jurazzic Park" by
			       1993 Ocean)
			       , 10 tracks
			       , 7 first
			       , load address 0x4000
			       , init address 0x4009
			       , play address 0x400f
			       , stack pointer 0xfff4
nightmode.gbs:                 Nintendo Gameboy Music/Audio Data (
			       "Nightmode" by
			       Laxity, copyright
			       )
			       , 1 track
			       , load address 0x3000
			       , init address 0x3800
			       , play address 0x3290
			       , stack pointer 0xfff4
opera_browser.dll.sig.patch:   Courgette binary diff output
			       , crc 0xb4a709bf
the blues brothers.gbs:        Nintendo Gameboy Music/Audio Data (
     	  		       "Blues Brothers" by
			       <?>, copyright
			       1991 Titus)
			       , 6 tracks
			       , load address 0x3ff0
			       , init address 0x3ff0
			       , play address 0x4079
			       , stack pointer 0xc100



I hope my diff files can be applied in a future version of the file
utility.

With best wishes,
Jörg Jenderek
-------------- next part --------------
--- file-5.45/magic/Magdir/console.old	2023-06-19 15:43:13.000000000 +0200
+++ file-5.45/magic/Magdir/console	2024-02-04 00:44:48.456403200 +0100
@@ -725,13 +725,50 @@
 # From: David Pflug <david at pflug.email>
+# Update:	Joerg Jenderek
+# URL:		http://fileformats.archiveteam.org/wiki/Game_Boy_Sound
+#		http://en.wikipedia.org/wiki/Game_Boy_Sound_System
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/g/gbs.trid.xml
+# Note:		called "GameBoy Sound System dump" by TrID,
+#		"Gameboy GBS rom image" by X11 Gameboy sound player xgbsplay and
+#		verified by gbsplay `LANG=C gbsinfo /usr/share/doc/gbsplay/examples/nightmode.gbs`
 # is the offset 12 or the offset 16 correct?
 # GBS (Game Boy Sound) magic
-# ftp://ftp.modland.com/pub/documents/format_documentation/\
+# http://ftp.modland.com/pub/documents/format_documentation/\
 # Gameboy%20Sound%20System%20(.gbs).txt
-0	string		GBS		Nintendo Gameboy Music/Audio Data
+# skip Grand Theft Auto 2 Style data (*.sty via sty-gta2.trid.xml) and Opera (*.patch) by check for valid "low" version
+0	string		GBS\001		Nintendo Gameboy Music/Audio Data
+!:mime		audio/x-nintendo-gbs
+# by gbsplay or xgbsplay tools
+#!:mime		audio/gbs
+#!:mime		audio/prs.gbs
+!:ext	gbs
 #12	string		GameBoy\ Music\ Module	Nintendo Gameboy Music Module
+# fields are right null-filled; no terminating \0 if all bytes are used; if field unknown, should be set to a single ?
+# title string like: "Blues Brothers" "Bugs Bunny Crazy Castle 3"
 >16	string		>\0	("%.32s" by
+# author string like: <?>, by Laxity, Justin Muir, 1993 Ocean
 >48	string		>\0	%.32s, copyright
->80	string		>\0	%.32s),
->3	byte		x	version %u,
->4	byte		x	%u tracks
+# copyright string like: empty "1991 Titus" "2001 Imagineer/KT.Kodansha/P&B" "2000 Newline, Ubisoft, D. Eclip."
+>80	string		>\0	%.32s
+# GBSVersion; 1 
+#>3	byte		!1	version %u,
+# number of songs (1-255)
+>4	ubyte		x	\b), %u track
+# plural s
+>4	ubyte		>1	\bs
+# default subsong; like: 1 (often) 2 29 60 79 82
+>5	ubyte		!1	\b, %u first
+# load address (400h-7fffh)
+>6	uleshort	x	\b, load address %#4.4x
+# init address (400h-7fffh)
+>8	uleshort	x	\b, init address %#4.4x
+# play address (400-7fffh)
+>10	uleshort	x	\b, play address %#4.4x
+# stack pointer; like: FFFEh (default) CFFFh DCFEh DDFEh DDFFh DEFFh E000h FFF4h
+>12	uleshort	x	\b, stack pointer %#4.4x
+# timer modulo; often 0
+>14	ubyte		!0	\b, timer modulo %#x
+# timer control; often 0
+>15	ubyte		!0	\b, timer control %#x
+# code and Data (see RST VECTORS)
+#>0x70	ubequad		x	\b, data %#16.16llx...
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.45-console-gbs.diff.sig
Type: application/octet-stream
Size: 1477 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20240207/5335bfa7/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-bsdiff.txt.gz
Type: application/x-gzip
Size: 840 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20240207/5335bfa7/attachment-0001.bin>
-------------- next part --------------
--- file-master/magic/Magdir/diff.old	2024-02-04 02:00:00.384857000 +0100
+++ file-master/magic/Magdir/diff	2024-02-07 18:09:45.230541600 +0100
@@ -52,6 +52,29 @@
 # look for bzip data by ./compress after message with 1 space at end
 >>0x20	indirect	x		\b, at 0x20 
 
+# From:		Joerg Jenderek
+# URL:		https://www.chromium.org/developers/design-documents/software-updates-courgette/
+# Reference:	https://github.com/adobe/chromium/blob/master/courgette/third_party/bsdiff.h
+#		http://mark0.net/download/triddefs_xml.7z/defs/b/bsdiff-chrome.trid.xml
+# Note:		called "Courgette Binary Diff output" by TrID
+#		the Courgette bsdiff tool use a total different file format compared with BSD variant from Colin Percival
+0	string/b	GBSDIF42	Courgette binary diff output
+#!:mime	application/octet-stream
+!:mime	application/x-patch
+!:ext	patch/bsdiff
+# slen; length of the file to be patched
+#>8	ubelong		x		\b, source length %u
+# scrc32; CRC32 of the file to be patched
+>12	ubelong		x		\b, crc %#8.8x
+# dlen; length of the result file
+#>16	ubelong		x		\b, result length %u
+# cblen; length of the control block
+#>20	ubelong		x		\b, control length %u
+# difflen; length of the diff block
+#>24	ubelong		x		\b, patch length %u
+# extralen; length of the extra block
+#>28	ubelong		x		\b, extra length %u
+
 # unified diff
 # URL:		http://fileformats.archiveteam.org/wiki/Unified_diff
 #		https://en.wikipedia.org/wiki/Diff_utility#Unified_format
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-diff-patch.diff.sig
Type: application/octet-stream
Size: 913 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20240207/5335bfa7/attachment-0003.obj>

