[File] [PATCH] Magdir/audio Adaptive Multi-Rate Codec; missing variant WideBand *.awb

Jörg Jenderek (GMX) joerg.jen.der.ek at gmx.net
Sat Oct 21 00:13:56 UTC 2023


Hello,

Some days ago I wanted to handle some shebang scripts. Surprisingly,
some of the files turned out to be audio samples and not scripts at all.
In this session I will handle Adaptive Multi-Rate Codec samples (*.amr,
*.awb).

When running the file command version 5.45 with the -k option on such
audio samples, I get output like:

AUD001.amr:   Adaptive Multi-Rate Codec (GSM telephony)
	      a AMR script executable (binary data)
amr-wb.awb:   Adaptive Multi-Rate Codec (GSM telephony)
	      a AMR-WB script executable (binary data)
example.3ga:  Adaptive Multi-Rate Codec (GSM telephony)
	      a AMR script executable (binary data)
example.amr:  Adaptive Multi-Rate Codec (GSM telephony)
	      a AMR script executable (binary data)

With the -i option, audio/amr is shown for all samples. With the
--extension option, amr is displayed for all samples.

For comparison I ran the file format identification utility TrID (see
https://mark0.net/soft-trid-e.html). The samples with the awb suffix are
called "Adaptive Multi-Rate Wideband ACELP codec", without a MIME type,
by audio-awb.trid.xml. The other samples (*.amr) are called
"AMR (Adaptive Multi Rate) encoded audio" with MIME type audio/amr by
audio-amr.trid.xml. Only the amr suffix is listed there (see the
appended trid-v-amr.txt.gz).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here the samples with the
awb suffix are described as "Adaptive Multi-Rate Wideband Audio" with
MIME type audio/amr-wb by PUID fmt/954. The other samples are described
as "Adaptive Multi-Rate Audio" with MIME type audio/amr by PUID fmt/356.
For the sample with the 3ga suffix, the name is considered invalid (see
EXTENSION_MISMATCH true in droid-amr.csv.gz).

According to the shared-mime-info database (see freedesktop.org.xml.in,
for example on freedesktop.org), the sample with the awb suffix is
called "AMR-WB", "AMR-WB audio" or "Adaptive Multi-Rate Wideband" with
MIME type audio/AMR-WB. audio/amr-wb-encrypted is listed as an
alternative MIME type. The other samples are called "AMR", "AMR audio"
or "Adaptive Multi-Rate" with MIME type audio/AMR. Only the amr suffix
is listed there.

TrID lists the file name extensions used and, with the -v option, often
also the related URL pointing to the file format information.

With the help of these tools I found the corresponding page on the File
Formats Archive Team web site. Links to downloadable samples are also
listed there. This information is now expressed inside Magdir/audio by
additional comment lines like:
# http://fileformats.archiveteam.org/wiki/Adaptive_Multi-Rate_Audio
# Reference:	https://datatracker.ietf.org/doc/html/rfc4867
#		http://mark0.net/download/triddefs_xml.7z
#		defs/a/audio-amr.trid.xml
#		defs/a/audio-awb.trid.xml

Until now the description is done inside Magdir/audio by lines like:
0	string	#!AMR	Adaptive Multi-Rate Codec (GSM telephony)
!:mime	audio/amr
!:ext  amr

In line with the other tools and the documentation, the audio samples
are now described by a starting line with magic strength 80 like:
0	string	#!AMR	Adaptive Multi-Rate Codec
Then I do the sub-classification for the wideband variant with an
additional branch that looks like:
  >5	string	-WB		(Wideband)
!:mime	audio/AMR-WB
!:apple	????amrw
!:ext	awb
According to the official registration at iana.org, the audio type is
written with the upper-case phrase AMR-WB. Some sites list the
lower-case form amr-wb. That is not official, and links and searches on
IANA using it are wrong and do not work. IANA also lists the 4-byte
Macintosh Apple type code amrw.

For the other variant I do not check the bytes after the first 5 magic
bytes, but I assume these are valid and unique enough. So this is done
by a branch with lines like:
 >5	default	x		(GSM telephony)
!:mime	audio/AMR
!:apple	????amr
!:ext  amr
#!:ext  amr/3ga

Here again the official IANA audio subtype is written with the
upper-case phrase AMR and not in lower case. IANA also lists the 4-byte
Macintosh Apple type code "amr ". The File Formats Archive Team web site
lists one sample, example.3ga, with the 3ga suffix, but other sites do
not list such items. So I am unsure whether this suffix is always valid
or whether it happened by accident, and I show only the one suffix amr
here.
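
As a rough illustration only (it is not part of the patch), the two
branches described above can be sketched in Python. The names
classify_amr and classify_file and the returned tuple layout are
hypothetical; only the byte offsets, descriptions, MIME types and
suffixes are taken from the magic lines quoted above.

```python
from typing import Optional, Tuple

AMR_MAGIC = b"#!AMR"  # 5 bytes at offset 0, as in "0 string #!AMR"
WB_MARK = b"-WB"      # 3 bytes at offset 5, as in ">5 string -WB"


def classify_amr(header: bytes) -> Optional[Tuple[str, str, str]]:
    """Return (description, MIME type, suffix), or None if not AMR.

    Hypothetical helper that mirrors the two magic branches; it does
    not validate anything beyond the first 8 bytes.
    """
    if not header.startswith(AMR_MAGIC):
        return None
    if header[5:8] == WB_MARK:
        # Wideband branch
        return ("Adaptive Multi-Rate Codec (Wideband)", "audio/AMR-WB", "awb")
    # Default branch: bytes after the first 5 are not checked
    return ("Adaptive Multi-Rate Codec (GSM telephony)", "audio/AMR", "amr")


def classify_file(path: str) -> Optional[Tuple[str, str, str]]:
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return classify_amr(fh.read(8))
```

Like the default branch in the magic file, this sketch treats every
"#!AMR" header that is not followed by "-WB" as the narrowband variant,
so it is only as strict as the 5 starting bytes.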

After applying the above-mentioned modifications with the patch
file-5.45-audio-amr.diff, my audio samples are still recognized and are
now described with the correct sub-classification. This now looks like:

AUD001.amr:   Adaptive Multi-Rate Codec (GSM telephony)
amr-wb.awb:   Adaptive Multi-Rate Codec (Wideband)
example.3ga:  Adaptive Multi-Rate Codec (GSM telephony)
example.amr:  Adaptive Multi-Rate Codec (GSM telephony)

I hope my diff file can be applied in a future version of the file
utility.

There is still something to do. Inside Magdir/varied.script the
misidentification as "a AMR script executable (binary data)" with
strength (20=60/3) should be excluded. I will try to handle this in the
future once I have fully understood what is happening there and have
more bad/good script samples.

With best wishes,
Jörg Jenderek
--
Jörg Jenderek
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-amr.txt.gz
Type: application/x-gzip
Size: 466 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20231021/467cb375/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-amr.csv.gz
Type: application/x-gzip
Size: 387 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20231021/467cb375/attachment-0001.bin>
-------------- next part --------------
--- file-5.45/magic/Magdir/audio.old	2023-03-11 19:16:44.000000000 +0100
+++ file-5.45/magic/Magdir/audio	2023-10-21 02:00:46.444000600 +0200
@@ -714,8 +714,34 @@
 # Type: Adaptive Multi-Rate Codec
 # URL:  http://filext.com/detaillist.php?extdetail=AMR
+#		http://fileformats.archiveteam.org/wiki/Adaptive_Multi-Rate_Audio
+# Reference:	https://datatracker.ietf.org/doc/html/rfc4867
+#		http://mark0.net/download/triddefs_xml.7z/defs/a/audio-amr.trid.xml
+# Update:	Joerg Jenderek
 # From: Russell Coker <russell at coker.com.au>
-0	string	#!AMR		Adaptive Multi-Rate Codec (GSM telephony)
-!:mime	audio/amr
+# Note:		called "AMR (Adaptive Multi Rate) encoded audio" by TrID and
+#		"Adaptive Multi-Rate Audio" by DROID via PUID fmt/356 and
+#		"AMR" "AMR audio" or "Adaptive Multi-Rate" by shared MIME-info database from freedesktop.org
+0	string	#!AMR		Adaptive Multi-Rate Codec
+# Adaptive Multi-Rate Codec (strength=80) before wrong "a AMR script executable (binary data)" (strength=20=60/3) by ./varied.script
+#!:strength +0
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/a/audio-awb.trid.xml
+# Note:		called "Adaptive Multi-Rate Wideband ACELP codec" by TrID and
+#		"Adaptive Multi-Rate Wideband Audio" by DROID via PUID fmt/954 and
+#		"AMR-WB" "AMR-WB audio" or "Adaptive Multi-Rate Wideband" by shared MIME-info database from freedesktop.org
+>5	string	-WB		(Wideband)
+# https://www.iana.org/assignments/media-types/audio/AMR-WB
+!:mime	audio/AMR-WB
+#!:mime	audio/amr-wb-encrypted
+!:apple	????amrw
+!:ext	awb
+# variant without Wideband
+>5	default	x		(GSM telephony)
+# https://www.iana.org/assignments/media-types/audio/AMR
+!:mime	audio/AMR
+# last character in type code is space
+!:apple	????amr 
 !:ext  amr
+# GRR: maybe also 3ga suffix?		https://telparia.com/fileFormatSamples/audio/amr/example.3ga
+#!:ext  amr/3ga
 
 # Type: SuperCollider 3 Synth Definition File Format
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.45-audio-amr.diff.sig
Type: application/octet-stream
Size: 1055 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20231021/467cb375/attachment.obj>

