[File] [PATCH] of Magdir/sql SQLite 3.x database; Audacity 3 Project +TeXnicard +

Jörg Jenderek joerg.jen.der.ek at gmx.net
Sun Apr 23 23:33:11 UTC 2023


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

some weeks ago i send patch for SQLite Write-Ahead Log shared memory.
There i found many different unexpected suffix. So for control
reasons i look for extensions of related SQLite database. For every
shared memory file exist a companion database file without 4 byte
phrase -shm at the end. One of such sort has 4 bytes suffix AUP3 for
Audacity 3 Project.

When running file command (version 5.44) on such Audacity and other
SQLite database examples i get an output like:
Diagnostic Data:
	SQLite 3.x database,
	last written using SQLite version 3036000,
	writer version 2, read version 2,
	file counter 2, database pages 1,
	cookie 0, schema 0, largest root page 1,
	unknown 0 encoding, version-valid-for 2
F4CEEE47-042C-4828-95A0-DE44EC267A28.db3:
	SQLite 3.x database,
	last written using SQLite version 3013000,
	writer version 2, read version 2,
	file counter 125, database pages 36,
	cookie 0x200, schema 4,
	UTF-8, version-valid-for 125
_FOSSIL_:
	SQLite 3.x database (Fossil checkout),
	last written using SQLite version 3041000,
	file counter 6, database pages 9,
	cookie 0x7, schema 4,
	UTF-8, version-valid-for 6
_fossil:
	SQLite 3.x database (Fossil global configuration),
	last written using SQLite version 3041000,
	file counter 11, database pages 3,
	cookie 0x1, schema 4,
	UTF-8, version-valid-for 11
bluemarble.gpkg:
	SQLite 3.x database (OGC GeoPackage version 1.0 file),
	last written using SQLite version 3011000,
	page size 1024,
	file counter 73, database pages 29910,
	1st free page 4, free pages 6,
	cookie 0x10, schema 4,
	UTF-8, version-valid-for 73
home-html.user.xowa:
	SQLite 3.x database,
	last written using SQLite version 3018000,
	file counter 7, database pages 4,
	cookie 0x3, schema 4,
	UTF-8, version-valid-for 7
http_itv.ard.de_0.localstorage:
	SQLite 3.x database,
	last written using SQLite version 3027002,
	page size 1024,
	file counter 10, database pages 3,
	cookie 0x1, schema 4,
	UTF-16 little endian, version-valid-for 10
recycle.aup3:
	SQLite 3.x database,
	application id 1096107097, user version 50331648,
	last written using SQLite version 3035005,
	page size 1,
	writer version 2, read version 2,
	file counter 4, database pages 9,
	cookie 0x4, schema 4,
	UTF-8, version-valid-for 4
sqlar-src-4824e73896.sqlar:
	SQLite 3.x database,
	last written using SQLite version 3042000,
	page size 512,
	file counter 1, database pages 3930,
	cookie 0x1, schema 4,
	UTF-8, version-valid-for 1
sqlite-wal_tmp.fossil:
	SQLite 3.x database (Fossil repository),
	last written using SQLite version 3041000,
	file counter 6, database pages 56,
	cookie 0x28, schema 4,
	UTF-8, version-valid-for 6
storage.ide:
	SQLite 3.x database,
	last written using SQLite version 3016001,
	writer version 2, read version 2,
	file counter 2, database pages 139,
	cookie 0x5, schema 4,
	UTF-8, version-valid-for 2

At first glance this does not look bad, but when running with
- --extension option i get output like:

Diagnostic Data:                          sqlite/sqlite3/db/db3/
	   				  dbe/sdb/help
F4CEEE47-042C-4828-95A0-DE44EC267A28.db3: sqlite/sqlite3/db/db3/
					  dbe/sdb/help
_FOSSIL_:                                 sqlite/sqlite3/db/db3/
					  dbe/sdb/help
_fossil:                                  sqlite/sqlite3/db/db3/
					  dbe/sdb/help
bluemarble.gpkg:                          sqlite/sqlite3/db/db3/
					  dbe/sdb/help
home-html.user.xowa:                      sqlite/sqlite3/db/db3/
					  dbe/sdb/help
http_itv.ard.de_0.localstorage:           sqlite/sqlite3/db/db3/
					  dbe/sdb/help
recycle.aup3:                             sqlite/sqlite3/db/db3/
					  dbe/sdb/help
sqlar-src-4824e73896.sqlar:               sqlite/sqlite3/db/db3/
					  dbe/sdb/help
sqlite-wal_tmp.fossil:                    sqlite/sqlite3/db/db3/
					  dbe/sdb/help
storage.ide:                              sqlite/sqlite3/db/db3/
					  dbe/sdb/help


For comparison reason i run the file format identification utility
TrID ( See https://mark0.net/soft-trid-e.html). The examples are here
described with low priority as "SQLite 3.x database" with deprecated
mime type application/x-sqlite3 by sqlite-3x.trid.xml.
Most AUP3 samples are described with higher priority as
"Audacity 3 Project" with correct suffix AUP3 by aup3.trid.xml.
GPKG samples are described with higher priority as "GeoPackage" with
mime type "application/octet-stream" by gpkg.trid.xml.
SQLAR samples like sqlar-src-4824e73896.sqlar are described with high
priority as "SQLite Archive compressed" with application/x-sqlite3
mime type by ark-sqlar.trid.xml (See appended trid-v-sqlite.txt.gz).

For comparison reason i also run the file format identification
utility DROID ( See https://sourceforge.net/projects/droid/). Here
the audacity samples are described as "Audacity Project File" with
version 3.x with aup3 suffix and without mime type via PUID fmt/1826.
The GPKG samples like bluemarble.gpkg are described as "OGC
GeoPackage" with version 1.0-1.31 with mime type
application/geopackage+sqlite3 via PUID fmt/1700. The other samples
are also recognized but are described generic as "SQLite Database
File Format" with version "3" and with deprecated mime type
application/x-sqlite3 by PUID fmt/729. The suffix ide like in
storage.id,  localstorage, sqlar, xowa, fossil and without suffix
like sample "Diagnostic Data" are considered as "bad" (see true
EXTENSION_MISMATCH in appended droid-sqlite.csv.gz).

TrID list the used file name extension and often with -v option the
related URL pointing to some information. So i found page for
audacity on Wikipedia and file formats archive team web site. This is
now expressed by additional comment lines inside Magdir/sql like:
# URL: 		https://en.wikipedia.org/wiki/Audacity_(audio_editor)
#		http://fileformats.archiveteam.org/
#		wiki/Audacity_Project_Format
# Reference:	http://mark0.net/download/triddefs_xml.7z
#		defs/a/aup3.trid.xml

The description of SQLite 3.x database happens inside Magdir/sql by
lines like:
 0   string  SQLite\ format\ 3
 >16 ubeshort >0                 SQLite 3.x
 !:mime	application/vnd.sqlite3

After checking for starting magic pattern the DROID sample
fmt-729-signature-id-1053.sqlite is skipped by checking for valid
page size in second test line.

The sub classification is done by  looking first for known user versi
on
and then for Application ID. Print this if no known id is found by
lines like:
 >>60 belong =0x5f4d544e  (Monotone source repository)
 >>>68 belong =0x0f055112 (Fossil checkout)
 ...
 >>>68 belong =0x5CDE09EF (Maple Workbook)
 >>>68 default x
 >>>>68 belong !0         \b, application id %u
 >>60 belong !0          \b, user version %d

According to DROID fmt/1826 characteristic for Audacity is string
AUDY (that is hexadecimal 41554459) as application id. So i must
insert before default clause a line like:
 >>>68 belong =0x41554459 (Audacity Project)

I also show user version as hexadecimal after decimal version because
some application apparently use this interpretation. So for audacity
examples i get here value decimal 50331648. This is expressed as
hexadecimal 03000000 which is obviously read as version 3. So this
done by additional line like:
 >>>60 belong	x	%#x

Unfortunately some application use other file name extension instead
of standard suffix like sqlite, sqlite3 or db3 and also other mime
types. That is an evil side with open software. Everyone can take
such software like SQLite, but using another file extension and
other features like specific application id and especially not
telling these differences. This in the end leads to situation like
in bible view speech chaos during building attempt a tower of
Babylon or in Tolkien view like transforming elves to orcs. Too
overcome that bad aspects file command tries to mention used
suffix. After version 5.40 the original phrase "SQLite 3.x
database" is split into first part "SQLite 3.x" and an additional
phrase "database" to get different suffix. This was the case for
Maple Workbook. There suffix maple was used. So this distinction
was done by lines after starting test lines like:
 >>68 belong !0x5CDE09EF	database
 !:ext sqlite/sqlite3/db/db3/dbe/sdb/help
 >>68 belong =0x5CDE09EF  database
 !:ext maple

After rechecking magic.txt on sqlite.org and looking for SQLite
based on file formats archive team web site this now becomes like:
 >>68 belong =0x41554459  database
 !:mime	application/x-audacity-project+sqlite3
 !:ext	aup3
 >>68 belong =0x6A035744  database
 !:mime	application/vnd.sqlite3
 !:ext db
 >>68 belong =0x0f055111  database
 !:mime	application/vnd.sqlite3
 !:ext	fossil
 >>68 belong =0x0f055112  database
 !:mime	application/vnd.sqlite3
 !:ext /fslckout
 >>68 belong =0x0f055113  database
 !:mime	application/vnd.sqlite3
 !:ext /fossil
 >>68 belong =0x47503130  database
 !:mime	application/geopackage+sqlite3
 !:ext gpkg
 >>68 default x           database
 !:mime	application/vnd.sqlite3
 !:ext /sqlite/sqlite3/db/db3/dbe/sdb/help/
	ide/localstorage/sqlar/xowa

For "Audacity Project File" version 3.x suffix aup3 and mime type
application/x-audacity-project+sqlite3 is used. For TeXnicard
database db suffix is mentioned, but i myself do not found such
samples. For Fossil repository database typically fossil suffix is
used, whereas Fossil checkout has name like _FOSSIL_ or .fslckout
and Fossil global configuration has name like
%LOCALAPPDATA%\_fossil or ~/.fossil. OGC GeoPackage version 1.x has
suffix gpkg and mime type application/geopackage+sqlite3. In the
unspecific branch i found sample with ide suffix like in
storage.ide. localstorage is used like in Enigma2
http_itv.ard.de_0.localstorage. xowa like in
home-html.user.xowa is used for offline copies of Wikipedia
(http://fileformats.archiveteam.org/wiki/XOWA). sqlar like in
sqlar-src-4824e73896.sqlar is used for SQLite Archive
(http://fileformats.archiveteam.org/wiki/SQLite_Archive). I also
found sample without suffix like "Diagnostic Data".

The database page size in bytes is also stored in header. This is a
power of two between 512 and 32768. Often the value 4096 is used.
Here value 1 like in AUP3 sample means page size 65536. This was
shown by line like:
 >>16 ubeshort !4096      \b, page size %u
So this now becomes like:
 >>16 ubeshort !4096      \b, page size
 >>>16	ubeshort !1      %u
 >>>16	ubeshort =1      65536

After applying the above mentioned modifications by patch
file-5.44-sql-aup3.diff then i get a more precise output
like:

Diagnostic Data:
	SQLite 3.x database,
	last written using SQLite version 3036000,
	writer version 2, read version 2,
	file counter 2, database pages 1,
	cookie 0, schema 0, largest root page 1,
	unknown 0 encoding, version-valid-for 2
F4CEEE47-042C-4828-95A0-DE44EC267A28.db3:
	SQLite 3.x database,
	last written using SQLite version 3013000,
	writer version 2, read version 2,
	file counter 125, database pages 36,
	cookie 0x200, schema 4,
	UTF-8, version-valid-for 125
_FOSSIL_:
	SQLite 3.x database (Fossil checkout),
	last written using SQLite version 3041000,
	file counter 6,
	database pages 9,
	cookie 0x7, schema 4,
	UTF-8, version-valid-for 6
_fossil:
	SQLite 3.x database (Fossil global configuration),
	last written using SQLite version 3041000,
	file counter 11,
	database pages 3,
	cookie 0x1, schema 4,
	UTF-8, version-valid-for 11
bluemarble.gpkg:
	SQLite 3.x database (OGC GeoPackage version 1.0 file),
	last written using SQLite version 3011000,
	page size 1024,
	file counter 73, database pages 29910,
	1st free page 4, free pages 6,
	cookie 0x10, schema 4,
	UTF-8, version-valid-for 73
home-html.user.xowa:
	SQLite 3.x database,
	last written using SQLite version 3018000,
	file counter 7, database pages 4,
	cookie 0x3, schema 4,
	UTF-8, version-valid-for 7
http_itv.ard.de_0.localstorage:
	SQLite 3.x database,
	last written using SQLite version 3027002,
	page size 1024,
	file counter 10, database pages 3,
	cookie 0x1, schema 4,
	UTF-16 little endian, version-valid-for 10
recycle.aup3:
	SQLite 3.x database (Audacity Project),
	user version 50331648 0x3000000,
	last written using SQLite version 3035005,
	page size 65536,
	writer version 2, read version 2,
	file counter 4, database pages 9,
	cookie 0x4, schema 4,
	UTF-8, version-valid-for 4
sqlar-src-4824e73896.sqlar:
	SQLite 3.x database,
	last written using SQLite version 3042000,
	page size 512,
	file counter 1, database pages 3930,
	cookie 0x1, schema 4,
	UTF-8, version-valid-for 1
sqlite-wal_tmp.fossil:
	SQLite 3.x database (Fossil repository),
	last written using SQLite version 3041000,
	file counter 6, database pages 56,
	cookie 0x28, schema 4,
	UTF-8, version-valid-for 6
storage.ide:
	SQLite 3.x database,
	last written using SQLite version 3016001,
	writer version 2, read version 2,
	file counter 2, database pages 139,
	cookie 0x5, schema 4,
	UTF-8, version-valid-for 2

With additional --extension option now i get:

Diagnostic Data:                          /sqlite/sqlite3/db/db3/
	   				  dbe/sdb/help/
					  ide/localstorage/sqlar/xowa
F4CEEE47-042C-4828-95A0-DE44EC267A28.db3: /sqlite/sqlite3/db/db3/
					  dbe/sdb/help/
					  ide/localstorage/sqlar/xowa
_FOSSIL_:                                 /fslckout
_fossil:                                  /fossil
bluemarble.gpkg:                          gpkg
home-html.user.xowa:                      /sqlite/sqlite3/db/db3/
					  dbe/sdb/help/
					  ide/localstorage/sqlar/xowa
http_itv.ard.de_0.localstorage:           /sqlite/sqlite3/db/db3/
					  dbe/sdb/help/
					  ide/localstorage/sqlar/xowa
recycle.aup3:                             aup3
sqlar-src-4824e73896.sqlar:               /sqlite/sqlite3/db/db3/
					  dbe/sdb/help/
					  ide/localstorage/sqlar/xowa
sqlite-wal_tmp.fossil:                    fossil
storage.ide:                              /sqlite/sqlite3/db/db3/
					  dbe/sdb/help/
					  ide/localstorage/sqlar/xowa


With additional -i option now i get:

Diagnostic Data:
	application/vnd.sqlite3; charset=binary
F4CEEE47-042C-4828-95A0-DE44EC267A28.db3:
	application/vnd.sqlite3; charset=binary
_FOSSIL_:
	application/vnd.sqlite3; charset=binary
_fossil:
	application/vnd.sqlite3; charset=binary
bluemarble.gpkg:
	application/geopackage+sqlite3; charset=binary
home-html.user.xowa:
	application/vnd.sqlite3; charset=binary
http_itv.ard.de_0.localstorage:
	application/vnd.sqlite3; charset=binary
recycle.aup3:
	application/x-audacity-project+sqlite3; charset=binary
sqlar-src-4824e73896.sqlar:
	application/vnd.sqlite3; charset=binary
sqlite-wal_tmp.fossil:
	application/vnd.sqlite3; charset=binary
storage.ide:
	application/vnd.sqlite3; charset=binary


I hope my diff file can be applied in future version of
file utility.

There are also mentioned are other suffix like s3db, sl3, sketch, fbt
lrcat. But i myself found no such samples or the related software
is only available after buying for hundreds of dollars or telling
all about my life (Why i should give up my privacy to create just a
sqlite based test example?). So such are missing and other people
are welcomed to add related entries. For Audacity exist older
variants which are xml based. So such samples should be recognized
by lines inside Magdir/sgml but there the lines are organized a
chaos way, so that i am not able to do this work.

With best wishes
Jörg Jenderek
- --
Jörg Jenderek







-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCZEXANwAKCRCv8rHJQhrU
1mroAJ9FmmREYEAwLgz0W3b0V4B93gEAqQCfS3Kon7cG0Ao3CgF5WRqunf419UE=
=NDsW
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-sqlite.csv.gz
Type: application/x-gzip
Size: 793 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230424/d1048283/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-sqlite.txt.gz
Type: application/x-gzip
Size: 777 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230424/d1048283/attachment-0003.bin>
-------------- next part --------------
--- file-5.44/magic/Magdir/sql.old	2022-08-16 15:01:06.000000000 +0200
+++ file-5.44/magic/Magdir/sql	2023-04-24 01:06:18.876674200 +0200
@@ -94,2 +94,3 @@
 # URL:		https://en.wikipedia.org/wiki/SQLite
+#		http://fileformats.archiveteam.org/wiki/SQLite
 # Reference:	https://www.sqlite.org/fileformat.html
@@ -106,3 +107,2 @@
 #!:mime	application/x-sqlite3
-!:mime	application/vnd.sqlite3
 # seldom found extension sqlite3 like in SyncData.sqlite3
@@ -115,2 +115,10 @@
 # Avira Antivir use extension "dbe" like in avevtdb.dbe, avguard_tchk.dbe
+# ide is used in storage.ide
+# localstorage like in Enigma2 http_itv.ard.de_0.localstorage
+# xowa like in home-html.user.xowa		http://fileformats.archiveteam.org/wiki/XOWA
+# sqlar like in sqlar-src-4824e73896.sqlar	http://fileformats.archiveteam.org/wiki/SQLite_Archive
+# sketch					http://fileformats.archiveteam.org/wiki/Sketch
+# ftb						http://fileformats.archiveteam.org/wiki/MyHeritage_Family_Tree_Builder
+# lrcat						http://fileformats.archiveteam.org/wiki/Lightroom_catalog
+# without suffix like in "Diagnostic Data"
 # Unfortunately extension sqlite also used for other databases starting with string
@@ -118,4 +126,2 @@
 # and with string "ZV-zlib" in like extra.sqlite
->>68 belong !0x5CDE09EF	database
-!:ext sqlite/sqlite3/db/db3/dbe/sdb/help
 >>68 belong =0x5CDE09EF  database
@@ -123,2 +129,52 @@
 !:ext maple
+# From:		Joerg Jenderek
+# URL: 		https://en.wikipedia.org/wiki/Audacity_(audio_editor)
+#		http://fileformats.archiveteam.org/wiki/Audacity_Project_Format
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/a/aup3.trid.xml
+# Note:		called "Audacity 3 Project" by TrID and "Audacity Project File" version 3.x by DROID via PUID fmt/1826
+#		with user version 03000000h whereas older versions *.AUP are not SQLite based
+>>68 belong =0x41554459  database
+# https://github.com/audacity/audacity/blob/master/src/audacity.xml
+!:mime	application/x-audacity-project+sqlite3
+# aup3 is used for Audacity 3 Project File
+!:ext	aup3
+# From:		Joerg Jenderek
+# URL: 		http://fileformats.archiveteam.org/wiki/TeXnicard
+# Reference:	http://fileformats.archiveteam.org/wiki/TeXnicard_card_database
+# Note:		no examples found
+>>68 belong =0x6A035744  database
+!:mime	application/vnd.sqlite3
+!:ext	db
+# Update:	Joerg Jenderek
+# URL: 		http://fileformats.archiveteam.org/wiki/Fossil_repository_database
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/f/fossil.trid.xml 
+# Note:		called "Fossil repository database" by TrID
+>>68 belong =0x0f055111  database
+!:mime	application/vnd.sqlite3
+!:ext	fossil
+# URL: 		http://fileformats.archiveteam.org/wiki/Fossil_checkout_database
+>>68 belong =0x0f055112  database
+!:mime	application/vnd.sqlite3
+# name _FOSSIL_ or .fslckout
+!:ext	/fslckout
+# Update:	Joerg Jenderek
+# URL: 		http://fileformats.archiveteam.org/wiki/Fossil_repository_database
+>>68 belong =0x0f055113  database
+!:mime	application/vnd.sqlite3
+# %LOCALAPPDATA%\_fossil or ~/.fossil
+!:ext	/fossil
+# Update:	Joerg Jenderek
+# URL: 		http://fileformats.archiveteam.org/wiki/GeoPackage
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/g/gpkg.trid.xml
+# Note:		called "GeoPackage" by TrID and "OGC GeoPackage" version 1.0-1.31 by DROID via PUID fmt/1700
+#		GP10 application id
+>>68 belong =0x47503130  database
+# https://www.iana.org/assignments/media-types/application/geopackage+sqlite3
+!:mime	application/geopackage+sqlite3
+# https://github.com/opengeospatial/ets-gpkg12/blob/master/src/test/resources/gpkg/bluemarble.gpkg
+!:ext	gpkg
+>>68 default x           database
+!:mime	application/vnd.sqlite3
+# no examples found with s3db sl3 suffix
+!:ext	/sqlite/sqlite3/db/db3/dbe/sdb/help/ide/localstorage/sqlar/xowa
 >>60 belong =0x5f4d544e  (Monotone source repository)
@@ -139,2 +195,5 @@
 >>>68 belong =0x5CDE09EF (Maple Workbook)
+# AUDY	Audacity Project File
+>>>68 belong =0x41554459 (Audacity Project)
+>>>68 belong =0x6A035744 (TeXnicard card database)
 # unknown application ID
@@ -143,5 +202,7 @@
 # The "user version" as read and set by the user_version pragma like:
-# 1 2 4 5 7 9 10 25 36 43 53 400 416 131073 131074 131075
+# 1 2 4 5 7 9 10 25 36 43 53 400 416 131073 131074 131075 50331648
 >>60 belong !0          \b, user version %d
-# SQLITE_VERSION_NUMBER like: 0 3008011 3016002 3007014 3017000 3022000 3028000 3031001
+# expressed as hexadecimal
+>>>60 belong	x	%#x
+# SQLITE_VERSION_NUMBER like: 0 3007014 3008011 3016002 3017000 3022000 3028000 3031001 3032003 3035005
 >>96 belong  x           \b, last written using SQLite version %d
@@ -149,3 +210,5 @@
 # like: 512 1024 often 4096 32768
->>16 ubeshort !4096      \b, page size %u
+>>16 ubeshort !4096      \b, page size
+>>>16	ubeshort !1      %u
+>>>16	ubeshort =1      65536
 # File format write version. 1 for legacy; 2 for WAL; 0 for corruptDB.sqlite
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.44-sql-aup3.diff.sig
Type: application/octet-stream
Size: 2002 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230424/d1048283/attachment-0001.obj>


More information about the File mailing list