[File] [PATCH] Magdir/sql SQLite 3.x database; *.maple *.help *.db3 *.sdb

Jörg Jenderek joerg.jen.der.ek at gmx.net
Fri Jul 23 20:01:47 UTC 2021


Hello,

some days ago i inspected some Maple examples with file name
extension maple and help file name extension.

When running running file command version 5.40  on such examples and
related files i get an output like:

AddrBook.db3:                     SQLite 3.x database,
				  last written using SQLite
				  version 3007005
cgipcrvp.db3:                     SQLite 3.x database,
				  last written using SQLite
				  version 0
corruptDB.sqlite:                 SQLite 3.x database,
				  user version 256,
				  last written using SQLite
				  version 0
fmt-729-signature-id-1053.sqlite: SQLite 3.x database
maple.help:                       SQLite 3.x database,
				  last written using SQLite
				  version 3027002
storage.sqlite:                   SQLite 3.x database,
				  user version 65536,
				  last written using SQLite
				  version 3014001
SystemID.maple:                   SQLite 3.x database,
				  application id 1558055407,
				  user version 15,
				  last written using SQLite
				  version 3013000
weewx.sdb:                        SQLite 3.x database,
				  last written using SQLite
				  version 3027002


With --extension only sqlite/sqlite3/db/dbe is displayed and with -i
option deprecated application/x-sqlite3 is shown for examples.

For comparison reason i run the file format identification utility
TrID ( See https://mark0.net/soft-trid-e.html).
All examples are described by TrID as "SQLite 3.x database" by
sqlite-3x.trid.xml. (See appended sqlite-trid-v.txt.gz ). It also
displays related URL.

For the examples a page about SQLite on Wikipedia  was mentioned by
TrID as related URL. On that page a Database File Format
documentation is mentioned. So this information is now expressed by
additional comment lines like:
  # URL:		https://en.wikipedia.org/wiki/SQLite
  # Reference:	https://www.sqlite.org/fileformat.html

The detection of  examples happens by lines inside Magdir/sql like:

  0   string  SQLite\ format\ 3	SQLite 3.x database
  !:mime	application/x-sqlite3
  !:ext sqlite/sqlite3/db/dbe

According to documentation the first 100 bytes are part of the
database header. This contain some information.
The file command only print a few values at the moment. The
SQLITE_VERSION_NUMBER ( like: 0 3008011 3016002 3007014 3017000
3022000 3028000 3031001) is shown by line like
  >96 belong  x           \b, last written using SQLite version %d

At offset 16 the page size is stored as 2 byte big endian integer.
Must be a power of two between 512 and 32768. For my examples this
was 512 1024 32768 and often 4096. So show now unusual values by line
like:
  >>16 ubeshort !4096      \b, page size %u
For example corruptDB.sqlite value was 24929 and for
fmt-729-signature-id-1053.sqlite value was 0. First was an example
for a corrupt database and is found inside Thunderbird sources.
Second example is just a short 16 byte pattern for DROID file
identifying tool. So i used that information to skip DROID example by
additional second test by line like:
  >16 ubeshort >0                 SQLite 3.x
  !:mime	application/vnd.sqlite3
Furthermore now the IANA official mime type is shown.

At offset 17 the file format write version is stored as byte. For
WAL this value is 2 and for legacy (non WAL) value is 1. Afterwards
the same applies for read version byte. So shown this information by
lines like:
  >>18 ubyte   !1          \b, writer version %u
  >>19 ubyte   !1          \b, read version %u
For example corruptDB.sqlite here again invalidness becomes visible
by values 0 and 4.

At offset 21 the maximum embedded payload fraction is stored as byte
value (Must be 64 or 40 hexadecimal)
At offset 22 the minimum embedded payload fraction is stored as byte
value (Must be 32 or 20 hexadecimal).
At offset 23 the Leaf payload fraction is stored as byte value (Must
be 32 or 20 hexadecimal). So show unexpected values by lines like:
   >>21 ubyte   !64         \b, maximum payload %u
   >>22 ubyte   !32         \b, minimum payload %u
   >>23 ubyte   !32         \b, leaf payload %u
Again for corrupt example here again invalidness becomes visible by
values 1, 1 and 0.

At offset 56 the database text encoding is stored as 4 byte big
endian integer. A value of 1 means UTF-8. A value of 2 means
UTF-16le. A value of 3 means UTF-16be. So show this information in
human readable form by lines like:
  #>>56 ubelong x           \b, encoding %u
  >>56 ubelong x
  >>>56 ubelong =1         \b, UTF-8
  >>>56 ubelong =2         \b, UTF-16 little endian
  >>>56 ubelong =3         \b, UTF-16 big endian
  >>>56 default x
  >>>>56 ubelong x         \b, unknown 0x%x encoding
Now for all users becomes visible that corrupt example is not a valid
database by value 0. Surprisingly this value also occurs for examples
storage.sqlite with file size 512 bytes ( This is shown by page size
512 and database pages 1). Such examples can be found in profile
directory of TorBrowser, Firefox browser and Thunderbird.

At offset 60 the "user version" is stored as 4 byte big endian
integer. According to SQLite documentation at offset 68 the
application id is stored as 4 byte big endian integer.
These values can be used by utilities such as file(1) to determine
the specific file type. At the moment only 8 (1+7) specific sub
classes are described by these values by lines like:

  >60 belong  =0x5f4d544e  (Monotone source repository)
  >68 belong  =0x0f055112  (Fossil checkout)
  ...
  >68 belong  =0x47504b47  (OGC GeoPackage file)
  >68 default x
  >>68 belong  !0          \b, application id %u
  >>60 belong  !0          \b, user version %d

The mentioned magic template magic.txt on sqlite.org list three more
sub types. These are now done by additional 3 lines like:
 >>>68 belong =0x47503130 (OGC GeoPackage version 1.0 file)
 >>>68 belong =0x45737269 (Esri Spatially-Enabled Database)
 >>>68 belong =0x4d504258 (MBTiles tileset)

For all my Maple workbook examples like SystemID.maple the
application id is decimal 1558055407 or 5CDE09EF
hexadecimal. So show this subtype by lines like
  >>>68 belong =0x5CDE09EF (Maple Workbook)
  >>>68 default x
  >>>>68 belong !0         \b, application id %u
  >>60 belong !0          \b, user version %d
All not known applications are shown by default clause and the user
id is now always shown if not zero.

Depending on sub class specific file name extensions are used. For
Maple workbook file name extension is "maple" instead of extension
like "sqlite3". This is now down by line like:
 >>68 belong !0x5CDE09EF	database
!:ext sqlite/sqlite3/db/db3/dbe/sdb/help
 >>68 belong =0x5CDE09EF  database
!:ext maple

Furthermore "db3" is used in examples like AddrBook.db3 and
cgipcrvp.db3. The extension help is used for newer Maple help
database and the SQLite database weewx.sdb used by weather software
weewx.

After applying the above mentioned modifications by patch
file-5.40-sql-maple.diff then all database examples are still
are recognized, described with more details and some
misidentifications vanished like:

AddrBook.db3:                     SQLite 3.x database,
				  last written using SQLite
				  version 3007005
				  , page size 1024
				  , file counter 2081
				  , database pages 100
				  , cookie 0x15
				  , schema 1
				  , UTF-16 little endian
				  , version-valid-for 2081
cgipcrvp.db3:                     SQLite 3.x database,
				  last written using SQLite
				  version 0
				  , page size 1024
				  , file counter 25008
				  , database pages 0
				  , cookie 0x18d
				  , schema 4
				  , cache page size 2000
				  , UTF-8
				  , version-valid-for 0
corruptDB.sqlite:                 SQLite 3.x database,
				  user version 256,
				  last written using SQLite
				  version 0
				  , page size 24929
				  , writer version 0
				  , read version 4
				  , maximum payload 1
				  , minimum payload 1
				  , leaf payload 0
				  , file counter 1075847168
				  , database pages 2109952
				  , free pages 1024
				  , cookie 0x1f00
				  , schema 3328
				  , cache page size 1024
				  , unknown 0x0 encoding
				  , vacuum mode 2048
				  , version-valid-for 0
fmt-729-signature-id-1053.sqlite: data
maple.help:                       SQLite 3.x database,
				  last written using SQLite
				  version 3027002
				  , file counter 35509
				  , database pages 210163
				  , cookie 0x35
				  , schema 4
				  , UTF-8
				  , version-valid-for 35509
storage.sqlite:                   SQLite 3.x database,
				  user version 65536,
				  last written using SQLite
				  version 3014001
				  , page size 512
				  , file counter 1
				  , database pages 1
				  , cookie 0x0
				  , schema 0
				  , unknown 0x0 encoding
				  , version-valid-for 1
SystemID.maple:                   SQLite 3.x database
				  (Maple Workbook),
				  user version 15,
				  last written using SQLite
				  version 3013000
				  , file counter 395
				  , database pages 269
				  , cookie 0xa4
				  , schema 4
				  , UTF-8
				  , version-valid-for 395
weewx.sdb:                        SQLite 3.x database,
				  last written using SQLite
				  version 3027002
				  , file counter 10
				  , database pages 240
				  , cookie 0x72
				  , schema 4
				  , UTF-8
				  , version-valid-for 10


I hope my diff file can be applied in future version of file utility.

With best wishes
Jörg Jenderek
--
Jörg Jenderek




















-------------- next part --------------
--- file-5.40/magic/Magdir/sql.old	2021-02-22 23:49:24 +0000
+++ file-5.40/magic/Magdir/sql	2021-07-23 19:29:46 +0000
@@ -93,2 +93,5 @@
 
+# URL:		https://en.wikipedia.org/wiki/SQLite
+# Reference:	https://www.sqlite.org/fileformat.html
+# Update:	Joerg Jenderek
 # Version 3 of SQLite allows applications to embed their own "user version"
@@ -98,6 +101,15 @@
 #
-0   string  SQLite\ format\ 3	SQLite 3.x database
-!:mime	application/x-sqlite3
+0   string  SQLite\ format\ 3
+# skip DROID fmt-729-signature-id-1053.sqlite by checking for valid page size
+>16 ubeshort >0                 SQLite 3.x
+# deprecated
+#!:mime	application/x-sqlite3
+!:mime	application/vnd.sqlite3
 # seldom found extension sqlite3 like in SyncData.sqlite3
 # db
+# db3 like: AddrBook.db3 cgipcrvp.db3
+# https://www.maplesoft.com/support/help/Maple/view.aspx?path=worksheet%2freference%2fhelpdatabase
+# help is used for newer Maple help database
+# SQLite database weewx.sdb used by weather software weewx
+# https://www.weewx.com/docs/usersguide.htm
 # Avira Antivir use extension "dbe" like in avevtdb.dbe, avguard_tchk.dbe
@@ -106,15 +118,81 @@
 # and with string "ZV-zlib" in like extra.sqlite
-!:ext sqlite/sqlite3/db/dbe
->60 belong  =0x5f4d544e  (Monotone source repository)
->68 belong  =0x0f055112  (Fossil checkout)
->68 belong  =0x0f055113  (Fossil global configuration)
->68 belong  =0x0f055111  (Fossil repository)
->68 belong  =0x42654462  (Bentley Systems BeSQLite Database)
->68 belong  =0x42654c6e  (Bentley Systems Localization File)
->68 belong  =0x47504b47  (OGC GeoPackage file)
->68 default x
->>68 belong  !0          \b, application id %u
->>60 belong  !0          \b, user version %d
->96 belong  x            \b, last written using SQLite version %d
-
+>>68 belong !0x5CDE09EF	database
+!:ext sqlite/sqlite3/db/db3/dbe/sdb/help
+>>68 belong =0x5CDE09EF  database
+# maple is used for Maple Workbook
+!:ext maple
+>>60 belong =0x5f4d544e  (Monotone source repository)
+# if no known user version then check for Application IDs with default clause
+>>60 belong !0x5f4d544e
+# The "Application ID" set by PRAGMA application_id
+>>>68 belong =0x0f055112 (Fossil checkout)
+>>>68 belong =0x0f055113 (Fossil global configuration)
+>>>68 belong =0x0f055111 (Fossil repository)
+>>>68 belong =0x42654462 (Bentley Systems BeSQLite Database)
+>>>68 belong =0x42654c6e (Bentley Systems Localization File)
+>>>68 belong =0x47504b47 (OGC GeoPackage file)
+#	https://www.sqlite.org/src/artifact?ci=trunk&filename=magic.txt
+>>>68 belong =0x47503130 (OGC GeoPackage version 1.0 file)
+>>>68 belong =0x45737269 (Esri Spatially-Enabled Database)
+>>>68 belong =0x4d504258 (MBTiles tileset)
+#	https://www.maplesoft.com/support/help/errors/view.aspx?path=Formats/Maple
+>>>68 belong =0x5CDE09EF (Maple Workbook)
+# unknown application ID
+>>>68 default x
+>>>>68 belong !0         \b, application id %u
+# The "user version" as read and set by the user_version pragma like:
+# 1 2 4 5 7 9 10 25 36 43 53 400 416 131073 131074 131075
+>>60 belong !0          \b, user version %d
+# SQLITE_VERSION_NUMBER like: 0 3008011 3016002 3007014 3017000 3022000 3028000 3031001
+>>96 belong  x           \b, last written using SQLite version %d
+# database page size in bytes; a power of two between 512 and 32768, or 1 for 65536
+# like: 512 1024 often 4096 32768
+>>16 ubeshort !4096      \b, page size %u
+# File format write version. 1 for legacy; 2 for WAL; 0 for corruptDB.sqlite
+>>18 ubyte   !1          \b, writer version %u
+# File format read version. 1 for legacy; 2 for WAL; 4 for corruptDB.sqlite
+>>19 ubyte   !1          \b, read version %u
+# Bytes of unused "reserved" space at the end of each page. Usually 0
+>>20 ubyte   !0          \b, unused bytes %u
+# maximum embedded payload fraction. Must be 64; 1 for corruptDB.sqlite
+>>21 ubyte   !64         \b, maximum payload %u
+# Minimum embedded payload fraction. Must be 32; 1 for corruptDB.sqlite
+>>22 ubyte   !32         \b, minimum payload %u
+# Leaf payload fraction. Must be 32; 0 for corruptDB.sqlite
+>>23 ubyte   !32         \b, leaf payload %u
+# file change counter
+>>24 ubelong x           \b, file counter %u
+# Size of the database file in pages
+>>28 ubelong x           \b, database pages %u
+# page number of the first freelist trunk page like: 0 2 3 4 5 9
+# 10 13 14 15 16 17 18 19 23 36 39 46 50 136 190 217 307 505 516 561 883 1659
+>>32 ubelong !0          \b, 1st free page %u
+# total number of freelist pages
+>>36 ubelong !0          \b, free pages %u
+# The schema cookie like: 2 3 4 6 7 9 A D E F 13 14 1C 25 2A 2F 33 44 4B 53 5A 5F 62 86 87 8F 91 A8
+>>40 ubelong x           \b, cookie 0x%x
+# the schema format number. Supported formats are 1 2 3 and often 4
+# 3328 for corruptDB.sqlite and 0 for 512 byte storage.sqlite (TorBrowser Firefox Thunderbird)
+>>44 ubelong x           \b, schema %u
+# Suggested cache size  like: 0 2000
+>>48 ubelong !0          \b, cache page size %u
+# The page number of the largest root b-tree page when in auto-vacuum or incremental-vacuum modes, or zero otherwise. 
+>>52 ubelong !0          \b, largest root page %u
+# The database text encoding; a value of 1 means UTF-8; 2 means UTF-16le; 3 means UTF-16be
+#>>56 ubelong x           \b, encoding %u
+>>56 ubelong x
+>>>56 ubelong =1         \b, UTF-8
+>>>56 ubelong =2         \b, UTF-16 little endian
+>>>56 ubelong =3         \b, UTF-16 big endian
+# 0 for corruptDB.sqlite and for storage.sqlite with database pages 1 (TorBrowser Firefox Thunderbird)
+# https://mozilla.github.io/firefox-browser-architecture/text/0010-firefox-data-stores.html
+>>>56 default x
+>>>>56 ubelong x         \b, unknown 0x%x encoding
+# True (non-zero) for incremental-vacuum mode; false (zero) otherwiseqy
+>>64 ubelong !0           \b, vacuum mode %u
+# Reserved for expansion. Must be zero
+>>72 uquad !0             \b, reseved 0x%llx
+# The version-valid-for number like:
+# 1 2 3 4 C F 68h 95h 266h A99h 3DCDh B7CEh
+>>92 ubelong x            \b, version-valid-for %u
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.40-sql-maple.diff.sig
Type: application/octet-stream
Size: 2528 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20210723/c074dd70/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sqlite-trid-v.txt.gz
Type: application/x-gzip
Size: 528 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20210723/c074dd70/attachment.bin>


More information about the File mailing list