[File] [PATCH] Magdir/sql SQLite 3.x database; *.maple *.help *.db3 *.sdb

Christos Zoulas christos at zoulas.com
Fri Jul 30 15:00:05 UTC 2021


Committed, thanks!

christos

> On Jul 23, 2021, at 11:01 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> 
> some days ago i inspected some Maple examples with file name
> extension maple and help file name extension.
> 
> When running running file command version 5.40  on such examples and
> related files i get an output like:
> 
> AddrBook.db3:                     SQLite 3.x database,
> 				  last written using SQLite
> 				  version 3007005
> cgipcrvp.db3:                     SQLite 3.x database,
> 				  last written using SQLite
> 				  version 0
> corruptDB.sqlite:                 SQLite 3.x database,
> 				  user version 256,
> 				  last written using SQLite
> 				  version 0
> fmt-729-signature-id-1053.sqlite: SQLite 3.x database
> maple.help:                       SQLite 3.x database,
> 				  last written using SQLite
> 				  version 3027002
> storage.sqlite:                   SQLite 3.x database,
> 				  user version 65536,
> 				  last written using SQLite
> 				  version 3014001
> SystemID.maple:                   SQLite 3.x database,
> 				  application id 1558055407,
> 				  user version 15,
> 				  last written using SQLite
> 				  version 3013000
> weewx.sdb:                        SQLite 3.x database,
> 				  last written using SQLite
> 				  version 3027002
> 
> 
> With --extension only sqlite/sqlite3/db/dbe is displayed and with -i
> option deprecated application/x-sqlite3 is shown for examples.
> 
> For comparison reason i run the file format identification utility
> TrID ( See https://mark0.net/soft-trid-e.html).
> All examples are described by TrID as "SQLite 3.x database" by
> sqlite-3x.trid.xml. (See appended sqlite-trid-v.txt.gz ). It also
> displays related URL.
> 
> For the examples a page about SQLite on Wikipedia  was mentioned by
> TrID as related URL. On that page a Database File Format
> documentation is mentioned. So this information is now expressed by
> additional comment lines like:
> # URL:		https://en.wikipedia.org/wiki/SQLite
> # Reference:	https://www.sqlite.org/fileformat.html
> 
> The detection of  examples happens by lines inside Magdir/sql like:
> 
> 0   string  SQLite\ format\ 3	SQLite 3.x database
> !:mime	application/x-sqlite3
> !:ext sqlite/sqlite3/db/dbe
> 
> According to documentation the first 100 bytes are part of the
> database header. This contain some information.
> The file command only print a few values at the moment. The
> SQLITE_VERSION_NUMBER ( like: 0 3008011 3016002 3007014 3017000
> 3022000 3028000 3031001) is shown by line like
> >96 belong  x           \b, last written using SQLite version %d
> 
> At offset 16 the page size is stored as 2 byte big endian integer.
> Must be a power of two between 512 and 32768. For my examples this
> was 512 1024 32768 and often 4096. So show now unusual values by line
> like:
> >>16 ubeshort !4096      \b, page size %u
> For example corruptDB.sqlite value was 24929 and for
> fmt-729-signature-id-1053.sqlite value was 0. First was an example
> for a corrupt database and is found inside Thunderbird sources.
> Second example is just a short 16 byte pattern for DROID file
> identifying tool. So i used that information to skip DROID example by
> additional second test by line like:
> >16 ubeshort >0                 SQLite 3.x
> !:mime	application/vnd.sqlite3
> Furthermore now the IANA official mime type is shown.
> 
> At offset 17 the file format write version is stored as byte. For
> WAL this value is 2 and for legacy (non WAL) value is 1. Afterwards
> the same applies for read version byte. So shown this information by
> lines like:
> >>18 ubyte   !1          \b, writer version %u
> >>19 ubyte   !1          \b, read version %u
> For example corruptDB.sqlite here again invalidness becomes visible
> by values 0 and 4.
> 
> At offset 21 the maximum embedded payload fraction is stored as byte
> value (Must be 64 or 40 hexadecimal)
> At offset 22 the minimum embedded payload fraction is stored as byte
> value (Must be 32 or 20 hexadecimal).
> At offset 23 the Leaf payload fraction is stored as byte value (Must
> be 32 or 20 hexadecimal). So show unexpected values by lines like:
>  >>21 ubyte   !64         \b, maximum payload %u
>  >>22 ubyte   !32         \b, minimum payload %u
>  >>23 ubyte   !32         \b, leaf payload %u
> Again for corrupt example here again invalidness becomes visible by
> values 1, 1 and 0.
> 
> At offset 56 the database text encoding is stored as 4 byte big
> endian integer. A value of 1 means UTF-8. A value of 2 means
> UTF-16le. A value of 3 means UTF-16be. So show this information in
> human readable form by lines like:
> #>>56 ubelong x           \b, encoding %u
> >>56 ubelong x
> >>>56 ubelong =1         \b, UTF-8
> >>>56 ubelong =2         \b, UTF-16 little endian
> >>>56 ubelong =3         \b, UTF-16 big endian
> >>>56 default x
> >>>>56 ubelong x         \b, unknown 0x%x encoding
> Now for all users becomes visible that corrupt example is not a valid
> database by value 0. Surprisingly this value also occurs for examples
> storage.sqlite with file size 512 bytes ( This is shown by page size
> 512 and database pages 1). Such examples can be found in profile
> directory of TorBrowser, Firefox browser and Thunderbird.
> 
> At offset 60 the "user version" is stored as 4 byte big endian
> integer. According to SQLite documentation at offset 68 the
> application id is stored as 4 byte big endian integer.
> These values can be used by utilities such as file(1) to determine
> the specific file type. At the moment only 8 (1+7) specific sub
> classes are described by these values by lines like:
> 
> >60 belong  =0x5f4d544e  (Monotone source repository)
> >68 belong  =0x0f055112  (Fossil checkout)
> ...
> >68 belong  =0x47504b47  (OGC GeoPackage file)
> >68 default x
> >>68 belong  !0          \b, application id %u
> >>60 belong  !0          \b, user version %d
> 
> The mentioned magic template magic.txt on sqlite.org list three more
> sub types. These are now done by additional 3 lines like:
> >>>68 belong =0x47503130 (OGC GeoPackage version 1.0 file)
> >>>68 belong =0x45737269 (Esri Spatially-Enabled Database)
> >>>68 belong =0x4d504258 (MBTiles tileset)
> 
> For all my Maple workbook examples like SystemID.maple the
> application id is decimal 1558055407 or 5CDE09EF
> hexadecimal. So show this subtype by lines like
> >>>68 belong =0x5CDE09EF (Maple Workbook)
> >>>68 default x
> >>>>68 belong !0         \b, application id %u
> >>60 belong !0          \b, user version %d
> All not known applications are shown by default clause and the user
> id is now always shown if not zero.
> 
> Depending on sub class specific file name extensions are used. For
> Maple workbook file name extension is "maple" instead of extension
> like "sqlite3". This is now down by line like:
> >>68 belong !0x5CDE09EF	database
> !:ext sqlite/sqlite3/db/db3/dbe/sdb/help
> >>68 belong =0x5CDE09EF  database
> !:ext maple
> 
> Furthermore "db3" is used in examples like AddrBook.db3 and
> cgipcrvp.db3. The extension help is used for newer Maple help
> database and the SQLite database weewx.sdb used by weather software
> weewx.
> 
> After applying the above mentioned modifications by patch
> file-5.40-sql-maple.diff then all database examples are still
> are recognized, described with more details and some
> misidentifications vanished like:
> 
> AddrBook.db3:                     SQLite 3.x database,
> 				  last written using SQLite
> 				  version 3007005
> 				  , page size 1024
> 				  , file counter 2081
> 				  , database pages 100
> 				  , cookie 0x15
> 				  , schema 1
> 				  , UTF-16 little endian
> 				  , version-valid-for 2081
> cgipcrvp.db3:                     SQLite 3.x database,
> 				  last written using SQLite
> 				  version 0
> 				  , page size 1024
> 				  , file counter 25008
> 				  , database pages 0
> 				  , cookie 0x18d
> 				  , schema 4
> 				  , cache page size 2000
> 				  , UTF-8
> 				  , version-valid-for 0
> corruptDB.sqlite:                 SQLite 3.x database,
> 				  user version 256,
> 				  last written using SQLite
> 				  version 0
> 				  , page size 24929
> 				  , writer version 0
> 				  , read version 4
> 				  , maximum payload 1
> 				  , minimum payload 1
> 				  , leaf payload 0
> 				  , file counter 1075847168
> 				  , database pages 2109952
> 				  , free pages 1024
> 				  , cookie 0x1f00
> 				  , schema 3328
> 				  , cache page size 1024
> 				  , unknown 0x0 encoding
> 				  , vacuum mode 2048
> 				  , version-valid-for 0
> fmt-729-signature-id-1053.sqlite: data
> maple.help:                       SQLite 3.x database,
> 				  last written using SQLite
> 				  version 3027002
> 				  , file counter 35509
> 				  , database pages 210163
> 				  , cookie 0x35
> 				  , schema 4
> 				  , UTF-8
> 				  , version-valid-for 35509
> storage.sqlite:                   SQLite 3.x database,
> 				  user version 65536,
> 				  last written using SQLite
> 				  version 3014001
> 				  , page size 512
> 				  , file counter 1
> 				  , database pages 1
> 				  , cookie 0x0
> 				  , schema 0
> 				  , unknown 0x0 encoding
> 				  , version-valid-for 1
> SystemID.maple:                   SQLite 3.x database
> 				  (Maple Workbook),
> 				  user version 15,
> 				  last written using SQLite
> 				  version 3013000
> 				  , file counter 395
> 				  , database pages 269
> 				  , cookie 0xa4
> 				  , schema 4
> 				  , UTF-8
> 				  , version-valid-for 395
> weewx.sdb:                        SQLite 3.x database,
> 				  last written using SQLite
> 				  version 3027002
> 				  , file counter 10
> 				  , database pages 240
> 				  , cookie 0x72
> 				  , schema 4
> 				  , UTF-8
> 				  , version-valid-for 10
> 
> 
> I hope my diff file can be applied in future version of file utility.
> 
> With best wishes
> Jörg Jenderek
> --
> Jörg Jenderek
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> <file-5_40-sql-maple_diff.DEFANGED-1439><file-5_40-sql-maple_diff_sig.DEFANGED-1440><sqlite-trid-v.txt.gz>--
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file
> <sanitizer.log>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 235 bytes
Desc: Message signed with OpenPGP
URL: <https://mailman.astron.com/pipermail/file/attachments/20210730/8141fb08/attachment.asc>


More information about the File mailing list