[File] [PATCH] Magdir/sql SQLite 3.x database; *.maple *.help *.db3 *.sdb
Christos Zoulas
christos at zoulas.com
Fri Jul 30 15:00:05 UTC 2021
Committed, thanks!
christos
> On Jul 23, 2021, at 11:01 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
>
> Hello,
>
> some days ago i inspected some Maple examples with file name
> extension maple and help file name extension.
>
> When running running file command version 5.40 on such examples and
> related files i get an output like:
>
> AddrBook.db3: SQLite 3.x database,
> last written using SQLite
> version 3007005
> cgipcrvp.db3: SQLite 3.x database,
> last written using SQLite
> version 0
> corruptDB.sqlite: SQLite 3.x database,
> user version 256,
> last written using SQLite
> version 0
> fmt-729-signature-id-1053.sqlite: SQLite 3.x database
> maple.help: SQLite 3.x database,
> last written using SQLite
> version 3027002
> storage.sqlite: SQLite 3.x database,
> user version 65536,
> last written using SQLite
> version 3014001
> SystemID.maple: SQLite 3.x database,
> application id 1558055407,
> user version 15,
> last written using SQLite
> version 3013000
> weewx.sdb: SQLite 3.x database,
> last written using SQLite
> version 3027002
>
>
> With --extension only sqlite/sqlite3/db/dbe is displayed and with -i
> option deprecated application/x-sqlite3 is shown for examples.
>
> For comparison reason i run the file format identification utility
> TrID ( See https://mark0.net/soft-trid-e.html).
> All examples are described by TrID as "SQLite 3.x database" by
> sqlite-3x.trid.xml. (See appended sqlite-trid-v.txt.gz ). It also
> displays related URL.
>
> For the examples a page about SQLite on Wikipedia was mentioned by
> TrID as related URL. On that page a Database File Format
> documentation is mentioned. So this information is now expressed by
> additional comment lines like:
> # URL: https://en.wikipedia.org/wiki/SQLite
> # Reference: https://www.sqlite.org/fileformat.html
>
> The detection of examples happens by lines inside Magdir/sql like:
>
> 0 string SQLite\ format\ 3 SQLite 3.x database
> !:mime application/x-sqlite3
> !:ext sqlite/sqlite3/db/dbe
>
> According to documentation the first 100 bytes are part of the
> database header. This contain some information.
> The file command only print a few values at the moment. The
> SQLITE_VERSION_NUMBER ( like: 0 3008011 3016002 3007014 3017000
> 3022000 3028000 3031001) is shown by line like
> >96 belong x \b, last written using SQLite version %d
>
> At offset 16 the page size is stored as 2 byte big endian integer.
> Must be a power of two between 512 and 32768. For my examples this
> was 512 1024 32768 and often 4096. So show now unusual values by line
> like:
> >>16 ubeshort !4096 \b, page size %u
> For example corruptDB.sqlite value was 24929 and for
> fmt-729-signature-id-1053.sqlite value was 0. First was an example
> for a corrupt database and is found inside Thunderbird sources.
> Second example is just a short 16 byte pattern for DROID file
> identifying tool. So i used that information to skip DROID example by
> additional second test by line like:
> >16 ubeshort >0 SQLite 3.x
> !:mime application/vnd.sqlite3
> Furthermore now the IANA official mime type is shown.
>
> At offset 17 the file format write version is stored as byte. For
> WAL this value is 2 and for legacy (non WAL) value is 1. Afterwards
> the same applies for read version byte. So shown this information by
> lines like:
> >>18 ubyte !1 \b, writer version %u
> >>19 ubyte !1 \b, read version %u
> For example corruptDB.sqlite here again invalidness becomes visible
> by values 0 and 4.
>
> At offset 21 the maximum embedded payload fraction is stored as byte
> value (Must be 64 or 40 hexadecimal)
> At offset 22 the minimum embedded payload fraction is stored as byte
> value (Must be 32 or 20 hexadecimal).
> At offset 23 the Leaf payload fraction is stored as byte value (Must
> be 32 or 20 hexadecimal). So show unexpected values by lines like:
> >>21 ubyte !64 \b, maximum payload %u
> >>22 ubyte !32 \b, minimum payload %u
> >>23 ubyte !32 \b, leaf payload %u
> Again for corrupt example here again invalidness becomes visible by
> values 1, 1 and 0.
>
> At offset 56 the database text encoding is stored as 4 byte big
> endian integer. A value of 1 means UTF-8. A value of 2 means
> UTF-16le. A value of 3 means UTF-16be. So show this information in
> human readable form by lines like:
> #>>56 ubelong x \b, encoding %u
> >>56 ubelong x
> >>>56 ubelong =1 \b, UTF-8
> >>>56 ubelong =2 \b, UTF-16 little endian
> >>>56 ubelong =3 \b, UTF-16 big endian
> >>>56 default x
> >>>>56 ubelong x \b, unknown 0x%x encoding
> Now for all users becomes visible that corrupt example is not a valid
> database by value 0. Surprisingly this value also occurs for examples
> storage.sqlite with file size 512 bytes ( This is shown by page size
> 512 and database pages 1). Such examples can be found in profile
> directory of TorBrowser, Firefox browser and Thunderbird.
>
> At offset 60 the "user version" is stored as 4 byte big endian
> integer. According to SQLite documentation at offset 68 the
> application id is stored as 4 byte big endian integer.
> These values can be used by utilities such as file(1) to determine
> the specific file type. At the moment only 8 (1+7) specific sub
> classes are described by these values by lines like:
>
> >60 belong =0x5f4d544e (Monotone source repository)
> >68 belong =0x0f055112 (Fossil checkout)
> ...
> >68 belong =0x47504b47 (OGC GeoPackage file)
> >68 default x
> >>68 belong !0 \b, application id %u
> >>60 belong !0 \b, user version %d
>
> The mentioned magic template magic.txt on sqlite.org list three more
> sub types. These are now done by additional 3 lines like:
> >>>68 belong =0x47503130 (OGC GeoPackage version 1.0 file)
> >>>68 belong =0x45737269 (Esri Spatially-Enabled Database)
> >>>68 belong =0x4d504258 (MBTiles tileset)
>
> For all my Maple workbook examples like SystemID.maple the
> application id is decimal 1558055407 or 5CDE09EF
> hexadecimal. So show this subtype by lines like
> >>>68 belong =0x5CDE09EF (Maple Workbook)
> >>>68 default x
> >>>>68 belong !0 \b, application id %u
> >>60 belong !0 \b, user version %d
> All not known applications are shown by default clause and the user
> id is now always shown if not zero.
>
> Depending on sub class specific file name extensions are used. For
> Maple workbook file name extension is "maple" instead of extension
> like "sqlite3". This is now down by line like:
> >>68 belong !0x5CDE09EF database
> !:ext sqlite/sqlite3/db/db3/dbe/sdb/help
> >>68 belong =0x5CDE09EF database
> !:ext maple
>
> Furthermore "db3" is used in examples like AddrBook.db3 and
> cgipcrvp.db3. The extension help is used for newer Maple help
> database and the SQLite database weewx.sdb used by weather software
> weewx.
>
> After applying the above mentioned modifications by patch
> file-5.40-sql-maple.diff then all database examples are still
> are recognized, described with more details and some
> misidentifications vanished like:
>
> AddrBook.db3: SQLite 3.x database,
> last written using SQLite
> version 3007005
> , page size 1024
> , file counter 2081
> , database pages 100
> , cookie 0x15
> , schema 1
> , UTF-16 little endian
> , version-valid-for 2081
> cgipcrvp.db3: SQLite 3.x database,
> last written using SQLite
> version 0
> , page size 1024
> , file counter 25008
> , database pages 0
> , cookie 0x18d
> , schema 4
> , cache page size 2000
> , UTF-8
> , version-valid-for 0
> corruptDB.sqlite: SQLite 3.x database,
> user version 256,
> last written using SQLite
> version 0
> , page size 24929
> , writer version 0
> , read version 4
> , maximum payload 1
> , minimum payload 1
> , leaf payload 0
> , file counter 1075847168
> , database pages 2109952
> , free pages 1024
> , cookie 0x1f00
> , schema 3328
> , cache page size 1024
> , unknown 0x0 encoding
> , vacuum mode 2048
> , version-valid-for 0
> fmt-729-signature-id-1053.sqlite: data
> maple.help: SQLite 3.x database,
> last written using SQLite
> version 3027002
> , file counter 35509
> , database pages 210163
> , cookie 0x35
> , schema 4
> , UTF-8
> , version-valid-for 35509
> storage.sqlite: SQLite 3.x database,
> user version 65536,
> last written using SQLite
> version 3014001
> , page size 512
> , file counter 1
> , database pages 1
> , cookie 0x0
> , schema 0
> , unknown 0x0 encoding
> , version-valid-for 1
> SystemID.maple: SQLite 3.x database
> (Maple Workbook),
> user version 15,
> last written using SQLite
> version 3013000
> , file counter 395
> , database pages 269
> , cookie 0xa4
> , schema 4
> , UTF-8
> , version-valid-for 395
> weewx.sdb: SQLite 3.x database,
> last written using SQLite
> version 3027002
> , file counter 10
> , database pages 240
> , cookie 0x72
> , schema 4
> , UTF-8
> , version-valid-for 10
>
>
> I hope my diff file can be applied in future version of file utility.
>
> With best wishes
> Jörg Jenderek
> --
> Jörg Jenderek
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> <file-5_40-sql-maple_diff.DEFANGED-1439><file-5_40-sql-maple_diff_sig.DEFANGED-1440><sqlite-trid-v.txt.gz>--
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file
> <sanitizer.log>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 235 bytes
Desc: Message signed with OpenPGP
URL: <https://mailman.astron.com/pipermail/file/attachments/20210730/8141fb08/attachment.asc>
More information about the File
mailing list