[File] [PATCH] Magdir/linux mlocate database recognized but not plocate variant
Christos Zoulas
christos at zoulas.com
Thu Sep 21 16:11:56 UTC 2023
Committed, thanks!
christos
> On Sep 18, 2023, at 3:12 PM, Jörg Jenderek (GMX) <joerg.jen.der.ek at gmx.net> wrote:
>
> Hello,
>
> Some weeks ago I handled some SQLite database samples. These often
> have the file name suffix DB, so I looked for such samples on my
> systems. Unfortunately this suffix is also used for other database
> formats. In this session I handle the mlocate/plocate database, which
> is used by the "standard" search utility locate on UNIX-like systems.
> Typically the database is stored as /var/lib/mlocate/mlocate.db (on
> SUSE 13.2, Raspbian/Debian 10) or /var/lib/plocate/plocate.db (Linux
> Mint 21.1), but another database name and path can be chosen by
> option parameters. The utility is described, for example, on Wikipedia:
> https://en.wikipedia.org/wiki/Locate_(Unix)
>
> When I run the file command version 5.45 on my locate database samples,
> I get output like:
>
> mlocate.db: mlocate database, version 0, require visibility, root /
> mylocate-U_tmp.db: mlocate database, version 0, root /tmp/mlocate
> mylocate-prune-bind-mounts-yes.db: mlocate database, version 0, root /tmp/mlocate
> mylocate-prunefs-nfs.db: mlocate database, version 0, root /tmp/mlocate
> plocate-U_tmp-l0.db: data
> plocate-U_tmp-prune-bind-mounts-0.db: data
> plocate.db: data
>
> With the --extension option only ??? is displayed for my samples.
> Furthermore, with option -i only the generic
> application/octet-stream is shown for the inspected samples.
>
> For comparison I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). Here the examples
> are wrongly described as "Thumbs DB file" with version "XP" and mime
> type application/vnd.microsoft.windows.thumbnail-cache by PUID fmt/682,
> because the DB file name extension is used.
>
> For comparison I also ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). This identifies all
> mlocate examples as "mlocate database" by db-mlocate.trid.xml. The
> plocate samples are not detected at the moment and are described as
> "Unknown!" (see appended trid-v-db-locate.txt.gz).
>
> This tool lists the used file name extension and, with the -v option,
> the related URL pointing to the web page of the used software.
>
> All detected mlocate samples are described by Magdir/linux starting with
> lines like:
> 0 string \0mlocate mlocate database
> >12 byte x \b, version %d
>
> After identification by the 8 byte starting pattern I show a
> user-defined mime type instead of the generic application/octet-stream
> and show the extension. The default name is mlocate.db, located inside
> /var/lib/mlocate if not overridden with the --output option of
> updatedb. At the moment the version value is 0; a higher version will
> probably not occur, because mlocate is now often replaced by plocate.
> So this information is not very interesting for the normal user,
> because it is "always" 0, and it is now only shown when it differs
> from 0. So this now becomes like:
> 0 string \0mlocate mlocate database
> !:mime application/x-mlocate
> !:ext db
> >12 byte !0 \b, version %d
>
> Afterwards show the visibility flag like before; it can be configured
> with the -l option of updatedb. So keep the line like:
> >13 byte 1 \b, require visibility
>
> Afterwards come 2 padding bytes for 32-bit total alignment. These seem
> to be always nil. For checking purposes this can be verified by a
> commented-out line like:
> #>14 short !0 \b, padding %#x
>
> The last information shown was the root. The standard is the 1 byte
> string / if not overridden with the --database-root option of updatedb.
> So I keep that line like:
> >16 string x \b, root %s
>
> After the nil terminated root value the configuration block starts. So
> first show the first variable name, which is nil terminated; in my
> examples it is prune_bind_mounts. So now show this information by an
> additional line:
> >>&1 string x \b, 1st variable %s
> Afterwards show the first variable value. For prune_bind_mounts I get
> the 1 byte string 0 or 1. That is shown by an additional line like:
> >>>&1 string x \b=%s
>
> The configuration block contains variable-value pairs. By default
> these are taken from /etc/updatedb.conf, where the few hundred bytes
> of text look like:
> PRUNEFS=afs anon_inodefs auto autofs ... udf usbfs vboxsf vperfctrfs
> PRUNEPATHS=/tmp /var/tmp ... /var/run/media
> PRUNENAMES=.git .hg .svn CVS
> PRUNE_BIND_MOUNTS=no
>
> The size of the configuration block in my examples was in the range
> 82 - 600. So show the configuration block size, stored in big endian,
> by an additional last line like:
> >8 ubelong x \b, configuration size %u
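
A minimal Python sketch of the header layout that the mlocate magic lines above walk through, useful for cross-checking a sample by hand. The function name parse_mlocate_header and the synthetic sample bytes are made up for illustration; the offsets are simply the ones used in the magic lines (8 = configuration size, 12 = version, 13 = visibility, 14 = padding, 16 = root).

```python
import struct

MLOCATE_MAGIC = b"\0mlocate"


def parse_mlocate_header(data: bytes) -> dict:
    """Decode the fixed mlocate header and the first configuration variable."""
    if data[:8] != MLOCATE_MAGIC:
        raise ValueError("not an mlocate database")
    # ">IBBH": big endian configuration size, version byte, visibility
    # byte and the two padding bytes (offsets 8, 12, 13 and 14).
    conf_size, version, visibility, padding = struct.unpack_from(">IBBH", data, 8)
    # Offset 16: nil terminated root path.
    root_end = data.index(b"\0", 16)
    # The configuration block follows: nil terminated name, then value.
    name_end = data.index(b"\0", root_end + 1)
    value_end = data.index(b"\0", name_end + 1)
    return {
        "configuration_size": conf_size,
        "version": version,
        "require_visibility": visibility == 1,
        "padding": padding,
        "root": data[16:root_end].decode("utf-8", "replace"),
        "first_variable": data[root_end + 1:name_end].decode("ascii", "replace"),
        "first_value": data[name_end + 1:value_end].decode("ascii", "replace"),
    }


# Synthetic sample, not a real database: root "/" and prune_bind_mounts=0.
# The configuration size 20 is only a placeholder value.
sample = (MLOCATE_MAGIC + struct.pack(">IBBH", 20, 0, 1, 0)
          + b"/\0" + b"prune_bind_mounts\0" + b"0\0")
print(parse_mlocate_header(sample))
```

The sketch only mirrors what the magic lines print; it does not try to decode the rest of the configuration block or the directory entries.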
>
> After handling some mlocate databases, typically named mlocate.db, I
> looked for such standard search database samples on my Linux Mint 21.1
> system. At first glance, surprisingly, the locate utility mlocate is
> replaced there by plocate, because according to its own documentation
> it is faster and its database is smaller. So here plocate is the
> standard locate utility instead of mlocate.
>
> The invocation of this program is described in the Linux User Manual
> page plocate(1), which can be found on the web, for example. Luckily
> plocate is open source, so with the help of the header file db.h I
> tried to understand the format and create magic patterns. The sources
> are referenced after the mlocate variant via comment lines like:
> # URL: https://plocate.sesse.net/
> # Reference: https://plocate.sesse.net/download/plocate-1.1.19.tar.gz
> # plocate-1.1.19/db.h
>
> According to that, the samples start with the 8 byte magic \0plocate.
> At offset 8 the version is stored as uint32_t. In my examples the
> version was 1. So these 2 facts are expressed inside Magdir/linux by
> lines like:
> 0 string \0plocate plocate database
> !:mime application/x-plocate
> !:ext db
> >8 ulelong !1 \b, version %u
>
> The num_docids is 0 for "empty" samples and a132h in my real example
> plocate.db. So this information is shown by lines like:
> >20 ulelong >0 \b, num_docids %u
>
> For version 1 and up the zstd dictionary length in bytes is stored at
> offset 44 as a 4 byte integer, and the dictionary offset in bytes is
> stored at offset 48 as an 8 byte integer. For empty examples these
> values are nil. So for "real" examples first jump to the beginning of
> the zstd dictionary. Then jump from there by dictionary length - 8
> bytes (8 for the quad just read) to the beginning of the ZST data.
> Print 1 space char after zstd_dictionary_offset and then handle the
> Zstandard compressed data via Magdir/compress using the indirect
> directive, to get a phrase like "at 0x400+0x70 Zstandard compressed
> data (v0.8+)". So this is done by lines like:
> >8 ulelong >0
> >>44 ulelong !0 \b, at %#x
> >>48 ulequad >0 \b+%#llx
> >>>(48.q) ubequad x
> >>>>&(44.l-8) ulelong x
> #>>>>&(44.l-8) ulelong x ZST=%8.8x
> >>>>>&-4 indirect x \b
>
> If max_version is greater than or equal to two, more information is
> stored in the header. This information is only relevant for updatedb.
> The configuration block length in bytes (in my samples in the range
> 65-543) is stored as an 8 byte integer at offset 88. This block can
> now occur at different offsets; its offset is stored as an 8 byte
> integer at offset 96. So jump to that offset and here also show the
> first nil terminated variable name. In my samples this was again
> prune_bind_mounts. Afterwards show the nil terminated variable value.
> Again in my samples I get here the 1 byte string 0 or 1. So this is
> done by lines like:
> >40 ulelong >1
> >>88 ulequad x \b, configuration size %llu
> >>96 ulequad >0 \b, at %#llx 1st variable
> >>>(96.q) string x %s
> >>>>&1 string x \b=%s
>
> Here also show the value of the bool check_visibility. This is 0 or 1,
> configured with the -l option of updatedb.plocate. This is shown by a
> line like:
> >>104 ubyte 1 \b, require visibility
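
For the max_version >= 2 part, a similar hedged Python sketch: it reads the fields at the offsets used in the magic lines above (40 = max_version, 88 = configuration size, 96 = configuration offset, 104 = check_visibility) and then the first variable at the configuration offset. The function name parse_plocate_v2_fields and the sample bytes are invented for illustration, and the sketch assumes the \0plocate magic has already been checked.

```python
import struct


def parse_plocate_v2_fields(data):
    """Return the updatedb-related fields, or None if max_version < 2."""
    (max_version,) = struct.unpack_from("<I", data, 40)
    if max_version < 2:
        return None
    # Two little endian 8 byte integers at offsets 88 and 96.
    conf_len, conf_off = struct.unpack_from("<QQ", data, 88)
    info = {
        "configuration_size": conf_len,
        "configuration_offset": conf_off,
        "require_visibility": data[104] == 1,
    }
    if conf_off > 0:
        # First nil terminated variable name, then its nil terminated value.
        name_end = data.index(b"\0", conf_off)
        value_end = data.index(b"\0", name_end + 1)
        info["first_variable"] = bytes(data[conf_off:name_end]).decode(
            "ascii", "replace")
        info["first_value"] = bytes(data[name_end + 1:value_end]).decode(
            "ascii", "replace")
    return info


# Synthetic sample, not a real database: the configuration block is
# placed directly after the 105 header bytes.
conf = b"prune_bind_mounts\0" + b"1\0"
sample = bytearray(105) + conf
struct.pack_into("<I", sample, 40, 2)                # max_version
struct.pack_into("<QQ", sample, 88, len(conf), 105)  # size, offset
sample[104] = 1                                      # check_visibility
print(parse_plocate_v2_fields(sample))
```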
>
> After applying the above mentioned modifications by the patch
> file-5.45-linux-locate.diff and using Magdir/compress, all my
> inspected locate examples are now described, and in more detail. This
> now looks like:
>
> mlocate.db: mlocate database, require visibility, root /, 1st variable prune_bind_mounts=0, configuration size 556
> mylocate-U_tmp.db: mlocate database, root /tmp/mlocate, 1st variable prune_bind_mounts=0, configuration size 600
> mylocate-prune-bind-mounts-yes.db: mlocate database, root /tmp/mlocate, 1st variable prune_bind_mounts=1, configuration size 600
> mylocate-prunefs-nfs.db: mlocate database, root /tmp/mlocate, 1st variable prune_bind_mounts=0, configuration size 185
> plocate-U_tmp-l0.db: plocate database, configuration size 537, at 0x1a1 1st variable prune_bind_mounts=1
> plocate-U_tmp-prune-bind-mounts-0.db: plocate database, configuration size 537, at 0x1a1 1st variable prune_bind_mounts=0, require visibility
> plocate.db: plocate database, num_docids 41284, at 0x400+0x70 Zstandard compressed data (v0.8+), Dictionary ID: 1499361103, next zstd dictionary length 0x400 offset 0x14b9cb8, configuration size 537, at 0x14ba0b8 1st variable, require visibility
>
>
> I hope my diff file can be applied in a future version of the file
> utility.
>
> With best wishes,
> Jörg Jenderek