[File] [PATCH] Magdir/linux mlocate database recognized but not plocate variant

Christos Zoulas christos at zoulas.com
Thu Sep 21 16:11:56 UTC 2023


Committed, thanks!

christos

> On Sep 18, 2023, at 3:12 PM, Jörg Jenderek (GMX) <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> 
> some weeks ago I handled some SQLite database samples. Often these
> have the file name suffix DB, so I looked for such samples on my
> systems. Unfortunately this suffix is also used for other database
> formats. In this session I handle mlocate/plocate databases, which are
> used by the "standard" search utility locate on UNIX-like systems.
> Typically the database is stored as /var/lib/mlocate/mlocate.db (on
> SUSE 13.2, Raspbian/Debian 10) or /var/lib/plocate/plocate.db (Linux
> Mint 21.1), but other database names and paths can be chosen by option
> parameters. The utility is described, for example, on Wikipedia:
> 	  https://en.wikipedia.org/wiki/Locate_(Unix)
> 
> When I run file command version 5.45 on my locate database samples I
> get output like:
> 
> mlocate.db:                           mlocate database
> 				      , version 0,
> 				      require visibility,
> 				      root /
> mylocate-U_tmp.db:                    mlocate database
> 				      , version 0,
> 				      root /tmp/mlocate
> mylocate-prune-bind-mounts-yes.db:    mlocate database
> 				      , version 0,
> 				      root /tmp/mlocate
> mylocate-prunefs-nfs.db:              mlocate database
> 				      , version 0,
> 				      root /tmp/mlocate
> plocate-U_tmp-l0.db:                  data
> plocate-U_tmp-prune-bind-mounts-0.db: data
> plocate.db:                           data
> 
> With the --extension option only ??? is displayed for my samples.
> Furthermore, with option -i only the generic
> application/octet-stream is shown for the inspected samples.
> 
> For comparison I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). Here the examples
> are wrongly described as "Thumbs DB file" with version "XP" and mime
> type application/vnd.microsoft.windows.thumbnail-cache by PUID fmt/682,
> because of the DB file name extension.
> 
> For comparison I also ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). This identifies
> all mlocate examples as "mlocate database" by db-mlocate.trid.xml.
> The plocate samples are not detected at the moment and are described as
> "Unknown!" (see appended trid-v-db-locate.txt.gz).
> 
> This tool lists the used file name extension and, with the -v option,
> the related URL pointing to the web page of the software used.
> 
> All detected mlocate samples are described by Magdir/linux starting with
> lines like:
> 0		string		\0mlocate	mlocate database
> >12		byte		x		\b, version %d
> 
> After identification by the 8 byte starting pattern I show a user
> defined mime type instead of the generic application/octet-stream and
> show the extension. The default name is mlocate.db, located inside
> /var/lib/mlocate, if not overridden with the --output option of
> updatedb. At the moment the version value is 0; a higher version will
> probably not occur, because mlocate is now often replaced by plocate.
> So this information is not very interesting for the normal user
> because it is "always" 0, and I show it only when it is not 0. So this
> now becomes:
> 0		string		\0mlocate	mlocate database
> !:mime	application/x-mlocate
> !:ext	db
> >12		byte		!0		\b, version %d
> 
> Afterwards show visibility as before; it can be configured with the -l
> option of updatedb. So keep the line:
> >13		byte		1		\b, require visibility
> 
> Afterwards come 2 padding bytes for 32-bit total alignment. These seem
> to be always nil. As a check this can be verified by a line like:
> #>14		short		!0		\b, padding %#x
> 
> The last information shown so far was the root. The default is the
> 1 byte string / if not overridden with the --database-root option of
> updatedb. So I keep that line:
> >16		string		x		\b, root %s
> 
> After the nil terminated root value the configuration block starts. So
> show the first variable name, which is nil terminated; in my examples
> it is prune_bind_mounts. This information is shown by the additional
> line:
> >>&1		string		x		\b, 1st variable %s
> Afterwards show the first variable value. For prune_bind_mounts I get
> the 1 byte string 0 or 1. That is shown by an additional line like:
> >>>&1		string		x		\b=%s
> 
> The configuration block contains variable-value pairs. By default
> these are taken from /etc/updatedb.conf. As text these few hundred
> bytes look like:
> PRUNEFS=afs anon_inodefs auto autofs ... udf usbfs vboxsf vperfctrfs
> PRUNEPATHS=/tmp /var/tmp ... /var/run/media
> PRUNENAMES=.git .hg .svn CVS
> PRUNE_BIND_MOUNTS=no
> 
> So in my examples the size of the configuration block was in the range
> 82 - 600. The configuration block size is stored big endian at offset
> 8; show it by an additional last line like:
> >8		ubelong		x	\b, configuration size %u
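
As a cross-check of the mlocate layout described above, here is a minimal Python sketch that reads the same offsets as the magic lines (magic at 0, big endian configuration size at 8, version at 12, visibility at 13, padding at 14, root at 16). The helper names and the synthetic sample are made up for illustration, and treating the first string after the variable name as its value simply mirrors the magic lines; this is not taken from the mlocate sources.

```python
import struct

MLOCATE_MAGIC = b"\0mlocate"  # 8 byte starting pattern


def read_cstring(buf, pos):
    """Return (text, position just after the terminating nil byte)."""
    end = buf.index(b"\0", pos)
    return buf[pos:end].decode("utf-8", "replace"), end + 1


def parse_mlocate_header(buf):
    """Decode the fields the magic lines look at; None if the magic differs."""
    if buf[:8] != MLOCATE_MAGIC:
        return None
    # offset 8: configuration size (big endian), offset 12: version,
    # offset 13: visibility flag, offsets 14-15: padding
    conf_size, version, visibility, padding = struct.unpack_from(">IBBH", buf, 8)
    root, pos = read_cstring(buf, 16)    # nil terminated root
    name, pos = read_cstring(buf, pos)   # first variable name
    value, pos = read_cstring(buf, pos)  # first variable value
    return {
        "conf_size": conf_size,
        "version": version,
        "require_visibility": visibility == 1,
        "padding": padding,
        "root": root,
        "first_variable": (name, value),
    }


# Synthetic sample (hypothetical, not one of the real databases).
conf = b"prune_bind_mounts\0" + b"0\0"
sample = MLOCATE_MAGIC + struct.pack(">IBBH", len(conf), 0, 1, 0) + b"/\0" + conf
print(parse_mlocate_header(sample))
```

Running it on a real mlocate.db only needs the first few hundred bytes of the file passed as bytes.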
> 
> After handling some mlocate databases, typically named mlocate.db, I
> looked for such standard search database samples on my Linux Mint 21.1
> system. Surprisingly at first glance, the locate utility mlocate is
> replaced there by plocate, because according to its own documentation
> it is faster and its database is smaller. So plocate is the standard
> locate utility here instead of mlocate.
> 
> The usage of this program is described in the Linux User Manual page
> plocate(1), which can be found on the web, for example. Luckily plocate
> is open source, so with the help of the header file db.h I tried to
> understand the format and create magic patterns. The sources are
> referenced after the mlocate variant via comment lines like:
> # URL:		https://plocate.sesse.net/
> # Reference:	https://plocate.sesse.net/download/plocate-1.1.19.tar.gz
> #		plocate-1.1.19/db.h
> 
> According to that the samples start with the 8 byte magic \0plocate. At
> offset 8 the version is stored as uint32_t. In my examples the version
> was 1. So these 2 facts are expressed inside Magdir/linux by lines like:
> 0		string		\0plocate	plocate database
> !:mime		application/x-plocate
> !:ext		db
> >8		ulelong    	!1		\b, version %u
> 
> The num_docids field at offset 20 is 0 for "empty" samples and a132h in
> my real example plocate.db. So this information is shown, when it is
> greater than 0, by a line like:
> >20		ulelong    	>0		\b, num_docids %u
> 
> For version 1 and up the zstd dictionary length in bytes is stored at
> offset 44 as a 4 byte integer and the dictionary offset in bytes is
> stored at offset 48 as an 8 byte integer. For empty examples these
> values are nil. So for "real" examples jump first to the beginning of
> the zstd dictionary. Then jump from there dictionary length bytes - 8
> (for the quad length) to the beginning of the ZST data. Then print 1
> space char after zstd_dictionary_offset and handle the Zstandard
> compressed data via Magdir/compress with the indirect directive to get
> a phrase like "at 0x400+0x70 Zstandard compressed data (v0.8+)". So
> this is done by lines like:
> >8			ulelong    	>0
> >>44			ulelong    	!0		\b, at %#x
> >>48			ulequad    	>0		\b+%#llx
> >>>(48.q)		ubequad    	x
> >>>>&(44.l-8)		ulelong    	x
> #>>>>&(44.l-8)		ulelong    	x		ZST=%8.8x
> >>>>>&-4		indirect	x		\b
> 
> If max_version is greater than or equal to two, more information is
> stored in the header. This information is only relevant for updatedb.
> The configuration block length in bytes (in my samples in the range
> 65-543) is stored as an 8 byte integer at offset 88. This block can now
> occur at different offsets; its offset is stored as an 8 byte integer
> at offset 96. So jump to that offset and here also show the first nil
> terminated variable name. In my samples this was again
> prune_bind_mounts. Afterwards show the nil terminated variable value.
> Again in my samples I get here the 1 byte string 0 or 1. So this is
> done by lines like:
> >40		ulelong    	>1
> >>88		ulequad    	x	\b, configuration size %llu
> >>96		ulequad    	>0	\b, at %#llx 1st variable
> >>>(96.q)	string    	x	%s
> >>>>&1		string		x	\b=%s
> 
> Here also show the value of the bool check_visibility at offset 104.
> This is 0 or 1 as configured with the -l option of updatedb.plocate.
> This is shown by a line like:
> >>104		ubyte    	1		\b, require visibility
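
For comparison with the magic lines, the plocate header fields used above can be decoded with a short Python sketch. It reads only the offsets named in this mail (version at 8, num_docids at 20, max_version at 40, configuration length at 88, configuration offset at 96, check_visibility at 104), all little endian; the dictionary keys, helper names and the synthetic sample are illustrative assumptions rather than anything from db.h.

```python
import struct

PLOCATE_MAGIC = b"\0plocate"  # 8 byte starting pattern


def read_cstring(buf, pos):
    """Return (text, position just after the terminating nil byte)."""
    end = buf.index(b"\0", pos)
    return bytes(buf[pos:end]).decode("utf-8", "replace"), end + 1


def parse_plocate_header(buf):
    """Decode the fields the magic lines look at; None if the magic differs."""
    if bytes(buf[:8]) != PLOCATE_MAGIC:
        return None
    (version,) = struct.unpack_from("<I", buf, 8)
    (num_docids,) = struct.unpack_from("<I", buf, 20)
    (max_version,) = struct.unpack_from("<I", buf, 40)
    info = {"version": version, "num_docids": num_docids,
            "max_version": max_version}
    if max_version > 1:  # same test as the line ">40 ulelong >1"
        conf_len, conf_off = struct.unpack_from("<QQ", buf, 88)
        info["conf_size"] = conf_len
        info["require_visibility"] = buf[104] == 1
        if conf_off > 0:
            name, pos = read_cstring(buf, conf_off)  # first variable name
            value, _ = read_cstring(buf, pos)        # first variable value
            info["conf_offset"] = conf_off
            info["first_variable"] = (name, value)
    return info


# Synthetic "empty" sample (hypothetical) with the block placed at 0x1a1.
conf = b"prune_bind_mounts\0" + b"1\0"
sample = bytearray(0x1A1) + conf
sample[0:8] = PLOCATE_MAGIC
struct.pack_into("<I", sample, 8, 1)    # version
struct.pack_into("<I", sample, 40, 2)   # max_version
struct.pack_into("<QQ", sample, 88, len(conf), 0x1A1)
sample[104] = 1                         # check_visibility
print(parse_plocate_header(sample))
```
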
> 
> After applying the above mentioned modifications with the patch
> file-5.45-linux-locate.diff and using Magdir/compress, all my
> inspected locate examples are now recognized and described in more
> detail. This now looks like:
> 
> mlocate.db:                           mlocate database,
> 				      require visibility,
> 				      root /, 1st variable
> 				      prune_bind_mounts=0
> 				      , configuration size 556
> mylocate-U_tmp.db:                    mlocate database,
> 				      root /tmp/mlocate, 1st variable
> 				      prune_bind_mounts=0
> 				      , configuration size 600
> mylocate-prune-bind-mounts-yes.db:    mlocate database,
> 				      root /tmp/mlocate, 1st variable
> 				      prune_bind_mounts=1
> 				      , configuration size 600
> mylocate-prunefs-nfs.db:              mlocate database,
> 				      root /tmp/mlocate, 1st variable
> 				      prune_bind_mounts=0
> 				      , configuration size 185
> plocate-U_tmp-l0.db:                  plocate database
> 				      , configuration size 537
> 				      , at 0x1a1 1st variable
> 				      prune_bind_mounts=1
> plocate-U_tmp-prune-bind-mounts-0.db: plocate database
> 				      , configuration size 537
> 				      , at 0x1a1 1st variable
> 				      prune_bind_mounts=0
> 				      , require visibility
> plocate.db:                           plocate database
> 				      , num_docids 41284
> 				      , at 0x400+0x70
> 				      Zstandard compressed data (v0.8+)
> 				      , Dictionary ID: 1499361103
> 				      , next zstd dictionary length
> 				      0x400 offset 0x14b9cb8
> 				      , configuration size 537
> 				      , at 0x14ba0b8 1st variable
> 				      , require visibility
> 
> 
> I hope my diff file can be applied in a future version of the file
> utility.
> 
> With best wishes,
> Jörg Jenderek
