[File] [PATCH] Magdir/linux mlocate database recognized but not plocate variant

Jörg Jenderek (GMX) joerg.jen.der.ek at gmx.net
Mon Sep 18 19:12:58 UTC 2023


Hello,

Some weeks ago I handled some SQLite database samples. These often
have the file name suffix DB, so I looked for such samples on my
systems. Unfortunately this suffix is also used for other database
formats. In this session I handle the mlocate/plocate database, which
is used by the "standard" search utility locate on UNIX-like systems.
Typically the database is stored as /var/lib/mlocate/mlocate.db (on
SUSE 13.2, Raspbian/Debian 10) or /var/lib/plocate/plocate.db (Linux
Mint 21.1), but another database name and path can be selected by
option parameters. The utility is described, for example, on Wikipedia:
	  https://en.wikipedia.org/wiki/Locate_(Unix)

When I run file command version 5.45 on my locate database samples I get
output like:

mlocate.db:                           mlocate database
				      , version 0,
				      require visibility,
				      root /
mylocate-U_tmp.db:                    mlocate database
				      , version 0,
				      root /tmp/mlocate
mylocate-prune-bind-mounts-yes.db:    mlocate database
				      , version 0,
				      root /tmp/mlocate
mylocate-prunefs-nfs.db:              mlocate database
				      , version 0,
				      root /tmp/mlocate
plocate-U_tmp-l0.db:                  data
plocate-U_tmp-prune-bind-mounts-0.db: data
plocate.db:                           data

With the --extension option only ??? is displayed for my samples.
Furthermore, with option -i only the generic application/octet-stream
is shown for the inspected samples.

For comparison I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/). It wrongly
describes the examples as "Thumbs DB file" with version "XP" and mime
type application/vnd.microsoft.windows.thumbnail-cache by PUID fmt/682,
because it relies on the DB file name extension.

For comparison I also ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). It identifies all
mlocate examples as "mlocate database" by db-mlocate.trid.xml.
The plocate samples are not detected at the moment and are described as
"Unknown!" (see appended trid-v-db-locate.txt.gz).

This tool lists the file name extension used and, with the -v option,
the related URL pointing to the web page of the software.

All detected mlocate samples are described by Magdir/linux starting with
lines like:
  0		string		\0mlocate	mlocate database
  >12		byte		x		\b, version %d

After identification by the 8 byte starting pattern I show a user
defined mime type instead of the generic application/octet-stream and
show the extension. The default name is mlocate.db, located inside
/var/lib/mlocate, if not overridden with the --output option of
updatedb. At the moment the version value is 0; a higher version will
probably not occur, because mlocate is now often replaced by plocate.
So this information is not very interesting for the normal user because
it is "always" 0, and it is now shown only when it differs from 0. So
this now becomes like:
  0		string		\0mlocate	mlocate database
  !:mime	application/x-mlocate
  !:ext	db
  >12		byte		!0		\b, version %d

Afterwards show the visibility flag as before; it can be configured
with the -l option of updatedb. So keep the line like:
  >13		byte		1		\b, require visibility

Afterwards come 2 pad bytes for 32-bit total alignment. These seem to be
always nil. For control purposes this can be checked by a line like:
  #>14		short		!0		\b, padding %#x

The last information shown was the root. The standard is the 1 byte
string / if not overridden with the --database-root option of updatedb.
So I keep that line like:
  >16		string		x		\b, root %s

After the nil terminated root value the configuration block starts. So
first show the variable name, which is nil terminated; in my examples it
is prune_bind_mounts. This information is now shown by the additional
line:
  >>&1		string		x		\b, 1st variable %s
Afterwards show the first variable value. For prune_bind_mounts I get
the 1 byte string 0 or 1. That is shown by an additional line like:
  >>>&1		string		x		\b=%s

The configuration block contains variable-value combinations. By
default these are stored inside /etc/updatedb.conf, where these few
hundred bytes look as text like:
PRUNEFS=afs anon_inodefs auto autofs ... udf usbfs vboxsf vperfctrfs
PRUNEPATHS=/tmp /var/tmp ... /var/run/media
PRUNENAMES=.git .hg .svn CVS
PRUNE_BIND_MOUNTS=no

The size of the configuration block was in the range 82 - 600 in my
examples. So show the configuration block size, stored in big endian,
by an additional last line like:
  >8		ubelong		x	\b, configuration size %u
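
For readers who want to check the mlocate offsets outside of file, here
is a minimal Python sketch that decodes the same fields as the magic
lines above. The function and key names (parse_mlocate_header,
build_mlocate_sample, first_variable and so on) are my own choice for
illustration, and the sample bytes are synthetic, not taken from a real
database.

```python
import struct

MLOCATE_MAGIC = b"\0mlocate"


def read_cstring(data, pos):
    """Return (text, position just past the terminating nil byte)."""
    end = data.index(b"\0", pos)
    return data[pos:end].decode("utf-8", "replace"), end + 1


def parse_mlocate_header(data):
    """Decode the fields that the magic lines above look at."""
    if data[:8] != MLOCATE_MAGIC:
        raise ValueError("not an mlocate database")
    # offset 8: big endian uint32 configuration size, offset 12: version,
    # offset 13: visibility flag, offset 14: 2 pad bytes
    conf_size, version, visibility, padding = struct.unpack_from(">IBBH", data, 8)
    root, pos = read_cstring(data, 16)      # >16 string
    name, pos = read_cstring(data, pos)     # >>&1 string
    value, pos = read_cstring(data, pos)    # >>>&1 string
    return {
        "configuration_size": conf_size,
        "version": version,
        "require_visibility": visibility == 1,
        "padding": padding,
        "root": root,
        "first_variable": name,
        "first_value": value,
    }


def build_mlocate_sample():
    """Build a tiny synthetic header plus configuration block (assumption)."""
    conf = b"prune_bind_mounts\0" + b"0\0" + b"\0"
    fixed = MLOCATE_MAGIC + struct.pack(">IBBH", len(conf), 0, 1, 0)
    return fixed + b"/\0" + conf


if __name__ == "__main__":
    print(parse_mlocate_header(build_mlocate_sample()))
```

The sketch only follows the first variable, like the magic lines do; it
does not try to walk the whole configuration block.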

After handling some mlocate databases, typically named mlocate.db, I
looked for such standard search database samples on my Linux Mint 21.1
system. Surprisingly at first glance, the locate utility mlocate is
replaced there by plocate, because according to its own documentation
it is faster and its database is smaller. So plocate is the standard
locate utility here instead of mlocate.

The invocation of this program is described in the Linux User Manual
page plocate(1), which can be found on the web, for example. Luckily
plocate is open source. So with the help of the header file db.h I tried
to understand the format and create magic patterns. That information is
expressed after the mlocate variant via comment lines like:
# URL:		https://plocate.sesse.net/
# Reference:	https://plocate.sesse.net/download/plocate-1.1.19.tar.gz
#		plocate-1.1.19/db.h

According to that the samples start with the 8 byte magic \0plocate. At
offset 8 the version is stored as uint32_t. In my examples the version
was 1. So these 2 facts are expressed inside Magdir/linux by lines like:
  0		string		\0plocate	plocate database
  !:mime		application/x-plocate
  !:ext		db
  >8		ulelong    	!1		\b, version %u

The num_docids is 0 for "empty" samples and a132h in my real example
plocate.db. So this information is shown by a line like:
  >20		ulelong    	>0		\b, num_docids %u

For version 1 and up the zstd dictionary length in bytes is stored at
offset 44 as a 4 byte integer and the dictionary offset in bytes is
stored at offset 48 as an 8 byte integer. For empty examples these
values are nil. So for "real" examples first jump to the beginning of
the zstd dictionary. Then jump from there dictionary length - 8 bytes
(8 for the quad just read) forward to the beginning of the ZST data.
Then print 1 space character after zstd_dictionary_offset and hand the
Zstandard compressed data to Magdir/compress via the indirect directive
to get a phrase like "at 0x400+0x70 Zstandard compressed data (v0.8+)".
So this is done by lines like:
  >8			ulelong    	>0
  >>44			ulelong    	!0		\b, at %#x
  >>48			ulequad    	>0		\b+%#llx
  >>>(48.q)		ubequad    	x
  >>>>&(44.l-8)		ulelong    	x
  #>>>>&(44.l-8)		ulelong    	x		ZST=%8.8x
  >>>>>&-4		indirect	x		\b
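
The offset arithmetic of these relative jumps can be traced with a short
Python sketch. It assumes the usual magic(5) behaviour that a relative
offset (&) counts from the end of the previously matched field, and that
a Zstandard frame starts with the magic number 0xFD2FB528 stored little
endian. The function names zstd_data_offset and looks_like_zstd are only
for illustration.

```python
import struct

# Zstandard frame magic 0xFD2FB528 as it appears on disk (little endian)
ZSTD_FRAME_MAGIC = b"\x28\xb5\x2f\xfd"


def zstd_data_offset(data):
    """Follow the same steps as the magic lines above.

    Returns the offset handed to the indirect directive, or None for
    "empty" samples where dictionary length or offset is nil.
    """
    # offset 44: 4 byte length, offset 48: 8 byte offset, both little endian
    dict_length, dict_offset = struct.unpack_from("<IQ", data, 44)
    if dict_length == 0 or dict_offset == 0:
        return None
    after_quad = dict_offset + 8                      # >>>(48.q) quad read
    after_long = after_quad + (dict_length - 8) + 4   # >>>>&(44.l-8) long read
    return after_long - 4                             # >>>>>&-4 indirect


def looks_like_zstd(data):
    """True if a Zstandard frame magic sits at the computed offset."""
    offset = zstd_data_offset(data)
    if offset is None:
        return False
    return data[offset:offset + 4] == ZSTD_FRAME_MAGIC
```

With the values from my real example (length 0x400, offset 0x70) the
steps collapse to offset + length, that is 0x470.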

If max_version is greater than or equal to two then more information is
stored in the header. That information is only relevant for updatedb.
The configuration block length in bytes (in my samples in the range
65-543) is stored as an 8 byte integer at offset 88. This block can now
occur at different offsets; its offset is stored as an 8 byte integer at
offset 96. So jump to that offset and there also show the first nil
terminated variable name. In my samples this was again
prune_bind_mounts. Afterwards show the nil terminated variable value.
Again in my samples I get here the 1 byte string 0 or 1. So this is done
by lines like:
  >40		ulelong    	>1
  >>88		ulequad    	x	\b, configuration size %llu
  >>96		ulequad    	>0	\b, at %#llx 1st variable
  >>>(96.q)	string    	x	%s
  >>>>&1		string		x	\b=%s

Here also show the value of bool check_visibility. This is 0 or 1,
configured with the -l option of updatedb.plocate. This is shown by a
line like:
  >>104		ubyte    	1		\b, require visibility
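
The plocate header fields used above can also be read in one step with
a Python sketch like the following. The field names follow the comments
taken from db.h in my patch; the format string, the function names and
the synthetic sample are my own assumptions for illustration. The sketch
unpacks all 105 bytes at once, so the fields from offset 56 on are only
meaningful when max_version is at least 2.

```python
import struct
from collections import namedtuple

PLOCATE_MAGIC = b"\0plocate"
# little endian, no padding: 8 byte magic, 4 x uint32, 2 x uint64,
# 2 x uint32, 7 x uint64, 1 x bool -> 105 bytes in total
HEADER_FORMAT = "<8sIIIIQQIIQQQQQQQ?"
PlocateHeader = namedtuple("PlocateHeader", [
    "magic", "version", "hashtable_size", "extra_ht_slots", "num_docids",
    "hash_table_offset_bytes", "filename_index_offset_bytes",
    "max_version", "zstd_dictionary_length_bytes",
    "zstd_dictionary_offset_bytes", "directory_data_length_bytes",
    "directory_data_offset_bytes", "next_zstd_dictionary_length_bytes",
    "next_zstd_dictionary_offset_bytes", "conf_block_length_bytes",
    "conf_block_offset_bytes", "check_visibility",
])


def parse_plocate_header(data):
    """Unpack the fixed header; raise ValueError if the magic is wrong."""
    header = PlocateHeader._make(struct.unpack_from(HEADER_FORMAT, data, 0))
    if header.magic != PLOCATE_MAGIC:
        raise ValueError("not a plocate database")
    return header


def first_variable(data, header):
    """Return (name, value) of the first configuration variable, or None."""
    if header.max_version < 2 or header.conf_block_offset_bytes == 0:
        return None
    pos = header.conf_block_offset_bytes
    end = data.index(b"\0", pos)
    name = data[pos:end].decode("utf-8", "replace")
    value_end = data.index(b"\0", end + 1)
    return name, data[end + 1:value_end].decode("utf-8", "replace")


def build_plocate_sample():
    """Synthetic "empty" database with the block at 0x1a1 (assumption)."""
    conf = b"prune_bind_mounts\0" + b"1\0" + b"\0"
    header = struct.pack(HEADER_FORMAT, PLOCATE_MAGIC, 1, 1, 0x10, 0,
                         0x78, 0x70, 2, 0, 0, 0, 0, 0, 0,
                         len(conf), 0x1a1, True)
    return header.ljust(0x1a1, b"\0") + conf


if __name__ == "__main__":
    sample = build_plocate_sample()
    parsed = parse_plocate_header(sample)
    print(parsed.version, parsed.max_version, first_variable(sample, parsed))
```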

After applying the above mentioned modifications by patch
file-5.45-linux-locate.diff and using Magdir/compress, all my
inspected locate examples are now described, and with more details. This
now looks like:

mlocate.db:                           mlocate database,
				      require visibility,
				      root /, 1st variable
				      prune_bind_mounts=0
				      , configuration size 556
mylocate-U_tmp.db:                    mlocate database,
				      root /tmp/mlocate, 1st variable
				      prune_bind_mounts=0
				      , configuration size 600
mylocate-prune-bind-mounts-yes.db:    mlocate database,
				      root /tmp/mlocate, 1st variable
				      prune_bind_mounts=1
				      , configuration size 600
mylocate-prunefs-nfs.db:              mlocate database,
				      root /tmp/mlocate, 1st variable
				      prune_bind_mounts=0
				      , configuration size 185
plocate-U_tmp-l0.db:                  plocate database
				      , configuration size 537
				      , at 0x1a1 1st variable
				      prune_bind_mounts=1
plocate-U_tmp-prune-bind-mounts-0.db: plocate database
				      , configuration size 537
				      , at 0x1a1 1st variable
				      prune_bind_mounts=0
				      , require visibility
plocate.db:                           plocate database
				      , num_docids 41284
				      , at 0x400+0x70
				      Zstandard compressed data (v0.8+)
				      , Dictionary ID: 1499361103
				      , next zstd dictionary length
				      0x400 offset 0x14b9cb8
				      , configuration size 537
				      , at 0x14ba0b8 1st variable
				      , require visibility


I hope my diff file can be applied in a future version of the file
utility.

With best wishes,
Jörg Jenderek
--
Jörg Jenderek
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-db-locate.txt.gz
Type: application/x-gzip
Size: 403 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230918/0bfc7c4f/attachment.bin>
-------------- next part --------------
--- file-5.45/magic/Magdir/linux.old	2023-07-17 17:54:36.000000000 +0200
+++ file-5.45/magic/Magdir/linux	2023-09-17 21:43:56.665090700 +0200
@@ -543,8 +543,86 @@
 # Type: mlocate database file
+# URL:		https://en.wikipedia.org/wiki/Locate_(Unix)
 # URL:  https://fedorahosted.org/mlocate/
 # From: Wander Nauta <info at wandernauta.nl>
+# Update:	Joerg Jenderek
 0		string		\0mlocate	mlocate database
->12		byte		x		\b, version %d
+#!:mime	application/octet-stream
+!:mime	application/x-mlocate
+# default mlocate.db if not overriden with --output option of updatedb
+!:ext	db
+# at the moment value is 0; a higher version will probably not occur, because mlocate is now often replaced by plocate
+>12		byte		!0		\b, version %d
+# configured with -l option of updatedb
 >13		byte		1		\b, require visibility
+# 2 byte pad for 32-bit total alignment 
+#>14		short		!0		\b, padding %#x
+# standard is 1 byte / if not overriden with --database-root option of updatedb
 >16		string		x		\b, root %s
+# 1st variable name nil terminated like: prune_bind_mounts
+>>&1		string		x		\b, 1st variable %s
+# 1st variable value like: 0 1
+>>>&1		string		x		\b=%s
+# configuration block size in big endian like: 82 85 174 181 185 483 491 496 497 556 600 
+>8		ubelong		x		\b, configuration size %u
+
+# URL:		https://plocate.sesse.net/
+# Reference:	https://plocate.sesse.net/download/plocate-1.1.19.tar.gz
+#		plocate-1.1.19/db.h
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/d/db-plocate.trid.xml
+# Note:		called "plocate database" by TrID
+# magic[8]
+0		string		\0plocate	plocate database
+#!:mime		application/octet-stream
+!:mime		application/x-plocate
+# default /var/lib/plocate/plocate.db if not overriden with --output option of updatedb.plocate 
+!:ext		db
+# version; 2 is the current version
+>8		ulelong    	!1		\b, version %u
+# hashtable_size; like 1 (for "empty" samples) 1b5c3h
+#>12		ulelong    	>1		\b, hash table size %#x
+# extra_ht_slots; like: 10h
+>16		ulelong    	!0x10		\b, extra_ht_slots %#x
+# num_docids; like 0 (for "empty" samples) a132h
+>20		ulelong    	>0		\b, num_docids %u
+# hash_table_offset_bytes; 78h (for "empty" samples) afdf99h
+#>24		ulequad    	!0x78		\b, hash table offset %#llx
+# filename_index_offset_bytes; 70h (for "empty" samples) aad571h
+#>32		ulequad    	!0x70		\b, filename index offset %#llx
+# version 1 and up only
+>8		ulelong    	>0
+# max_version;  nominally 1 or 2 but can be increased if more features are added in a backward-compatible way
+>>40		ulelong    	!2		\b, max version %u
+# zstd_dictionary_length_bytes; 0 (for "empty" samples) 400h
+>>44		ulelong    	!0		\b, at %#x
+# zstd_dictionary_offset_bytes; 0 (for "empty" samples) 70h
+>>48		ulequad    	>0		\b+%#llx
+# jump to beginning of zstd dictionary
+>>>(48.q)		ubequad    	x
+# jump realative zstd dictionary length bytes - 8 (quad length) forward to ZST data beginning
+#>>>>&(44.l-8)		ubelong    	x		ZST=%8.8x
+>>>>&(44.l-8)		ubelong    	x
+# print 1 space char after zstd_dictionary_offset and then handles Zstandard compressed data by ./compress
+# to get phrase like "at 0x400+0x70 Zstandard compressed data (v0.8+)"
+>>>>>&-4		indirect	x		\b 
+# only if max_version >= 2 and only relevant for updatedb
+>40		ulelong    	>1
+# directory_data_length_byte
+#>>56		ulequad    	x		\b, directory data length %#llx
+# directory_data_offset_bytes;
+#>>64		ulequad    	x		offset %#llx
+# next_zstd_dictionary_length_bytes; 0 (for "empty" samples) 400h
+>>72		ulequad    	>0		\b, next zstd dictionary length %#llx
+# next_zstd_dictionary_offset_bytes; 0 (for "empty" samples) 14b9cb8h
+>>>80		ulequad    	>0		offset %#llx
+# conf_block_length_bytes like; 65 147 148 151 152 452 537 540 543 
+>>88		ulequad    	x		\b, configuration size %llu
+# conf_block_offset_bytes; 1a1h (for "empty" samples) 14ba0b8h
+>>96		ulequad    	>0		\b, at %#llx 1st variable
+# 1st variable name nil terminated like: prune_bind_mounts
+>>>(96.q)	string    	x		%s
+# 1st variable value nil terminated like: 0 1
+>>>>&1		string		x		\b=%s
+# bool check_visibility; 0 or 1 configured with -l option of updatedb.plocate
+>>104		ubyte    	1		\b, require visibility
+#>>104		ubyte    	x		\b, check_visibility %#x
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.45-linux-locate.diff.sig
Type: application/octet-stream
Size: 1844 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230918/0bfc7c4f/attachment.obj>

