[File] [PATCH] of Magdir/mozilla for Mozilla lz4 compressed data; extension jsonlz4, mozlz4

Jörg Jenderek joerg.jen.der.ek at gmx.net
Mon Nov 26 15:18:47 UTC 2018


Some days ago I ran a treemap utility on my disks. In the Thunderbird and
Firefox user directories I found unknown files with the file name
extensions "jsonlz4" and "mozlz4". The file command version 5.35
describes such examples only as "data" or "Unicode text, with very long lines".

The file identifier TrID (see http://mark0.net/soft-trid-e.html), on
the other hand, often describes such examples as "Mozilla mozLz4
compressed data (generic)" and sometimes also as "Mozilla search engines

With the verbose option -v this software shows a URL with information
about that file format as

But this URL was not a good starting point. So I chose a web page about
lz4 compression as the starting URL:
There a working decompression utility named "dejsonlz4" was mentioned.
So I used the URL of the source file "dejsonlz4.c" as reference.

According to that source file I added lines to Magdir/mozilla. First comes
an 8-byte magic string: "mozLz40" followed by a NUL byte. This is now
matched by a line like:
 0	string	mozLz40\0
Afterwards the size of the original uncompressed file is stored in
4 bytes as the variable "decomp_size". This information is now shown by
a line like:
 >8	ulelong	x				\b, originally %u bytes
After that the lz4 compressed data is stored. For debugging purposes
this can be displayed by a line like:
 >12	ubequad	x				\b, lz4 data 0x%16.16llx
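
The header layout described above can be sketched in Python. This is only
an illustration under the assumptions stated in this mail (8-byte magic,
then a 4-byte little-endian size). The function name parse_mozlz4_header
is made up for this example and is not part of file or dejsonlz4.

```python
import struct

MAGIC = b"mozLz40\0"  # 8-byte magic, including the trailing NUL


def parse_mozlz4_header(data: bytes) -> int:
    """Return decomp_size from a Mozilla lz4 header, or raise ValueError."""
    if len(data) < 12:
        raise ValueError("too short for the 12-byte header")
    if data[:8] != MAGIC:
        raise ValueError("magic mozLz40\\0 not found")
    # Offset 8: original size as unsigned 32-bit little-endian (ulelong)
    (decomp_size,) = struct.unpack_from("<I", data, 8)
    return decomp_size


# Hand-made sample: header only, followed by one dummy payload byte
sample = MAGIC + struct.pack("<I", 6441) + b"\x00"
print(parse_mozlz4_header(sample))
```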

According to the information I found, Mozilla uses its own non-standard
12-byte header. That is annoying, because the standard lz4 utility
cannot be used to unpack such compressed files.
After uncompressing some examples with dejsonlz4, the original files seem
to be text files in JSON format. So, starting from the MIME type
"application/x-lz4" for lz4 compressed files, I now chose a user-defined
type for Mozilla lz4 by the line
 !:mime	application/x-lz4+json
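
To illustrate why the header has to be skipped before unpacking, here is
a minimal pure-Python sketch. It assumes that the payload after the
12-byte header is a single raw LZ4 block, which is what dejsonlz4 appears
to rely on; I have not verified this against every Mozilla file. The
names lz4_block_decode and unpack_mozlz4 are hypothetical, and the
decoder is not hardened against malformed input.

```python
import struct

MAGIC = b"mozLz40\0"


def lz4_block_decode(src: bytes) -> bytes:
    """Decode one raw LZ4 block (no frame header) in pure Python."""
    out = bytearray()
    i = 0
    while i < len(src):
        token = src[i]
        i += 1
        # High nibble: literal length, extended by following bytes
        lit_len = token >> 4
        if lit_len == 15:
            while True:
                extra = src[i]
                i += 1
                lit_len += extra
                if extra != 255:
                    break
        out += src[i:i + lit_len]
        i += lit_len
        if i >= len(src):
            break  # the last sequence carries literals only
        # 2-byte little-endian offset back into the output
        offset = src[i] | (src[i + 1] << 8)
        i += 2
        if offset == 0 or offset > len(out):
            raise ValueError("invalid match offset")
        # Low nibble: match length minus 4, extended the same way
        match_len = token & 0x0F
        if match_len == 15:
            while True:
                extra = src[i]
                i += 1
                match_len += extra
                if extra != 255:
                    break
        match_len += 4
        start = len(out) - offset
        for k in range(match_len):  # byte-wise, so overlapping copies work
            out.append(out[start + k])
    return bytes(out)


def unpack_mozlz4(data: bytes) -> bytes:
    """Strip the Mozilla header, decode the block, and check decomp_size."""
    if data[:8] != MAGIC:
        raise ValueError("not Mozilla lz4 data")
    (decomp_size,) = struct.unpack_from("<I", data, 8)
    payload = lz4_block_decode(data[12:])
    if len(payload) != decomp_size:
        raise ValueError("size mismatch with decomp_size")
    return payload


# Hand-made example: a block holding 7 literal bytes and no match
example = MAGIC + struct.pack("<I", 7) + bytes([0x70]) + b'{"a":1}'
print(unpack_mozlz4(example))
```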

When bookmarks are compressed by Mozilla, the resulting archive seems to
have the file name extension "jsonlz4". When "search" and other "store"
files are compressed, the file name extension "mozlz4" seems to be used
for the compressed results. This is now expressed by the line:
 !:ext	jsonlz4/mozlz4

Because not only bookmarks are compressed and the file format is also
used by Thunderbird, I chose as identifying text the phrase "Mozilla lz4
compressed data".

After applying the above mentioned modifications with the patch
file-5.35-mozilla-lz4.diff, all such compressed examples are
described by Magdir/mozilla like:

	Mozilla lz4 compressed data, originally 6441 bytes
	Mozilla lz4 compressed data, originally 56 bytes
	Mozilla lz4 compressed data, originally 134661 bytes
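
How these descriptions are put together from the two magic lines can be
mimicked roughly in Python. This is only a sketch of the message joining,
not how file itself is implemented, and the helper name describe is
invented for the example.

```python
import struct


def describe(data: bytes) -> str:
    """Mimic the description produced by the new magic entry."""
    if data[:8] != b"mozLz40\0":
        return "data"
    desc = "Mozilla lz4 compressed data"
    if len(data) >= 12:
        (size,) = struct.unpack_from("<I", data, 8)
        # The leading \b in the magic message drops the separating space,
        # so the comma directly follows the first message.
        desc += ", originally %u bytes" % size
    return desc


print(describe(b"mozLz40\0" + struct.pack("<I", 56)))
```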

When looking inside Magdir/mozilla I saw a similar magic line
 0	string	mozLz4a		Mozilla lz4 compressed bookmark data
After searching the net and looking in the TrID database I came to the
conclusion that this magic "mozLz4a" may be a misspelling or may belong
to an alpha version of Mozilla. But I am not sure about that, so I added
this as a comment. Maybe another expert can check this.

I hope my diff file and suggestions can be applied in a future version of
the file utility.

With best wishes
Jörg Jenderek
-------------- next part --------------
--- file-5.35/magic/Magdir/mozilla.old	2018-01-17 12:08:36 +0000
+++ file-5.35/magic/Magdir/mozilla	2018-11-26 14:55:58 +0000
@@ -9,3 +9,18 @@
 0	string	XPCOM\nMozFASL\r\n\x1A		Mozilla XUL fastload data
+# Probably the next magic line contains misspelled "mozLz40\0"
 0	string	mozLz4a				Mozilla lz4 compressed bookmark data
+# From: Joerg Jenderek
+# URL: https://lz4.github.io/lz4/
+# Reference: https://github.com/avih/dejsonlz4/archive/master.zip/
+# dejsonlz4-master\src\dejsonlz4.c 
+# Note: mostly JSON compressed with a non-standard LZ4 header
+# can be unpacked by dejsonlz4 but not lz4 program.
+0	string	mozLz40\0			Mozilla lz4 compressed data
+!:mime	application/x-lz4+json
+# mozlz4 extension seems to be used for search/store, while jsonlz4 for bookmarks
+!:ext	jsonlz4/mozlz4
+# decomp_size
+>8	ulelong	x				\b, originally %u bytes
+# lz4 data
+#>12	ubequad	x				\b, lz4 data 0x%16.16llx
