[File] [PATCH] of Magdir/mozilla for Mozilla lz4 compressed data; extension jsonlz4, mozlz4

Christos Zoulas christos at zoulas.com
Mon Nov 26 16:24:56 UTC 2018


On Nov 26,  4:18pm, joerg.jen.der.ek at gmx.net (Jörg Jenderek) wrote:
-- Subject: [File] [PATCH] of Magdir/mozilla for Mozilla lz4 compressed data;

| Hello,
| 
| A few days ago I ran a treemap utility on my disks. In the Thunderbird
| and Firefox user directories I found unknown files with the name
| extensions "jsonlz4" and "mozlz4". The file command, version 5.35,
| describes such examples only as "data" or "Unicode text, with very
| long lines".
| 
| The file identifier TrID (see http://mark0.net/soft-trid-e.html), on
| the other hand, often describes such examples as "Mozilla mozLz4
| compressed data (generic)" and sometimes also as "Mozilla search
| engines info".
| 
| With the verbose option -v, that software shows a URL with
| information about the file format:
| http://fileformats.archiveteam.org/wiki/Mozilla_LZ4
| 
| But this URL was not a good starting point, so I chose the web page
| about lz4 compression as my starting URL instead:
| https://lz4.github.io/lz4/
| A working decompression utility named "dejsonlz4" is mentioned there,
| so I used its source file "dejsonlz4.c" as my reference.
| 
| Following that source file, I added lines to Magdir/mozilla. First
| comes an 8-byte magic string: "mozLz40" followed by a NUL byte. This
| is now matched by a line like:
|  0	string	mozLz40\0
| Next, the size of the original uncompressed file is stored in 4 bytes
| (little-endian) as the variable "decomp_size". This information is now
| shown by a line like:
|  >8	ulelong	x				\b, originally %u bytes
| The lz4 compressed data follows. For debugging purposes, its first
| 8 bytes can be displayed by a line like:
|  >12	ubequad	x				\b, lz4 data 0x%16.16llx
| 
| According to the information I found, Mozilla uses its own
| non-standard 12-byte header. That is annoying, because the standard
| lz4 utility cannot be used to unpack the compressed files.
| After decompressing some examples with dejsonlz4, the original files
| seem to be text files in JSON format. Since the MIME type for lz4
| compressed files is "application/x-lz4", I chose a user-defined type
| for Mozilla lz4 with the line:
|  !:mime	application/x-lz4+json
| 
| When bookmarks are compressed by Mozilla, the resulting archive seems
| to have the file name extension "jsonlz4". When "search" and other
| "store" files are compressed, the file name extension "mozlz4" seems
| to be used instead. This is now expressed by the line:
|  !:ext	jsonlz4/mozlz4
| 
| Because not only bookmarks are compressed this way, and the file
| format is also used by Thunderbird, I chose the phrase "Mozilla lz4
| compressed data" as the identifying text.
| 
| After applying the modifications above with the patch
| file-5.35-mozilla-lz4.diff, all such compressed examples are
| described by Magdir/mozilla like this:
| 
| search.json.mozlz4:
| 	Mozilla lz4 compressed data, originally 6441 bytes
| store.json.mozlz4:
| 	Mozilla lz4 compressed data, originally 56 bytes
| bookmarks-2017-04-05_331_6zmXnxzqdvQiSXejXVyI6g==.jsonlz4:
| 	Mozilla lz4 compressed data, originally 134661 bytes
| 
| Looking inside Magdir/mozilla, I saw a similar magic line:
|  0	string	mozLz4a		Mozilla lz4 compressed bookmark data
| After searching the net and looking in the TrID database, I came to
| the conclusion that this magic "mozLz4a" may be a misspelling or may
| belong to an alpha version of Mozilla. But I am not sure about that,
| so I added this as a comment. Maybe another expert can check it.
| 
| I hope my diff file and suggestions can be applied in a future
| version of the file utility.

Added, thanks!

christos

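For readers who want to check the header layout described in the quoted
message outside of file(1), here is a minimal Python sketch. It mirrors
the two magic rules (the 8-byte string "mozLz40\0" at offset 0 and the
little-endian 32-bit size at offset 8). The function name
describe_mozlz4 is hypothetical and is not part of the patch or of
dejsonlz4.c.

```python
import struct

# 8-byte magic, as in the rule "0 string mozLz40\0"
MAGIC = b"mozLz40\0"


def describe_mozlz4(data: bytes):
    """Return a file(1)-style description, or None if the magic does not match.

    Hypothetical helper: it only illustrates the 12-byte header layout.
    """
    if len(data) < 12 or data[:8] != MAGIC:
        return None
    # Offset 8: uncompressed size, unsigned 32-bit little-endian ("ulelong").
    (decomp_size,) = struct.unpack_from("<I", data, 8)
    return "Mozilla lz4 compressed data, originally %d bytes" % decomp_size


if __name__ == "__main__":
    # Synthetic header only; no real compressed payload follows it.
    sample = MAGIC + struct.pack("<I", 6441)
    print(describe_mozlz4(sample))
```

With a real file, the first 12 bytes read in binary mode are enough as
input.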

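The quoted message notes that the standard lz4 program cannot unpack
these files because of the 12-byte header. The sketch below shows one
way to decode them in pure Python. It assumes that the payload after the
header is a single raw LZ4 block (not an LZ4 frame); that is my
understanding of what dejsonlz4 decodes, but it is not stated in the
message above. The names lz4_block_decompress and decode_mozlz4 are
hypothetical, and the decoder has no hardening against malformed input
beyond the checks shown.

```python
import struct

MAGIC = b"mozLz40\0"  # 8-byte magic from the quoted magic rule


def _read_length(block: bytes, pos: int, base: int):
    """Extend a 4-bit length field of 15 with following bytes (255 = continue)."""
    if base == 15:
        while True:
            extra = block[pos]
            pos += 1
            base += extra
            if extra != 255:
                break
    return base, pos


def lz4_block_decompress(block: bytes, expected_size: int) -> bytes:
    """Decode one raw LZ4 block (assumed payload format, see lead-in)."""
    out = bytearray()
    pos, end = 0, len(block)
    while pos < end:
        token = block[pos]
        pos += 1
        # High nibble: number of literal bytes to copy verbatim.
        lit_len, pos = _read_length(block, pos, token >> 4)
        out += block[pos:pos + lit_len]
        pos += lit_len
        if pos >= end:
            break  # the last sequence carries literals only
        # Two-byte little-endian offset back into the output.
        offset = block[pos] | (block[pos + 1] << 8)
        pos += 2
        if offset == 0 or offset > len(out):
            raise ValueError("invalid match offset")
        # Low nibble: match length minus 4.
        match_len, pos = _read_length(block, pos, token & 0x0F)
        match_len += 4
        start = len(out) - offset
        # Byte-wise copy so that overlapping matches repeat correctly.
        for i in range(match_len):
            out.append(out[start + i])
    if len(out) != expected_size:
        raise ValueError("decoded size does not match header")
    return bytes(out)


def decode_mozlz4(data: bytes) -> bytes:
    """Strip the 12-byte Mozilla header and decode the payload."""
    if len(data) < 12 or data[:8] != MAGIC:
        raise ValueError("not a mozLz40 file")
    (decomp_size,) = struct.unpack_from("<I", data, 8)
    return lz4_block_decompress(data[12:], decomp_size)


if __name__ == "__main__":
    # Hand-built block: literals "abc", a 5-byte match at offset 3, literal "c".
    payload = b"\x31abc\x03\x00\x10c"
    print(decode_mozlz4(MAGIC + struct.pack("<I", 9) + payload))
```

The result should then be JSON text, as the quoted message observed for
the examples unpacked with dejsonlz4.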