[File] [PATCH] Magdir/Windows Microsoft Outlook Express DBX file+Nickfile *.NK2

Christos Zoulas christos at zoulas.com
Sat Jul 2 17:46:20 UTC 2022


Committed, thanks!

christos

> On Jun 29, 2022, at 8:07 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hello,
> 
> Some days ago I handled some Outlook Personal Storage Table files.
> 
> So I looked for other file types belonging to or generated by
> Microsoft Outlook. When running file command version 5.42 on such
> examples and related files, I get output like:
> 
> Entwuerfe.dbx:                 MS Outlook Express DBX file,
> 			       message database
> Folders.dbx:                   MS Outlook Express DBX file,
> 			       folder database
> GeloeschteObjekte.dbx:         MS Outlook Express DBX file,
> 			       message database
> NK2Edit.dat:                   data
> NK2Edit.nk2.NK2Edit.bak:       data
> Offline.dbx:                   MS Outlook Express DBX file,
> 			       offline database
> Posteingang.dbx:               MS Outlook Express DBX file,
> 			       message database
> example.n2k:                   data
> fmt-838-signature-id-1193.dbx: MS Outlook Express DBX file,
> 			       message database
> fmt-839-signature-id-1194.dbx: MS Outlook Express DBX file,
> 			       folder database
> 
> Furthermore, only the generic MIME type application/octet-stream is
> shown with -i. With option --extension only the 3-byte sequence ??? is
> shown.
> 
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). The DBX examples are
> described as "Outlook Express Database" by dbx.trid.xml.
> The other examples are described as "Outlook Nickfile" by
> nk2.trid.xml (see appended trid-v-dbx.txt.gz).
> 
> For comparison I also ran the file format identification
> utility DROID (see https://sourceforge.net/projects/droid/).
> This identifies many DBX examples as "Outlook Express Database".
> The folder database variants, like example Folders.dbx, are described
> as "Outlook Express Folder Database" by PUID fmt/839. The message
> database variants, like example Posteingang.dbx, are described as
> "Outlook Express Message Database" by PUID fmt/838. The offline
> database variant, like example Offline.dbx, is not recognized and is
> wrongly described as "Microsoft Visual FoxPro Table". The non-DBX
> examples are not recognized (see appended droid-dbx.csv.gz).
> 
> Luckily DROID and TrID with the -v option show a related URL and the
> file name extensions used. With this information I was able to find a
> page about Outlook Express Database on the file formats Archive Team
> web site. There a link to the software ol2mbox with an unofficial
> FILE-FORMAT description is mentioned. That information is now
> expressed by additional comment lines inside Magdir/Windows like:
> 
> # URL:		http://fileformats.archiveteam.org/
> #		wiki/Outlook_Express_Database
> # Reference:	http://mark0.net/download/triddefs_xml.7z
> #		defs/d/dbx.trid.xml
> #		https://sourceforge.net/projects/ol2mbox/files/LibDBX
> #		/v1.0.4/libdbx_1.0.4.tar.gz/FILE-FORMAT
> 
> 
> The description happens inside Magdir/Windows by lines like:
> 0	string	\xCF\xAD\x12\xFE	MS Outlook Express DBX file
> >4	byte	=0xC5			\b, message database
> >4	byte	=0xC6			\b, folder database
> >4	byte	=0xC7			\b, account information
> >4	byte	=0x30			\b, offline database
> 
> After the 4 starting magic bytes, sub-classification is done by the
> byte at offset four. In reality this is the beginning of a
> characteristic class ID (CLSID) mentioned on the reference site. As a
> check, that can be shown by a line like:
> >>4	guid	x			\b, CLSID %s
> So CLSID 6F74FDC5-E366-11d1-9A4E-00C04FA309D4 is used for message
> databases, and so on.
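
As a rough illustration of the sub-classification described above, here is a minimal Python sketch (not part of the patch). The names classify_dbx and DBX_KINDS are made up for this example; the byte values are the ones from the magic lines. Decoding the 16 bytes with the usual Windows GUID byte order is an assumption, though it is consistent with the first CLSID byte being 0xC5 for 6F74FDC5-....

```python
import uuid

DBX_MAGIC = b"\xCF\xAD\x12\xFE"

# First CLSID byte (offset 4) -> description, taken from the magic lines.
DBX_KINDS = {
    0xC5: "message database",
    0xC6: "folder database",
    0xC7: "account information",
    0x30: "offline database",
}


def classify_dbx(header: bytes):
    """Return (kind, clsid) for a DBX header, or None if the magic is absent."""
    if len(header) < 20 or header[:4] != DBX_MAGIC:
        return None
    kind = DBX_KINDS.get(header[4], "unknown")
    # Assumption: the CLSID at offsets 4..19 uses the Windows GUID layout
    # (first three fields little-endian).
    clsid = uuid.UUID(bytes_le=header[4:20])
    return kind, str(clsid).upper()
```

Feeding it the first 20 bytes of a file is enough, for example classify_dbx(open("Posteingang.dbx", "rb").read(20)).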
> 
> The DROID samples fmt-838-signature-id-1193.dbx and
> fmt-839-signature-id-1194.dbx are not real Outlook examples. They
> contain just the first few dozen bytes of such Outlook files. To
> exclude these truncated samples from identification, also test for
> the existence of a later field such as the file size value. So this
> now starts like:
> 0	string	\xCF\xAD\x12\xFE
> >0x7C	ulelong	>0			MS Outlook Express DBX file
> !:mime		application/x-ms-dbx
> !:ext	dbx
> I found no officially registered MIME type. So instead of the generic
> application/octet-stream I display a user-defined one.
> 
> After the CLSID a version (like 5.5 or 5.2) is stored, where the
> minor part comes first and then the major part. Version 5.5 seems to
> be the most common one. DROID checks the complete 16 bytes of the
> CLSID and also the version number 5.5. Therefore the offline variant
> example with version 5.2 is not recognized. So unusual versions are
> shown by lines like:
> >>20	ulequad	!0x0000000500000005	\b, version
> >>>24	ulelong	x			%u
> >>>20	ulelong	x			\b.%u
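
The version test above can be sketched in Python as follows. This is only an illustration under the offsets given in the magic lines (minor part at offset 20, major part at offset 24, both 32-bit little-endian); the function names are hypothetical.

```python
import struct


def dbx_version(header: bytes):
    """Return (major, minor); the minor part is stored first, at offset 20."""
    minor, major = struct.unpack_from("<II", header, 20)
    return major, minor


def is_unusual_version(header: bytes) -> bool:
    """True when the version differs from the common 5.5."""
    return dbx_version(header) != (5, 5)
```

For the offline example with version 5.2, dbx_version would return (5, 2) and is_unusual_version would be true, which corresponds to the case where the magic lines print the version.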
> 
> The total size of the DBX file is shown by a line like:
> >>0x7C	ulelong	x			\b, ~ %u bytes
> Unfortunately this is not always the exact file size. Sometimes the
> real size is a little higher than the internally stored size.
> Furthermore, this value is used to skip the invalid DROID examples.
> 
> The number of items (normally the number of email messages) and the
> highest email ID (typically one greater than the item count) can
> be shown by lines like:
> >>0x5c	ulelong	x			\b, highest ID %#x
> >>0xC4	ulelong	x			\b, %u item
> >>0xC4	ulelong	!1			\bs
> So samples with 0 items contain no messages. That can be
> partly verified by extracting messages with a command line tool like:
> 	undbx --verbosity 4 Posteingang.dbx
> 
> The file offset pointing to a page of data indexes is shown by a line
> like:
> >>0xE4	ulelong	>0			\b, index pointer %#x
> For examples with 0 items this index pointer is of course zero.
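
To make the header fields discussed above concrete, here is a small Python sketch that reads the same four 32-bit little-endian values (highest ID at 0x5C, stored size at 0x7C, item count at 0xC4, index pointer at 0xE4) and formats them roughly the way the magic lines do. The function name dbx_summary is hypothetical, and the offsets are simply those quoted in the magic lines.

```python
import struct


def dbx_summary(header: bytes) -> str:
    """Summarise size, highest ID, item count and index pointer of a DBX header."""
    if len(header) < 0xE8:
        raise ValueError("need at least 0xE8 header bytes")
    (highest_id,) = struct.unpack_from("<I", header, 0x5C)
    (size,) = struct.unpack_from("<I", header, 0x7C)
    (items,) = struct.unpack_from("<I", header, 0xC4)
    (index_ptr,) = struct.unpack_from("<I", header, 0xE4)
    parts = [
        f"~ {size} bytes",
        f"highest ID {highest_id:#x}",
        f"{items} item" + ("" if items == 1 else "s"),
    ]
    # Like the magic line, only mention a non-zero index pointer.
    if index_ptr > 0:
        parts.append(f"index pointer {index_ptr:#x}")
    return ", ".join(parts)
```

Because the stored size is only approximate, a caller that wants to validate a file should not compare it for equality with the real file size.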
> 
> Luckily TrID with the -v option shows a related URL and the
> file name extension NK2 used for the nick files. There a link to a
> format specification (libnk2 project) is mentioned. That information
> is now expressed by additional comment lines inside Magdir/Windows
> like:
> 
> # URL:		http://fileformats.archiveteam.org/wiki/Nickfile
> # Reference:	http://mark0.net/download/triddefs_xml.7z
> #		defs/n/nk2.trid.xml
> 
> The description now happens inside Magdir/Windows by lines starting
> like:
> 
> 0	ubelong		0x0DF0ADBA	MS Outlook Nickfile
> !:mime		application/x-ms-nickfile
> !:ext	nk2/bak/dat
> 
> Instead of the generic MIME type application/octet-stream I display a
> user-defined one. The file name extension bak is used for backups;
> nick is used by "older" Outlook, but I have not found such
> examples myself. The dat extension is used by "newer" Outlook
> (probably 2010 - 2016). Maybe this depends on the next bytes, which
> may be something like a version. This is shown by the next lines:
> 
> >4	ulelong		x		\b, probably version %u
> >8	ulelong		x		\b.%u
> 
> Afterwards the number of rows (nickname or alias items) in the file
> is stored. That information is shown by a line like:
> >12	ulelong		x		\b, %u items
> 
> Afterwards the number of entries (in some documents called columns or
> property value entries, with values like 17h) is shown by a line like:
> >16	ulelong		x		\b, %u entries
> 
> Each entry starts with a value type and an entry type, sometimes
> called property tag and property identifier. This information is
> shown by lines like:
> >20	uleshort	x		\b, value type %#4.4x
> >22	uleshort	x		\b, entry type %#4.4x
> 
> If I understand the documents correctly, all real examples should
> start with the values 001Fh and 6001h. That means a UTF-16
> little-endian string and the PR_DOTSTUFF_STATE (also called
> PR_NICK_NAME_W) type.
> 
> After a reserved part and an irrelevant union, this information
> follows. First comes the number of bytes (like 2Ch) of the Unicode
> string. Then comes the UTF-16 little-endian string (PT_UNICODE with a
> value like janesmith at contoso.org). So this first entry is shown by
> lines like:
> >20	uleshort	=0x001F
> >>36	ulelong		x		\b, %u bytes
> >>40	lestring16	x		"%s"
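
The NK2 fields walked through above can be read with a short Python sketch like the one below. It follows only the offsets used in the magic lines (magic at 0, two version values at 4 and 8, rows at 12, entries at 16, value and entry type at 20 and 22, byte count at 36, string at 40). The function name parse_nk2_header and the dictionary keys are hypothetical, and treating the values at offsets 4 and 8 as a version is the same guess made in the description.

```python
import struct

NK2_MAGIC = b"\x0D\xF0\xAD\xBA"  # big-endian 0x0DF0ADBA


def parse_nk2_header(data: bytes):
    """Return a dict of NK2 header fields, or None if the magic is absent."""
    if len(data) < 24 or data[:4] != NK2_MAGIC:
        return None
    major, minor, rows, entries = struct.unpack_from("<IIII", data, 4)
    value_type, entry_type = struct.unpack_from("<HH", data, 20)
    info = {
        "version": (major, minor),  # "probably version", as in the magic
        "rows": rows,
        "entries": entries,
        "value_type": value_type,
        "entry_type": entry_type,
    }
    # Only a UTF-16 string entry (value type 001Fh) is decoded here.
    if value_type == 0x001F and len(data) >= 40:
        (nbytes,) = struct.unpack_from("<I", data, 36)
        raw = data[40:40 + nbytes]
        info["first_string"] = raw.decode("utf-16-le", errors="replace").rstrip("\x00")
    return info
```

Files whose first entry does not start with 001Fh, like the two NK2Edit samples with 0 items, simply come back without a first_string key.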
> 
> After applying the above mentioned modifications with the patch
> file-5.41-windows-dbx.diff, the Outlook files are described
> with more details and the misidentifications vanish. This now looks
> like:
> 
> Entwuerfe.dbx:                 MS Outlook Express DBX file,
> 			       message database,
> 			       ~ 139376 bytes,
> 			       highest ID 0x2, 1 item,
> 			       index pointer 0x1e254
> Folders.dbx:                   MS Outlook Express DBX file,
> 			       folder database,
> 			       ~ 74720 bytes,
> 			       highest ID 0x8, 7 items,
> 			       index pointer 0xe5c4
> GeloeschteObjekte.dbx:         MS Outlook Express DBX file,
> 			       message database,
> 			       ~ 60116 bytes,
> 			       highest ID 0x1, 0 items
> NK2Edit.dat:                   MS Outlook Nickfile,
> 			       probably version 12.0, 0 items,
> 			       0 entries, value type 0x8983,
> 			       entry type 0x83e2
> NK2Edit.nk2.NK2Edit.bak:       MS Outlook Nickfile,
> 			       probably version 10.1, 0 items,
> 			       0 entries, value type 0xc51c,
> 			       entry type 0x6918
> Offline.dbx:                   MS Outlook Express DBX file,
> 			       offline database, version 5.2,
> 			       ~ 9656 bytes,
> 			       highest ID 0x1, 0 items
> Posteingang.dbx:               MS Outlook Express DBX file,
> 			       message database,
> 			       ~ 139376 bytes,
> 			       highest ID 0x2, 1 item,
> 			       index pointer 0x1e254
> example.n2k:                   MS Outlook Nickfile,
> 			       probably version 10.1, 2 items,
> 			       23 entries, value type 0x001f,
> 			       entry type 0x6001, 44 bytes
> 			       "janesmith at contoso.org"
> fmt-838-signature-id-1193.dbx: data
> fmt-839-signature-id-1194.dbx: data
> 
> I hope my diff file can be applied in a future version of the file
> utility.
> 
> With best wishes
> Jörg Jenderek
> - --
> Jörg Jenderek
> 
> 
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
> 
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYrzpJAAKCRCv8rHJQhrU
> 1lKnAKCK5XhkVKViIqXODd0fAJJKV4DcpwCgganceQ+7gvv2eR46U3tFT4/ChLI=
> =kj7h
> -----END PGP SIGNATURE-----
> <Nachrichtenteil als Anhang.DEFANGED-88><trid-v-dbx.txt.gz><droid-dbx.csv.gz><file-5_42-windows-dbx_diff.DEFANGED-89><file-5_42-windows-dbx_diff_sig.DEFANGED-90>--
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file
> <sanitizer.log>


