[File] [PATCH] Magdir/Windows Microsoft Outlook Express DBX file+Nickfile *.NK2
Christos Zoulas
christos at zoulas.com
Sat Jul 2 17:46:20 UTC 2022
Committed, thanks!
christos
> On Jun 29, 2022, at 8:07 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello,
>
> A few days ago I handled some Outlook Personal Storage Table files.
>
> So I looked for other file types belonging to or generated by
> Microsoft Outlook. When running file command version 5.42 on such
> examples and related files, I get output like:
>
> Entwuerfe.dbx: MS Outlook Express DBX file,
> message database
> Folders.dbx: MS Outlook Express DBX file,
> folder database
> GeloeschteObjekte.dbx: MS Outlook Express DBX file,
> message database
> NK2Edit.dat: data
> NK2Edit.nk2.NK2Edit.bak: data
> Offline.dbx: MS Outlook Express DBX file,
> offline database
> Posteingang.dbx: MS Outlook Express DBX file,
> message database
> example.n2k: data
> fmt-838-signature-id-1193.dbx: MS Outlook Express DBX file,
> message database
> fmt-839-signature-id-1194.dbx: MS Outlook Express DBX file,
> folder database
>
> Furthermore, only the generic MIME type application/octet-stream is
> shown with -i. With option --extension only the 3-byte sequence ??? is
> shown.
>
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). The DBX examples are
> described as "Outlook Express Database" by dbx.trid.xml.
> The other examples are described as "Outlook Nickfile" by
> nk2.trid.xml (see appended trid-v-dbx.txt.gz).
>
> For comparison I also ran the file format identification
> utility DROID (see https://sourceforge.net/projects/droid/).
> This identifies many DBX examples as "Outlook Express Database".
> The folder database variants, like the example Folders.dbx, are described
> as "Outlook Express Folder Database" by PUID fmt/839. The message
> database variants, like the example Posteingang.dbx, are described as
> "Outlook Express Message Database" by PUID fmt/838. The offline
> database variant, like the example Offline.dbx, is not recognized and
> is wrongly described as "Microsoft Visual FoxPro Table". The non-DBX
> examples are not recognized (see appended droid-dbx.csv.gz).
>
> Luckily DROID and TrID with the -v option show a related URL and the
> file name extensions used. With this information I was able to find a
> page about Outlook Express Database on the file formats Archive Team web
> site. A link there points to the ol2mbox software with its unofficial
> FILE-FORMAT description. This information is now expressed by additional
> comment lines inside Magdir/Windows like:
>
> # URL: http://fileformats.archiveteam.org/
> # wiki/Outlook_Express_Database
> # Reference: http://mark0.net/download/triddefs_xml.7z
> # defs/d/dbx.trid.xml
> # https://sourceforge.net/projects/ol2mbox/files/LibDBX
> # /v1.0.4/libdbx_1.0.4.tar.gz/FILE-FORMAT
>
>
> The description happens inside Magdir/Windows by lines like:
> 0 string \xCF\xAD\x12\xFE MS Outlook Express DBX file
>> 4 byte =0xC5 \b, message database
>> 4 byte =0xC6 \b, folder database
>> 4 byte =0xC7 \b, account information
>> 4 byte =0x30 \b, offline database
>
> After the 4 starting magic bytes, sub-classification is done by the byte
> at offset four. In reality this is the beginning of a characteristic
> class ID (CLSID) mentioned on the reference site. As a check, that
> can be shown by a line like:
>>> 4 guid x \b, CLSID %s
> So CLSID 6F74FDC5-E366-11d1-9A4E-00C04FA309D4 is used for message
> databases, and so on.
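
The sub-classification described here can be sketched in Python. This is only
an illustration of the quoted magic lines, not part of the patch: the names
dbx_kind_and_clsid and DBX_KINDS are made up for this sketch, and it assumes
the CLSID is stored in the usual Microsoft GUID layout (first three fields
little-endian), which matches the first byte 0xC5 of the CLSID quoted above.

```python
import uuid

DBX_MAGIC = b"\xCF\xAD\x12\xFE"

# First CLSID byte -> label, taken from the quoted magic lines.
DBX_KINDS = {
    0xC5: "message database",
    0xC6: "folder database",
    0xC7: "account information",
    0x30: "offline database",
}


def dbx_kind_and_clsid(header: bytes):
    """Return (label, CLSID string), or None if the DBX magic does not match."""
    if len(header) < 20 or header[:4] != DBX_MAGIC:
        return None
    label = DBX_KINDS.get(header[4], "unknown")
    # Assumption: Microsoft GUID byte order, so bytes_le is the right decoder.
    clsid = uuid.UUID(bytes_le=bytes(header[4:20]))
    return label, str(clsid).upper()
```

Only the byte at offset 4 decides the label, exactly as in the magic lines;
the full CLSID is decoded just for display.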
>
> The DROID samples fmt-838-signature-id-1193.dbx and
> fmt-839-signature-id-1194.dbx are
> not real Outlook examples. They contain just the first few dozen
> bytes of such Outlook files. To keep these samples from being
> misidentified, I also test for the existence of a later field, the
> file size value. So this now starts like:
> 0 string \xCF\xAD\x12\xFE
>> 0x7C ulelong >0 MS Outlook Express DBX file
> !:mime application/x-ms-dbx
> !:ext dbx
> I found no officially registered MIME type. So instead of the generic
> application/octet-stream I display a user-defined one.
>
> Afterwards a version (like 5.5 or 5.2) is stored, with the minor
> part first and then the major part. Version 5.5 seems to be the most
> common one. DROID checks the complete 16 bytes of the CLSID and
> also the version number 5.5. Therefore the offline variant
> example with version 5.2 is not recognized. So unusual versions are
> shown by lines like:
>>> 20 ulequad !0x0000000500000005 \b, version
>>>> 24 ulelong x %u
>>>> 20 ulelong x \b.%u
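
The version test can be restated as a small Python sketch. It assumes, as
described above, that the minor part is a little-endian 32-bit value at
offset 20 and the major part follows at offset 24; the function names are
hypothetical and only illustrate the quoted magic lines.

```python
import struct


def dbx_version(header: bytes):
    """Return (major, minor) from offsets 24 and 20 of a DBX header."""
    # The minor part is stored first, then the major part.
    minor, major = struct.unpack_from("<II", header, 20)
    return major, minor


def version_suffix(header: bytes) -> str:
    """Stay silent for the common version 5.5, otherwise report the version."""
    major, minor = dbx_version(header)
    if (major, minor) == (5, 5):
        return ""
    return f", version {major}.{minor}"
```

Comparing the pair against (5, 5) is equivalent to the single 64-bit test
against 0x0000000500000005 in the magic line.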
>
> The total size of the DBX file is shown by a line like:
>>> 0x7C ulelong x \b, ~ %u bytes
> Unfortunately this is not always the exact file size. Sometimes the
> real size is a little higher than the internally stored size.
> This value is also the one used to skip the invalid DROID examples.
>
> The number of items (normally the number of email messages)
> and the highest email ID (typically one greater than the item count) can
> be shown by lines like:
>>> 0x5c ulelong x \b, highest ID %#x
>>> 0xC4 ulelong x \b, %u item
>>> 0xC4 ulelong !1 \bs
> So samples with 0 items contain no messages. That can be
> partly verified by extracting messages with a command line tool like:
> undbx --verbosity 4 Posteingang.dbx
>
> The file offset pointing to a page of data indexes is shown by a line
> like:
>>> 0xE4 ulelong >0 \b, index pointer %#x
> For examples with 0 items this index pointer is zero of course.
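
The size, ID, item and index fields can be read together. The following
Python sketch uses only the offsets given in the quoted magic lines (0x7C,
0x5C, 0xC4, 0xE4); the function name dbx_summary and the exact output
layout are assumptions made for illustration.

```python
import struct


def dbx_summary(header: bytes) -> str:
    """Format size, highest ID, item count and index pointer of a DBX header."""
    if len(header) < 0xE8:
        raise ValueError("need at least 0xE8 header bytes")
    size, = struct.unpack_from("<I", header, 0x7C)
    highest_id, = struct.unpack_from("<I", header, 0x5C)
    items, = struct.unpack_from("<I", header, 0xC4)
    index_ptr, = struct.unpack_from("<I", header, 0xE4)
    parts = [
        f"~ {size} bytes",
        f"highest ID {highest_id:#x}",
        # Plural "s" for every count except exactly one item.
        f"{items} item" + ("" if items == 1 else "s"),
    ]
    if index_ptr > 0:  # the pointer is zero when there are no items
        parts.append(f"index pointer {index_ptr:#x}")
    return ", ".join(parts)
```

A truncated sample whose size field at 0x7C is zero would be rejected
earlier by the ">0" test, so this sketch does not repeat that check.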
>
> Luckily TrID with the -v option shows a related URL and the
> file name extension NK2 used for the nick files. A link to the format
> specification (libnk2 project) is mentioned there. This information is
> now expressed by additional comment lines inside Magdir/Windows like:
>
> # URL: http://fileformats.archiveteam.org/wiki/Nickfile
> # Reference: http://mark0.net/download/triddefs_xml.7z
> # defs/n/nk2.trid.xml
>
> The description now happens inside Magdir/Windows by lines starting
> like:
>
> 0 ubelong 0x0DF0ADBA MS Outlook Nickfile
> !:mime application/x-ms-nickfile
> !:ext nk2/bak/dat
>
> Instead of the generic MIME type application/octet-stream I display a
> user-defined one. The file name extension bak is used for backups;
> nick is used by "older" Outlook, but I have not found such
> examples myself. The dat extension is used by "newer" Outlook (probably
> 2010 - 2016). Maybe this depends on the next bytes, which may be
> something like a version. This is shown by the next lines like:
>
>> 4 ulelong x \b, probably version %u
>> 8 ulelong x \b.%u
>
> Afterwards the number of rows (nickname or alias items) in the file
> is stored. That information is shown by a line like:
>> 12 ulelong x \b, %u items
>
> Afterwards the number of entries (in some documents called columns or
> property value entries, with values like 17h) is shown by a line like:
>> 16 ulelong x \b, %u entries
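
The nickfile header fields described so far can be read in one step. This
Python sketch follows the quoted magic lines only: a big-endian magic at
offset 0, then four little-endian 32-bit values at offsets 4, 8, 12 and 16.
Whether the first two really form a version is, as noted above, a guess, and
the name nk2_header is hypothetical.

```python
import struct

NK2_MAGIC = 0x0DF0ADBA  # compared as a big-endian 32-bit value


def nk2_header(data: bytes):
    """Return the leading nickfile fields as a dict, or None if not a nickfile."""
    if len(data) < 20:
        return None
    magic, = struct.unpack_from(">I", data, 0)
    if magic != NK2_MAGIC:
        return None
    # Offsets 4 and 8: probably a version; 12: rows; 16: entries.
    major, minor, rows, entries = struct.unpack_from("<IIII", data, 4)
    return {"version": f"{major}.{minor}", "items": rows, "entries": entries}
```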
>
> Each entry starts with a value type and an entry type, sometimes called
> property tag and property identifier. This information is shown by
> lines like:
>> 20 uleshort x \b, value type %#4.4x
>> 22 uleshort x \b, entry type %#4.4x
>
> If I understand the documents right, all real examples should
> start with the values 001Fh and 6001h. That means a UTF-16 little-endian
> string and the PR_DOTSTUFF_STATE (also called PR_NICK_NAME_W) type.
>
> This information follows after a reserved part and an irrelevant union.
> First comes the number of bytes (like 2Ch) for the Unicode string.
> Then comes the UTF-16 little-endian string (PT_UNICODE, with a value
> like janesmith at contoso.org). So this first entry is shown by lines
> like:
>> 20 uleshort =0x001F
>>> 36 ulelong x \b, %u bytes
>>> 40 lestring16 x "%s"
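
Reading the first entry can be sketched the same way in Python. The offsets
(20, 22, 36, 40) come from the quoted magic lines; the function name
nk2_first_string is hypothetical, and stripping a trailing NUL from the
decoded text is an assumption about how the string is terminated.

```python
import struct

PT_UNICODE = 0x001F  # value type for a UTF-16 little-endian string


def nk2_first_string(data: bytes):
    """Return (entry_type, byte_count, text) for the first entry, else None."""
    if len(data) < 40:
        return None
    value_type, entry_type = struct.unpack_from("<HH", data, 20)
    if value_type != PT_UNICODE:
        return None
    nbytes, = struct.unpack_from("<I", data, 36)
    raw = data[40:40 + nbytes]
    # Assumption: the byte count includes a terminating NUL character.
    text = raw.decode("utf-16-le", errors="replace").rstrip("\x00")
    return entry_type, nbytes, text
```

For the edited samples NK2Edit.dat and NK2Edit.nk2.NK2Edit.bak the value
type is not 001Fh, so a sketch like this returns None there, just as the
magic lines print no string for them.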
>
> After applying the above-mentioned modifications with the patch
> file-5.41-windows-dbx.diff, the Outlook files are described
> in more detail and the misidentifications vanish. This now looks like:
>
> Entwuerfe.dbx: MS Outlook Express DBX file,
> message database,
> ~ 139376 bytes,
> highest ID 0x2, 1 item,
> index pointer 0x1e254
> Folders.dbx: MS Outlook Express DBX file,
> folder database,
> ~ 74720 bytes,
> highest ID 0x8, 7 items,
> index pointer 0xe5c4
> GeloeschteObjekte.dbx: MS Outlook Express DBX file,
> message database,
> ~ 60116 bytes,
> highest ID 0x1, 0 items
> NK2Edit.dat: MS Outlook Nickfile,
> probably version 12.0, 0 items,
> 0 entries, value type 0x8983,
> entry type 0x83e2
> NK2Edit.nk2.NK2Edit.bak: MS Outlook Nickfile,
> probably version 10.1, 0 items,
> 0 entries, value type 0xc51c,
> entry type 0x6918
> Offline.dbx: MS Outlook Express DBX file,
> offline database, version 5.2,
> ~ 9656 bytes,
> highest ID 0x1, 0 items
> Posteingang.dbx: MS Outlook Express DBX file,
> message database,
> ~ 139376 bytes,
> highest ID 0x2, 1 item,
> index pointer 0x1e254
> example.n2k: MS Outlook Nickfile,
> probably version 10.1, 2 items,
> 23 entries, value type 0x001f,
> entry type 0x6001, 44 bytes
> "janesmith at contoso.org"
> fmt-838-signature-id-1193.dbx: data
> fmt-839-signature-id-1194.dbx: data
>
> I hope my diff file can be applied in a future version of the file
> utility.
>
> With best wishes
> Jörg Jenderek
> - --
> Jörg Jenderek
>
>
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYrzpJAAKCRCv8rHJQhrU
> 1lKnAKCK5XhkVKViIqXODd0fAJJKV4DcpwCgganceQ+7gvv2eR46U3tFT4/ChLI=
> =kj7h
> -----END PGP SIGNATURE-----
> <Nachrichtenteil als Anhang.DEFANGED-88><trid-v-dbx.txt.gz><droid-dbx.csv.gz><file-5_42-windows-dbx_diff.DEFANGED-89><file-5_42-windows-dbx_diff_sig.DEFANGED-90>