[File] [PATCH] Magdir/Windows Microsoft Outlook Express DBX file+Nickfile *.NK2
Jörg Jenderek
joerg.jen.der.ek at gmx.net
Thu Jun 30 00:07:00 UTC 2022
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello,
some days ago I handled some Outlook Personal Storage Table files.
So I looked for other file types belonging to or generated by
Microsoft Outlook. When running file command version 5.42 on such
examples and related files I get output like:
Entwuerfe.dbx: MS Outlook Express DBX file,
message database
Folders.dbx: MS Outlook Express DBX file,
folder database
GeloeschteObjekte.dbx: MS Outlook Express DBX file,
message database
NK2Edit.dat: data
NK2Edit.nk2.NK2Edit.bak: data
Offline.dbx: MS Outlook Express DBX file,
offline database
Posteingang.dbx: MS Outlook Express DBX file,
message database
example.n2k: data
fmt-838-signature-id-1193.dbx: MS Outlook Express DBX file,
message database
fmt-839-signature-id-1194.dbx: MS Outlook Express DBX file,
folder database
Furthermore, only the generic MIME type application/octet-stream is
shown with -i. With option --extension only the 3 byte sequence ??? is
shown.
For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). The DBX examples are
described as "Outlook Express Database" by dbx.trid.xml.
The other examples are described as "Outlook Nickfile" by
nk2.trid.xml (see appended trid-v-dbx.txt.gz).
For comparison I also ran the file format identification
utility DROID (see https://sourceforge.net/projects/droid/).
This identifies many DBX examples as "Outlook Express Database".
The folder database variants like example Folders.dbx are described
as "Outlook Express Folder Database" by PUID fmt/839. The message
database variants like example Posteingang.dbx are described as
"Outlook Express Message Database" by PUID fmt/838. The offline
database variant like example Offline.dbx is not recognized and is
wrongly described as "Microsoft Visual FoxPro Table". The non-DBX
examples are not recognized (see appended droid-dbx.csv.gz).
Luckily DROID and TrID with -v option show a related URL and the
file name extensions used. With this information I was able to find a
page about Outlook Express Database on the File Formats Archive Team
web site. There a link to the software ol2mbox with an unofficial
FILE-FORMAT description is mentioned. This information is now
expressed by additional comment lines inside Magdir/Windows like:
# URL: http://fileformats.archiveteam.org/
# wiki/Outlook_Express_Database
# Reference: http://mark0.net/download/triddefs_xml.7z
# defs/d/dbx.trid.xml
# https://sourceforge.net/projects/ol2mbox/files/LibDBX
# /v1.0.4/libdbx_1.0.4.tar.gz/FILE-FORMAT
Until now the description happens inside Magdir/Windows by lines like:
0 string \xCF\xAD\x12\xFE MS Outlook Express DBX file
>4 byte =0xC5 \b, message database
>4 byte =0xC6 \b, folder database
>4 byte =0xC7 \b, account information
>4 byte =0x30 \b, offline database
After the 4 starting magic bytes, sub-classification is done by the
byte at offset four. In reality this is the beginning of a
characteristic class ID (CLSID) mentioned on the reference site. For
checking purposes that can be shown by a line like:
>>4 guid x \b, CLSID %s
So CLSID 6F74FDC5-E366-11d1-9A4E-00C04FA309D4 is used for message
databases, and so on.
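As an illustration only, here is a minimal Python sketch of the same
classification done on the full CLSID instead of its first byte. The
function name dbx_kind is hypothetical; the three CLSIDs are the ones
listed in the comment lines of the patch. A CLSID for the account
information variant is not given there, so it is left out.

```python
import uuid

DBX_MAGIC = b"\xCF\xAD\x12\xFE"

# CLSIDs taken from the comment lines of the patch
DBX_CLSIDS = {
    uuid.UUID("6F74FDC5-E366-11D1-9A4E-00C04FA309D4"): "message database",
    uuid.UUID("6F74FDC6-E366-11D1-9A4E-00C04FA309D4"): "folder database",
    uuid.UUID("26FE9D30-1A8F-11D2-AABF-006097D474C4"): "offline database",
}


def dbx_kind(header: bytes):
    """Return (clsid, description) or None if the magic does not match."""
    if len(header) < 20 or header[:4] != DBX_MAGIC:
        return None
    # A GUID is stored with its first three fields in little endian order,
    # so the byte at offset 4 is the low byte of the first field (C5h, C6h, 30h).
    clsid = uuid.UUID(bytes_le=bytes(header[4:20]))
    return clsid, DBX_CLSIDS.get(clsid, "unknown")
```

This also shows why testing only the byte at offset 4 works: it is the
lowest byte of the first GUID field.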
The DROID samples fmt-838-signature-id-1193.dbx and
fmt-839-signature-id-1194.dbx are
not real Outlook examples. These contain just the first few dozen
bytes of such Outlook files. To keep these samples from being
misidentified, I also test for the existence of a later field like the
file size value. So this now starts like:
0 string \xCF\xAD\x12\xFE
>0x7C ulelong >0 MS Outlook Express DBX file
!:mime application/x-ms-dbx
!:ext dbx
I found no officially registered MIME type. So instead of the generic
application/octet-stream I display a user-defined one.
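The check for a later field can be sketched in Python as follows. This
is only an illustration of the idea, not part of the patch; the
function name looks_like_real_dbx is hypothetical.

```python
import struct

DBX_MAGIC = b"\xCF\xAD\x12\xFE"
SIZE_OFFSET = 0x7C  # stored file size, 32 bit little endian


def looks_like_real_dbx(data: bytes) -> bool:
    """Return True only if the magic matches and a non-zero size is stored."""
    if data[:4] != DBX_MAGIC:
        return False
    if len(data) < SIZE_OFFSET + 4:
        # Truncated sample: the size field does not exist at all.
        return False
    (stored_size,) = struct.unpack_from("<I", data, SIZE_OFFSET)
    return stored_size > 0
```

A sample that contains only a few dozen bytes fails the length check,
so it is no longer reported as a DBX file.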
Afterwards a version (like 5.5 or 5.2) is stored, where first comes
the minor and then the major part. Version 5.5 seems to be the most
common one. DROID checks the complete 16 bytes of the CLSID and
also the version number 5.5. Therefore the offline variant
example with version 5.2 is not recognized. So unusual versions are
shown by lines like:
>>20 ulequad !0x0000000500000005 \b, version
>>>24 ulelong x %u
>>>20 ulelong x \b.%u
The total size of the DBX file is shown by a line like:
>>0x7C ulelong x \b, ~ %u bytes
Unfortunately this is not always the exact file size. Sometimes the
real size is a little bit higher than the internally stored size.
Furthermore this value is used to skip the invalid DROID examples.
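For illustration, the version and size fields can be read in Python
like in the following sketch. The function names are hypothetical, and
the header is assumed to be at least 80h bytes long (otherwise
struct.error is raised).

```python
import struct


def dbx_version_and_size(header: bytes):
    """Return (major, minor, stored_size) from the first 80h header bytes."""
    # Offset 20 holds the minor part, offset 24 the major part.
    minor, major = struct.unpack_from("<II", header, 20)
    (stored_size,) = struct.unpack_from("<I", header, 0x7C)
    return major, minor, stored_size


def describe(header: bytes) -> str:
    """Mimic the magic lines: report the version only if it is not 5.5."""
    major, minor, stored_size = dbx_version_and_size(header)
    parts = []
    if (major, minor) != (5, 5):
        parts.append(f"version {major}.{minor}")
    parts.append(f"~ {stored_size} bytes")
    return ", ".join(parts)
```

Reading both 32 bit values as one 64 bit little endian value gives
0000000500000005h for version 5.5, which is what the ulequad test
compares against.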
The number of items (normally the number of email messages)
and the highest email ID (typically one greater than the item count)
can be shown by lines like:
>>0x5c ulelong x \b, highest ID %#x
>>0xC4 ulelong x \b, %u item
>>0xC4 ulelong !1 \bs
So samples with 0 items contain no messages. That can be
partly verified by extracting messages with a command line tool like:
undbx --verbosity 4 Posteingang.dbx
The file offset pointing to a page of Data Indexes is shown by a line
like:
>>0xE4 ulelong >0 \b, index pointer %#x
For examples with 0 items this index pointer is zero of course.
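The three fields above can be read in Python as in this sketch, which
is meant only as an illustration. The names DbxCounters,
read_dbx_counters and summarize are hypothetical, and the header is
assumed to be at least E8h bytes long.

```python
import struct
from typing import NamedTuple


class DbxCounters(NamedTuple):
    highest_id: int
    item_count: int
    index_pointer: int


def read_dbx_counters(header: bytes) -> DbxCounters:
    """Read the 32 bit little endian fields at offsets 5Ch, C4h and E4h."""
    (highest_id,) = struct.unpack_from("<I", header, 0x5C)
    (item_count,) = struct.unpack_from("<I", header, 0xC4)
    (index_pointer,) = struct.unpack_from("<I", header, 0xE4)
    return DbxCounters(highest_id, item_count, index_pointer)


def summarize(c: DbxCounters) -> str:
    """Build a text similar to the one produced by the magic lines."""
    plural = "" if c.item_count == 1 else "s"
    text = f"highest ID {c.highest_id:#x}, {c.item_count} item{plural}"
    if c.index_pointer > 0:  # the index pointer is only shown when set
        text += f", index pointer {c.index_pointer:#x}"
    return text
```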
Luckily TrID with -v option shows a related URL and the
file name extension NK2 used for the nick files. There a link to a
format specification (libnk2 project) is mentioned. This information
is now expressed by additional comment lines inside Magdir/Windows
like:
# URL: http://fileformats.archiveteam.org/wiki/Nickfile
# Reference: http://mark0.net/download/triddefs_xml.7z
# defs/n/nk2.trid.xml
The description now happens inside Magdir/Windows by lines starting
like:
0 ubelong 0x0DF0ADBA MS Outlook Nickfile
!:mime application/x-ms-nickfile
!:ext nk2/bak/dat
Instead of the generic MIME type application/octet-stream I display a
user-defined one. The file name extension bak is used for backups;
nick is used by "older" Outlook, but I myself did not find such
examples. The dat extension is used by "newer" Outlook (probably 2010 -
2016). Maybe this depends on the next bytes, which maybe are
something like a version. This is shown by the next lines like:
>4 ulelong x \b, probably version %u
>8 ulelong x \b.%u
Afterwards the number of rows (nickname or alias items) in the file
is stored. That information is shown by a line like:
>12 ulelong x \b, %u items
Afterwards the number of item entries (in some documents called
columns or property value entries, with values like 17h) is shown by
a line like:
>16 ulelong x \b, %u entries
Each entry starts with a value type and an entry type, sometimes
called property tag and property identifier. This information is
shown by lines like:
>20 uleshort x \b, value type %#4.4x
>22 uleshort x \b, entry type %#4.4x
If I understand the documents right, then all real examples should
start with values 001Fh and 6001h. That means a UTF-16 little endian
string and the PR_DOTSTUFF_STATE (also called PR_NICK_NAME_W) type.
This information follows after a reserved part and an irrelevant
union. First comes the number of bytes (like 2Ch) for the Unicode
string. Then comes the UTF-16 little endian string (PT_UNICODE with a
value like janesmith at contoso.org). So this first entry is shown by
lines like:
>20 uleshort =0x001F
>>36 ulelong x \b, %u bytes
>>40 lestring16 x "%s"
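As an illustration of the layout described above, a minimal Python
sketch of the same header walk follows. The function name
parse_nk2_header and the dictionary keys are hypothetical. Cutting the
string at the first NUL character is an assumption that mimics how a
terminated string is displayed; the patch does not state whether the
byte count includes a terminator.

```python
import struct

NK2_MAGIC = 0x0DF0ADBA  # compared as a big endian 32 bit value


def parse_nk2_header(data: bytes):
    """Return a dict with the leading NK2 fields, or None if no nickfile."""
    if len(data) < 24 or struct.unpack_from(">I", data, 0)[0] != NK2_MAGIC:
        return None
    # Offsets 4, 8, 12, 16: probable version parts, rows, item entries
    ver_a, ver_b, rows, entries = struct.unpack_from("<IIII", data, 4)
    # Offsets 20, 22: value type and entry type of the first entry
    value_type, entry_type = struct.unpack_from("<HH", data, 20)
    info = {
        "version": f"{ver_a}.{ver_b}",
        "rows": rows,
        "entries": entries,
        "value_type": value_type,
        "entry_type": entry_type,
    }
    if value_type == 0x001F and len(data) >= 40:
        # Offset 36: byte count, offset 40: UTF-16 little endian string
        (nbytes,) = struct.unpack_from("<I", data, 36)
        raw = data[40:40 + nbytes]
        text = raw.decode("utf-16-le", errors="replace")
        info["string_bytes"] = nbytes
        info["first_string"] = text.split("\x00", 1)[0]
    return info
```

For samples with 0 items the value type is not 001Fh in my examples,
so no string is extracted for them.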
After applying the above mentioned modifications by patch
file-5.42-windows-dbx.diff the Outlook files are described
with more details and the misidentifications vanish. This now looks
like:
Entwuerfe.dbx: MS Outlook Express DBX file,
message database,
~ 139376 bytes,
highest ID 0x2, 1 item,
index pointer 0x1e254
Folders.dbx: MS Outlook Express DBX file,
folder database,
~ 74720 bytes,
highest ID 0x8, 7 items,
index pointer 0xe5c4
GeloeschteObjekte.dbx: MS Outlook Express DBX file,
message database,
~ 60116 bytes,
highest ID 0x1, 0 items
NK2Edit.dat: MS Outlook Nickfile,
probably version 12.0, 0 items,
0 entries, value type 0x8983,
entry type 0x83e2
NK2Edit.nk2.NK2Edit.bak: MS Outlook Nickfile,
probably version 10.1, 0 items,
0 entries, value type 0xc51c,
entry type 0x6918
Offline.dbx: MS Outlook Express DBX file,
offline database, version 5.2,
~ 9656 bytes,
highest ID 0x1, 0 items
Posteingang.dbx: MS Outlook Express DBX file,
message database,
~ 139376 bytes,
highest ID 0x2, 1 item,
index pointer 0x1e254
example.n2k: MS Outlook Nickfile,
probably version 10.1, 2 items,
23 entries, value type 0x001f,
entry type 0x6001, 44 bytes
"janesmith at contoso.org"
fmt-838-signature-id-1193.dbx: data
fmt-839-signature-id-1194.dbx: data
I hope my diff file can be applied in a future version of the file
utility.
With best wishes
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYrzpJAAKCRCv8rHJQhrU
1lKnAKCK5XhkVKViIqXODd0fAJJKV4DcpwCgganceQ+7gvv2eR46U3tFT4/ChLI=
=kj7h
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-dbx.txt.gz
Type: application/x-gzip
Size: 592 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220630/789b6d6c/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-dbx.csv.gz
Type: application/x-gzip
Size: 641 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220630/789b6d6c/attachment-0001.bin>
-------------- next part --------------
--- file-5.42/magic/Magdir/windows.old 2022-05-31 19:39:08.000000000 +0200
+++ file-5.42/magic/Magdir/windows 2022-06-30 01:54:06.077734700 +0200
@@ -16,12 +16,82 @@
# Summary: Outlook Express DBX file
-# Extension: .dbx
# Created by: Christophe Monniez
-0 string \xCF\xAD\x12\xFE MS Outlook Express DBX file
->4 byte =0xC5 \b, message database
->4 byte =0xC6 \b, folder database
->4 byte =0xC7 \b, account information
->4 byte =0x30 \b, offline database
+# Update: Joerg Jenderek
+# URL: http://fileformats.archiveteam.org/wiki/Outlook_Express_Database
+# Reference: http://mark0.net/download/triddefs_xml.7z/defs/d/dbx.trid.xml
+# https://sourceforge.net/projects/ol2mbox/files/LibDBX/
+# v1.0.4/libdbx_1.0.4.tar.gz/FILE-FORMAT
+# Note: called "Outlook Express Database" by TrID and DROID via PUID fmt/838 fmt/839
+# and partly verified by `undbx --verbosity 4 Posteingang.dbx`
+0 string \xCF\xAD\x12\xFE
+# skip DROID fmt-838-signature-id-1193.dbx fmt-839-signature-id-1194.dbx by check for valid file size
+>0x7C ulelong >0 MS Outlook Express DBX file
+#!:mime application/octet-stream
+#!:mime application/vnd.ms-outlook
+!:mime application/x-ms-dbx
+!:ext dbx
+>>4 byte =0xC5 \b, message database
+>>4 byte =0xC6 \b, folder database
+>>4 byte =0xC7 \b, account information
+>>4 byte =0x30 \b, offline database
+# version like: 5.2 5.5 (typical)
+>>20 ulequad !0x0000000500000005 \b, version
+# major version
+>>>24 ulelong x %u
+# minor version
+>>>20 ulelong x \b.%u
+# CLSID: 6F74FDC5-E366-11d1-9A4E-00C04FA309D4~Message 6F74FDC6-E366-11D1-9A4E-00C04FA309D4~Folder
+# 26FE9D30-1A8F-11D2-AABF-006097D474C4~offline
+#>>4 guid x \b, CLSID %s
+# file size; total size of file; sometimes real size a little bit higher
+>>0x7C ulelong x \b, ~ %u bytes
+# highest Email ID; the next email will have a number one higher than this
+>>0x5c ulelong x \b, highest ID %#x
+# item count; number of items stored in this DBX file
+>>0xC4 ulelong x \b, %u item
+# plural s
+>>0xC4 ulelong !1 \bs
+# index pointer; file offset pointing to a page of Data Indexes
+>>0xE4 ulelong >0 \b, index pointer %#x
+# From: Joerg Jenderek
+# URL: http://fileformats.archiveteam.org/wiki/Nickfile
+# https://www.nirsoft.net/utils/outlook_nk2_edit.html
+# Reference: http://mark0.net/download/triddefs_xml.7z/defs/n/nk2.trid.xml
+# https://github.com/libyal/libnk2/blob/main/documentation
+# Nickfile%20(NK2)%20format.asciidoc
+# Note: called "Outlook Nickfile" by TrID & TestDisk and
+# "Outlook Nickname File" by Microsoft Outlook and
+# "Outlook AutoComplete File" by Nirsoft NK2Edit
+# partly verfied by NK2Edit Raw Text Edit Mode
+0 ubelong 0x0DF0ADBA MS Outlook Nickfile
+#!:mime application/octet-stream
+#!:mime application/vnd.ms-outlook
+!:mime application/x-ms-nickfile
+!:ext nk2/dat/bak
+# nick is used by "older" Outlook; dat is used by "newer" Outlook (probably 2010 - 2016); bak is used for backup
+#!:ext nick/nk2/dat/bak
+# Unknown; probably a version indicator like: 0000000Ah 0000000Ch
+>4 ulelong x \b, probably version %u
+# Unknown2; probably a version indicator like: 1 0
+>8 ulelong x \b.%u
+# number of rows (nickname or alias items) in file
+>12 ulelong x \b, %u items
+# number of item entries/columns/properties value like: 17h
+>16 ulelong x \b, %u entries
+# value type/property tag: 001Fh~4 bytes for data size of UTF-16 LE string
+>20 uleshort x \b, value type %#4.4x
+# entry type/property identifier: 6001h~PR_DOTSTUFF_STATE/PR_NICK_NAME_W
+>22 uleshort x \b, entry type %#4.4x
+# Reserved like: 0013FD90h
+#>24 ulelong x \b, reserved %#8.8x
+# value data array/Irrelevant Union like: 0000000004E31A80h
+#>28 ulequad x \b, data %#16.16llx
+# UTF-16
+>20 uleshort =0x001F
+# unicode string bytes like: 2Ch
+>>36 ulelong x \b, %u bytes
+# unicode string value PT_UNICODE like: janesmith at contoso.org
+>>40 lestring16 x "%s"
# Summary: Windows crash dump
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.42-windows-dbx.diff.sig
Type: application/octet-stream
Size: 1942 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220630/789b6d6c/attachment.obj>