[File] [PATCH] Magdir/Windows Microsoft Outlook email *.PAB, *.PST, *.OST

Jörg Jenderek joerg.jen.der.ek at gmx.net
Mon Jun 6 20:38:00 UTC 2022


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

Some days ago I ran the Piriform CCleaner tool. It complained about
the file name extension PAB, so I looked for such files on my systems.

When running file command version 5.41 on such examples and related
files, I get output like:

OL2003Password.pst:             Microsoft Outlook email folder
				(>=2003)
OL2003Password2.pst:            Microsoft Outlook email folder
				(>=2003)
Outlook-hj.pst:                 Microsoft Outlook email folder
				(>=2003)
example-64bit.pst:              Microsoft Outlook email folder
				(>=2003)
mailbox.PAB:                    Microsoft Outlook email folder
				(<=2002)
outlook.pst:                    Microsoft Outlook email folder
				(<=2002)
test-ost.ost:                   Microsoft Outlook email folder
test-v15.pst:                   Microsoft Outlook email folder
test-v16.pst:                   Microsoft Outlook email folder
test-v37.pst:                   Microsoft Outlook email folder
x-fmt-248-signature-id-260.pst: Microsoft Outlook email folder
				(<=2002)
x-fmt-249-signature-id-261.pst: Microsoft Outlook email folder
				(>=2003)
x-fmt-75-signature-id-472.pab:  Microsoft Outlook email folder


Furthermore, only the generic MIME type application/octet-stream is
shown with -i. With option --extension only the 3-byte sequence ??? is
shown.

For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html).
Most PAB examples are described as "Microsoft Personal Address Book"
by pab.trid.xml. The PST examples marked with ">=2003" are described
first as "Microsoft Outlook Personal Folder (Unicode)" by
pst-unicode.trid.xml. The PST examples marked with "<=2002" are
described only as "Microsoft OutLook Personal Folder (ANSI)" by
pst.trid.xml. The OST example is described as "Outlook Exchange
Offline Storage" by ost.trid.xml (see appended
trid-v-outlook.txt.gz).

For comparison I also ran the file format identification
utility DROID (see https://sourceforge.net/projects/droid/).
It does not identify real PAB examples like mailbox.PAB as
"Microsoft Outlook Personal Address Book" by PUID x-fmt/75, because at
offset 8 the 2-byte string wMagicClient is BA and not AB as in
x-fmt-75-signature-id-472.pab. So this seems to be a byte-swap
bug. For PST samples it uses the same names as TrID. Under version it
also shows year ranges. The ANSI variant is described by PUID
x-fmt/248 with the range 1997-2002, whereas for the Unicode variant
the range 2003-2007 is shown by PUID x-fmt/249. So here we also get
the year information that is shown by the file command. Samples with
unlikely or possibly nonexistent versions like test-v16.pst and
test-v37.pst are not recognized. The OST example is not recognized
either (see appended droid-outlook.csv.gz).

Luckily DROID, and TrID with the -v option, show a related URL and the
file name extensions used. With this information I was able to find a
page about the Personal Folder File on the File Formats Archive Team
web site. It links to the official Microsoft description,
[MS-PST].pdf, and also lists an unofficial PFF format specification
as "Personal Folder File (PFF) format.pdf".
This information is now expressed by additional comment lines
inside Magdir/Windows like:
# URL:		http://fileformats.archiveteam.org/
#		wiki/Personal_Folder_File
# Reference:	https://interoperability.blob.core.windows.net/files/
#		MS-PST/%5bMS-PST%5d.pdf
#		http://mark0.net/download/triddefs_xml.7z
#		defs/p/pab.trid.xml
#		defs/p/pst.trid.xml
#		defs/p/pst-unicode.trid.xml
#		defs/o/ost.trid.xml

The description is currently done inside Magdir/Windows by lines like:
 0	lelong	0x4E444221	Microsoft Outlook email folder
 >10	leshort	0x0e		(<=2002)
 >10	leshort	0x17		(>=2003)

After the test for the leading 4-byte dwMagic !BDN, the describing
text is shown. The next 2 lines show year information for the two
versions 14 and 23. These 2 versions seem to be the common ones. For
unusual versions like test-v15.pst no year information is shown.

Unfortunately this version variable wVer is not clearly explained.
It is written that this value must be 14 (=Eh) or 15 (=Fh) if
the file is an ANSI PST file. From version 21 (=15h, according to
non-official documentation) or for a value greater than 23 it is a
Unicode PST file (UTF-16 little-endian), and the highest mentioned
value is 37. So the version information now becomes:

 >>10	uleshort	x		(
 >>10	leshort		<0x10		\b<=2002, ANSI,
 >>10	leshort		>0x14		\b>=2003, Unicode,
 >>10	uleshort	x		version %u)
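
For readers less used to magic syntax, the four lines above can be
sketched in Python roughly as follows. This is only an illustration,
not part of the patch; the function name describe_wver is my own
invention. It compares the value as a signed short (like leshort) and
prints it as unsigned (like uleshort).

```python
import struct

def describe_wver(header: bytes) -> str:
    """Mimic the four wVer magic lines for the 2-byte field at offset 10."""
    (signed_ver,) = struct.unpack_from("<h", header, 10)    # leshort
    (unsigned_ver,) = struct.unpack_from("<H", header, 10)  # uleshort
    text = "("
    if signed_ver < 0x10:          # 14 or 15: ANSI
        text += "<=2002, ANSI,"
    if signed_ver > 0x14:          # 21 and above: Unicode
        text += ">=2003, Unicode,"
    # Without a leading \b, file puts a space before this part.
    text += f" version {unsigned_ver})"
    return text
```

A value like 16 falls into neither range, so only the version number
is printed, as for test-v16.pst.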

In the "newer" variant the format has changed to Unicode, but also the
size of some fields grew from 32-bit to 64-bit or their meaning
changed. So after the first twenty-four bytes the fields also appear
at other positions.

So for Unicode there is a branch with additional information that
looks like:
 >>10	uleshort	>20
 >>>184	ulequad	x		\b, %llu bytes
 >>>513	ubyte	x		\b, bCryptMethod=%u
The size of the file is stored as the 8-byte integer variable
ibFileEof. The variable bCryptMethod describes the encryption type.
Zero means no encryption. One is used for encryption with the
'permutation algorithm', two for encryption with the 'cyclic
algorithm', and 16 for encryption with Windows Information
Protection (WIP).
For the ANSI variant the same information is shown by a branch which
looks like:

 >>10	uleshort	<16
 >>>168	ulelong	x		\b, %u bytes
 >>>461	ubyte	x		\b, bCryptMethod=%u
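
The two branches can be summarized by a small Python sketch that picks
the offsets depending on wVer. It is only meant to illustrate the
layout difference. The names LAYOUT, CRYPT_NAMES and
read_size_and_crypt are hypothetical, the offsets are the ones used in
the magic lines above, and the input is assumed to contain at least
the first 514 bytes of the file.

```python
import struct

# (struct format of ibFileEof, offset of ibFileEof, offset of bCryptMethod)
LAYOUT = {
    "ANSI": ("<I", 168, 461),     # 4-byte size, wVer < 16
    "Unicode": ("<Q", 184, 513),  # 8-byte size, wVer > 20
}

# Encryption types as described in the text above.
CRYPT_NAMES = {
    0: "none",
    1: "permutation",
    2: "cyclic",
    16: "Windows Information Protection",
}

def read_size_and_crypt(header: bytes):
    """Return (variant, ibFileEof, encryption name) or None for unknown wVer."""
    (wver,) = struct.unpack_from("<H", header, 10)
    if wver < 16:
        variant = "ANSI"
    elif wver > 20:
        variant = "Unicode"
    else:
        return None  # neither branch of the magic applies
    fmt, eof_offset, crypt_offset = LAYOUT[variant]
    (size,) = struct.unpack_from(fmt, header, eof_offset)
    crypt = header[crypt_offset]
    return variant, size, CRYPT_NAMES.get(crypt, f"unknown ({crypt})")
```

Comparing the returned size with the real file size might be a quick
plausibility check, but I have not verified that against many samples.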

The DROID samples x-fmt-75-signature-id-472.pab,
x-fmt-248-signature-id-260.pst and x-fmt-249-signature-id-261.pst are
not real Outlook examples. They contain just the first few dozen
bytes of such Outlook files. To keep these samples from being
misidentified, I also test for the existence of a later field like
the bPlatformCreate value. So this additional part looks like:
 >14	ubyte	x		Microsoft Outlook
 !:mime		application/vnd.ms-outlook
Instead of the generic MIME type application/octet-stream I display
application/vnd.ms-outlook, which is mentioned on the reference site.
But it is not mentioned on other sites and is not officially
registered, so maybe this must be changed again.
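
The guard against truncated samples described above can be pictured
with the following Python sketch. It assumes that the `x` test at
offset 14 only succeeds when a byte is actually present there. The
function name looks_like_outlook_store is made up for this
illustration and does not appear in the patch.

```python
import struct

DW_MAGIC = 0x4E444221  # the bytes "!BDN" read as a little-endian long

def looks_like_outlook_store(data: bytes) -> bool:
    """True if dwMagic matches and a bPlatformCreate byte exists at offset 14."""
    if len(data) < 4:
        return False
    (dw_magic,) = struct.unpack_from("<I", data, 0)
    # Equivalent of ">14 ubyte x": any value is fine, but the byte must exist.
    return dw_magic == DW_MAGIC and len(data) > 14
```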

The wMagicClient can be shown by a line like:
 >>8	leshort		x			\b, wMagicClient=%#x
The string value AB (4142h) is used for PAB files, SM (534Dh) is
used for PST files and SO (534Fh) is used for OST files. So
depending on that value a sub-classification (with a different type
description and file name extension) is done. This is now expressed
by lines like:

 >>8	leshort		0x4142		Personal Address Book
 !:ext	pab
 >>8	leshort		0x4D53		Personal Storage
 !:ext	pst
 >>8	leshort		0x4F53		Offline Storage
 !:ext	ost
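
As an illustration, the same lookup could be written in Python like
this. The table CLIENT_TYPES and the function classify_client are
hypothetical names; the three values are copied from the magic lines
above and are compared against the field read as a little-endian
short, exactly as leshort does.

```python
import struct

# wMagicClient as little-endian short -> (description, file name extension)
CLIENT_TYPES = {
    0x4142: ("Personal Address Book", "pab"),
    0x4D53: ("Personal Storage", "pst"),
    0x4F53: ("Offline Storage", "ost"),
}

def classify_client(header: bytes):
    """Return (description, extension) for the 2-byte field at offset 8, or None."""
    (magic_client,) = struct.unpack_from("<H", header, 8)
    return CLIENT_TYPES.get(magic_client)
```

Because the field is read little-endian, the on-disk bytes S M give
0x4D53, and the on-disk bytes B A (as found in mailbox.PAB) give
0x4142.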

After applying the above-mentioned modifications with the patch
file-5.41-windows-pab.diff, the Outlook files are described
in more detail and the misidentifications vanish. This now looks like:

OL2003Password.pst:             Microsoft Outlook Personal Storage
				(>=2003, Unicode, version 23),
				dwUnique=0x17, 271360 bytes,
				bCryptMethod=1, CRC32 0xfc6a0096
OL2003Password2.pst:            Microsoft Outlook Personal Storage
				(>=2003, Unicode, version 23),
				dwUnique=0x15, 271360 bytes,
				bCryptMethod=2, CRC32 0x6ba5f580
Outlook-hj.pst:                 Microsoft Outlook Personal Storage
				(>=2003, Unicode, version 23),
				dwUnique=0x10f31, 556680192 bytes,
				bCryptMethod=1, CRC32 0x5de74682
example-64bit.pst:              Microsoft Outlook Personal Storage
				(>=2003, Unicode, version 23),
				dwUnique=0x1d, 271360 bytes,
				CRC32 0x89cb68c4
mailbox.PAB:                    Microsoft Outlook Personal Address
				Book
				(<=2002, ANSI, version 14),
				bPlatformCreate=2, bPlatformAccess=2,
				dwUnique=0x5, 32768 bytes
outlook.pst:                    Microsoft Outlook Personal Storage
				(<=2002, ANSI, version 14),
				bPlatformCreate=2, bPlatformAccess=2,
				dwReserved1=0x8361a034,
				dwReserved2=0x373263,
				dwUnique=0x82, 278528 bytes,
				bCryptMethod=1
test-ost.ost:                   Microsoft Outlook Offline Storage
				(<=2002, ANSI, version 15),
				dwUnique=0x4c08, 2556928 bytes,
				bCryptMethod=1
test-v15.pst:                   Microsoft Outlook Personal Storage
				(<=2002, ANSI, version 15),
				dwUnique=0x4c08, 2556928 bytes,
				bCryptMethod=1
test-v16.pst:                   Microsoft Outlook Personal Storage
				( version 16)
test-v37.pst:                   Microsoft Outlook Personal Storage
				(>=2003, Unicode, version 37),
				dwUnique=0x400, 9 bytes,
				bSentinel=0x83, CRC32 0x58585858
x-fmt-248-signature-id-260.pst: data
x-fmt-249-signature-id-261.pst: data
x-fmt-75-signature-id-472.pab:  data

I hope my diff file can be applied in a future version of the file
utility.

With best wishes
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYp5lpwAKCRCv8rHJQhrU
1iFnAJoCAJt+1KUwdcjrnZO/MnXZhHJDVwCeIMgnGziW6W1BfxMWsPh0CK2yvzk=
=lpzG
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-outlook.csv.gz
Type: application/x-gzip
Size: 673 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220606/1394fdf8/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-outlook.txt.gz
Type: application/x-gzip
Size: 598 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220606/1394fdf8/attachment-0001.bin>
-------------- next part --------------
--- file-5.41/magic/Magdir/windows.old	2021-05-12 18:30:24.000000000 +0200
+++ file-5.41/magic/Magdir/windows	2022-06-06 22:22:54.570546400 +0200
@@ -427,9 +427,95 @@
 
 # Summary: Outlook Personal Folders
 # Created by: unknown
-0	lelong		0x4E444221	Microsoft Outlook email folder
->10	leshort		0x0e		(<=2002)
->10	leshort		0x17		(>=2003)
+# Update:	Joerg Jenderek
+# URL:		http://fileformats.archiveteam.org/wiki/Personal_Folder_File
+#		https://en.wikipedia.org/wiki/Personal_Storage_Table
+# Reference:	https://interoperability.blob.core.windows.net/files/MS-PST/%5bMS-PST%5d.pdf
+#		http://mark0.net/download/triddefs_xml.7z/defs/p/pab.trid.xml
+# dwMagic !BDN
+0	lelong		0x4E444221
+# skip DROID x-fmt-75-signature-id-472.pab x-fmt-248-signature-id-260.pst x-fmt-249-signature-id-261.pst
+# by check for existence of bPlatformCreate value
+>14	ubyte	x		Microsoft Outlook
+#!:mime		application/octet-stream
+# NOT officially registered !
+!:mime		application/vnd.ms-outlook
+# dwCRCPartial; 32-bit cyclic redundancy check (CRC) value of following 471 bytes; zero for 64-bit
+#>>4	ulelong		!0			\b, CRC %#x
+# wMagicClient; AB (4142h) is used for PAB files; SM (534Dh) is used for PST files; SO (534Fh) is used for OST files
+#>>8	leshort		x			\b, wMagicClient=%#x
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/p/pab.trid.xml
+# Note:		called "Microsoft Personal Address Book" by TrID and
+#		"Microsoft Outlook Personal Address Book" by DROID via x-fmt/75
+>>8	leshort		0x4142			Personal Address Book
+#!:mime	application/x-ms-pab
+!:ext	pab
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/p/pst.trid.xml
+#		http://mark0.net/download/triddefs_xml.7z/defs/p/pst-unicode.trid.xml
+# Note:		called "Microsoft OutLook Personal Folder" by TrID and
+#		by DROID via x-fmt/248 for ANSI and via x-fmt/249 for Unicode
+#>>8	leshort		0x4D53			\b, PST~
+# called "Microsoft Outlook email folder" in ./windows version 1.37 and older
+>>8	leshort		0x4D53			Personal Storage
+#!:mime	application/x-ms-pst
+!:ext	pst
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/o/ost.trid.xml
+# Note:		called "Outlook Exchange Offline Storage" by TrID
+>>8	leshort		0x4F53			Offline Storage
+#!:mime	application/x-ms-ost
+!:ext	ost
+# wVer; file format version. 14 or 15 if the file is ANSI; > 21 or 23(=17h) if Unicode; 37 for written by Outlook with WIP
+>>10	uleshort	x			(
+# probably NO intermediate versions exist
+>>10	leshort		<0x10			\b<=2002, ANSI,
+>>10	leshort		>0x14			\b>=2003, Unicode,
+>>10	uleshort	x			version %u)
+# wVerClient; client file format version like: 19 22
+#>>12	uleshort	x			\b, wVerClient=%u
+# bPlatformCreate; This value MUST be set to 1 but also found 2
+>>14	ubyte		>1			\b, bPlatformCreate=%u
+# bPlatformAccess; This value MUST be set to 1 but also found 2
+>>15	ubyte		>1			\b, bPlatformAccess=%u
+# dwReserved1; SHOULD ignore and NOT modify this value; SHOULD initialize to zero
+>>16	ulelong		!0			\b, dwReserved1=%#x
+# dwReserved2; SHOULD ignore and NOT modify this value; SHOULD initialize to zero
+>>20	ulelong		!0			\b, dwReserved2=%#x
+# ANSI 32-bit variant Outlook 1997-2002
+>>10	uleshort	<16
+# bidNextB; next BlockID (ANSI 4 bytes)
+#>>>24		ulelong	!0			\b, bidNextB=%#x
+# bidNextP; Next available back BlockID pointer
+#>>>28		ulelong	!0			\b, bidNextP=%#x
+# dwUnique; value monotonically increased when modifying PST; so CRC is changing
+>>>32		ulelong	!0			\b, dwUnique=%#x
+# rgnid[128]; A fixed array of 32 NodeIDs, each corresponding to one of the 32 possible NID_TYPEs
+#>>>36		ubequad	x			\b, rgnid=%#llx...
+# dwReserved; Implementations SHOULD ignore this value and SHOULD NOT modify it; Initialized zero
+>>>164		ulelong	!0			\b, dwReserved=%#x
+# ibFileEof; the size of the PST file, in bytes (ANSI 4 bytes)
+>>>168		ulelong	x			\b, %u bytes
+# ibAMapLast; offset to the last AMap page
+#>>>172		ulelong	x			\b, ibAMapLast=%#x
+# bSentinel; MUST be set to 0x80
+>>>460		ubyte	!0x80			\b, bSentinel=%#x
+# bCryptMethod: 0~No encryption 1~encryption with permutation 2~encryption with cyclic 16~encryption with Windows Information Protection (WIP)
+>>>461		ubyte	>0			\b, bCryptMethod=%u
+# UNICODE 64-bit variant Outlook 2003-2007
+>>10	uleshort >20
+# bidUnused; Unused 8 bytes padding (Unicode only); sometimes like: 0x0000000100000004
+>>>24		ulequad	!0x0000000100000004	\b, bidUnused=%#16.16llx
+# dwUnique; value monotonically increased when modifying PST; so CRC is changing
+>>>40		ulelong	!0			\b, dwUnique=%#x
+# rgnid[] (128 bytes): A fixed array of 32 NIDs, each corresponding to one of the 32 possible
+#>>>44		ubequad	x			\b, rgnid=%#llx...
+# ibFileEof; the size of the PST file, in bytes (Unicode 8 bytes)
+>>>184		ulequad	x			\b, %llu bytes
+# bSentinel; MUST be set to 0x80
+>>>512		ubyte	!0x80			\b, bSentinel=%#x
+# bCryptMethod; Encryption type like: 0 1 2 16
+>>>513		ubyte	>0			\b, bCryptMethod=%u
+# dwCRC; 32-bit CRC of the previous 516 bytes
+>>>524		ulelong		x		\b, CRC32 %#x
 
 
 # Summary: Windows help cache
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.41-windows-pab.diff.sig
Type: application/octet-stream
Size: 2113 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220606/1394fdf8/attachment.obj>

