[File] [PATCH] Magdir/Windows Microsoft Outlook email *:PAB, *.PST *.OST

Christos Zoulas christos at zoulas.com
Fri Jun 17 18:05:59 UTC 2022


Committed, thanks!

christos

> On Jun 6, 2022, at 4:38 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hello,
> 
> Some days ago I ran the Piriform CCleaner tool. It complained about the
> file name extension PAB, so I looked for such files on my systems.
> 
> When running the file command, version 5.41, on such examples and
> related files, I get output like:
> 
> OL2003Password.pst:             Microsoft Outlook email folder
> 				(>=2003)
> OL2003Password2.pst:            Microsoft Outlook email folder
> 				(>=2003)
> Outlook-hj.pst:                 Microsoft Outlook email folder
> 				(>=2003)
> example-64bit.pst:              Microsoft Outlook email folder
> 				(>=2003)
> mailbox.PAB:                    Microsoft Outlook email folder
> 				(<=2002)
> outlook.pst:                    Microsoft Outlook email folder
> 				(<=2002)
> test-ost.ost:                   Microsoft Outlook email folder
> test-v15.pst:                   Microsoft Outlook email folder
> test-v16.pst:                   Microsoft Outlook email folder
> test-v37.pst:                   Microsoft Outlook email folder
> x-fmt-248-signature-id-260.pst: Microsoft Outlook email folder
> 				(<=2002)
> x-fmt-249-signature-id-261.pst: Microsoft Outlook email folder
> 				(>=2003)
> x-fmt-75-signature-id-472.pab:  Microsoft Outlook email folder
> 
> 
> Furthermore, only the generic MIME type application/octet-stream is
> shown with -i. With the option --extension, only the 3-byte sequence
> ??? is shown.
> 
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html).
> Most PAB examples are described as "Microsoft Personal Address Book"
> by pab.trid.xml. The PST examples marked with ">=2003" are described
> first as "Microsoft Outlook Personal Folder (Unicode)" by
> pst-unicode.trid.xml. The PST examples marked with "<=2002" are
> described only as "Microsoft OutLook Personal Folder (ANSI)" by
> pst.trid.xml. The OST example is described as "Outlook Exchange
> Offline Storage" by ost.trid.xml (see appended trid-v-outlook.txt.gz).
> 
> For comparison I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/).
> It does not identify real PAB examples like mailbox.PAB as
> "Microsoft Outlook Personal Address Book" by PUID x-fmt/75, because at
> offset 8 the wMagicClient, read as a 2-byte string, is BA and not AB as
> in x-fmt-75-signature-id-472.pab. So this seems to be a byte-swap bug
> here. For PST samples it uses the same names as TrID. It also shows
> year ranges under version. The ANSI variant is described by PUID
> x-fmt/248 with the additional range 1997-2002, whereas for the Unicode
> variant the range 2003-2007 is shown by PUID x-fmt/249. So here we also
> get the year information that the file command shows. Samples with
> unlikely or possibly nonexistent versions like test-v16.pst and
> test-v37.pst are not recognized. The OST example is not recognized
> either (see appended droid-outlook.csv.gz).
> 
> Luckily, DROID and TrID (with the -v option) show a related URL and the
> file name extensions used. With this information I was able to find a
> page about the Personal Folder File on the File Formats Archive Team
> web site. It links to the official Microsoft description,
> [MS-PST].pdf, and also lists the unofficial PFF format specification
> as "Personal Folder File (PFF) format.pdf".
> This information is now expressed by additional comment lines
> inside Magdir/Windows like:
> # URL:		http://fileformats.archiveteam.org/
> #		wiki/Personal_Folder_File
> # Reference:	https://interoperability.blob.core.windows.net/files/
> #		MS-PST/%5bMS-PST%5d.pdf
> #		http://mark0.net/download/triddefs_xml.7z
> #		defs/p/pab.trid.xml
> #		defs/p/pst.trid.xml
> #		defs/p/pst-unicode.trid.xml
> #		defs/o/ost.trid.xml
> 
> Until now the description was produced inside Magdir/Windows by lines like:
> 0	lelong	0x4E444221	Microsoft Outlook email folder
>> 10	leshort	0x0e		(<=2002)
>> 10	leshort	0x17		(>=2003)
> 
> After the test for the leading 4-byte dwMagic !BDN, the describing text
> is shown. The next 2 lines show year information for the two versions
> 14 and 23. These 2 versions seem to be the common ones. For unusual
> versions, as in test-v15.pst, no year information is shown.
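
As an illustration only, the two-step test described above (dwMagic at
offset 0, then wVer at offset 10) can be sketched in Python. The function
name and the return convention are assumptions made for this sketch; they
are not part of the patch or of file(1).

```python
import struct
from typing import Optional

DW_MAGIC = 0x4E444221  # the bytes "!BDN" read as a little-endian 32-bit value

# Year hints of the old rules: only wVer 14 (0x0e) and 23 (0x17) are covered.
YEAR_HINTS = {0x0E: "(<=2002)", 0x17: "(>=2003)"}


def describe_old_rules(data: bytes) -> Optional[str]:
    """Mimic the three old magic lines; return None if dwMagic does not match."""
    if len(data) < 4 or struct.unpack_from("<I", data, 0)[0] != DW_MAGIC:
        return None
    text = "Microsoft Outlook email folder"
    if len(data) >= 12:
        w_ver = struct.unpack_from("<H", data, 10)[0]
        hint = YEAR_HINTS.get(w_ver)
        if hint:
            text += " " + hint
    return text
```

A header with wVer 15 passes the dwMagic test but gets no year hint, which
matches the bare "Microsoft Outlook email folder" lines in the listing
at the top.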
> 
> Unfortunately this version variable wVer is not clearly explained.
> It is written that this value must be 14 (=Eh) or 15 (=Fh) if
> the file is an ANSI PST file. With version 21 (=15h, according to the
> unofficial documentation) or a value greater than 23 it is a
> Unicode PST file (UTF-16 little-endian), and the highest mentioned
> value is 37. So this version information now becomes:
> 
>>> 10	uleshort	x		(
>>> 10	leshort		<0x10		\b<=2002, ANSI,
>>> 10	leshort		>0x14		\b>=2003, Unicode,
>>> 10	uleshort	x		version %u)
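
The effect of these four lines can be sketched as a small Python function.
This is only a model of the printed text, assuming wVer has already been
read as an unsigned 16-bit value; the name version_text is made up for the
sketch.

```python
def version_text(w_ver: int) -> str:
    """Build the parenthesised version text the four magic lines would print."""
    if w_ver < 0x10:
        era = "<=2002, ANSI,"
    elif w_ver > 0x14:
        era = ">=2003, Unicode,"
    else:
        era = ""  # e.g. version 16: neither range matches
    # The first and last magic lines always fire, so "(" and "version N)"
    # are printed even when no range matched.
    return f"({era} version {w_ver})"
```

For a version in the gap between 15 and 21 only the opening parenthesis
and the version number remain, so the text degrades to something like
"( version 16)".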
> 
> In the "newer" variant the format has not only changed to Unicode; the
> size of some fields also grew from 32-bit to 64-bit, or their meaning
> changed. So after the first twenty-four bytes the fields appear at
> different positions.
> 
> So for Unicode there is a branch with additional information that
> looks like:
>>> 10	uleshort	>20
>>>> 184	ulequad	x		\b, %llu bytes
>>>> 513	ubyte	x		\b, bCryptMethod=%u
> The size of the file is stored in the 8-byte integer variable
> ibFileEof. The variable bCryptMethod describes the encryption type.
> Zero means no encryption. One is used for encryption with the
> 'permutation algorithm', two for encryption with the 'cyclic
> algorithm', and 16 for encryption with Windows Information
> Protection (WIP).
> For the ANSI variant the same information is shown by a branch which
> looks like:
> 
>>> 10	uleshort	<16
>>>> 168	ulelong	x		\b, %u bytes
>>>> 461	ubyte	x		\b, bCryptMethod=%u
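
To make the two layouts easier to compare, here is a minimal Python sketch
that reads ibFileEof and bCryptMethod at the offsets used in the two
branches above. The function name, the returned tuple and the CRYPT_NAMES
table are assumptions for illustration; only the offsets, field widths
and the four bCryptMethod values come from the text above.

```python
import struct
from typing import Optional, Tuple

# Meaning of bCryptMethod as listed in the text above.
CRYPT_NAMES = {
    0: "none",
    1: "permutation algorithm",
    2: "cyclic algorithm",
    16: "Windows Information Protection (WIP)",
}


def size_and_crypt(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (ibFileEof, bCryptMethod), or None if the layout is unknown."""
    if len(data) < 12:
        return None
    w_ver = struct.unpack_from("<H", data, 10)[0]
    if w_ver > 20:      # Unicode branch: 64-bit size at 184, method at 513
        size_fmt, size_off, crypt_off = "<Q", 184, 513
    elif w_ver < 16:    # ANSI branch: 32-bit size at 168, method at 461
        size_fmt, size_off, crypt_off = "<I", 168, 461
    else:               # versions 16..20 are covered by neither branch
        return None
    if len(data) <= crypt_off:
        return None     # header is truncated before bCryptMethod
    size = struct.unpack_from(size_fmt, data, size_off)[0]
    return size, data[crypt_off]
```

Given the first 514 bytes of a file, CRYPT_NAMES.get(method, "unknown")
turns the second element of the result into a readable name.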
> 
> The DROID samples x-fmt-75-signature-id-472.pab,
> x-fmt-248-signature-id-260.pst and x-fmt-249-signature-id-261.pst are
> not real Outlook examples. They contain just the first few dozen
> bytes of such Outlook files. To keep these samples from being
> misidentified, I also test for the existence of a later field such as
> the bPlatformCreate value. So this additional part looks like:
>> 14	ubyte	x		Microsoft Outlook
> !:mime		application/vnd.ms-outlook
> Instead of the generic MIME type application/octet-stream I display
> application/vnd.ms-outlook, which is mentioned on the reference site.
> But this is not mentioned on other sites and is not officially
> registered. So maybe this must be changed again.
> 
> The wMagicClient can be shown by a line like:
>>> 8	leshort		x			\b, wMagicClient=%#x
> The string value AB (4142h) is used for PAB files, SM (534Dh) is
> used for PST files and SO (534Fh) is used for OST files. So,
> depending on that value, a sub-classification (with a different type
> description and file name extension) is done. This is now expressed
> by lines like:
> 
>>> 8	leshort		0x4142		Personal Address Book
> !:ext	pab
>>> 8	leshort		0x4D53		Personal Storage
> !:ext	pst
>>> 8	leshort		0x4F53		Offline Storage
> !:ext	ost
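
The same sub-classification can be written as a lookup table. The Python
sketch below takes its three constants from the magic lines above, which
compare the field as a little-endian 16-bit value, so the bytes on disk
appear in the reverse order of the hex digits (0x4D53 is the byte pair
"SM"). The names CLIENT_TYPES and classify_client are hypothetical.

```python
import struct
from typing import Optional, Tuple

# leshort value at offset 8 -> (description, file name extension)
CLIENT_TYPES = {
    0x4142: ("Personal Address Book", "pab"),
    0x4D53: ("Personal Storage", "pst"),
    0x4F53: ("Offline Storage", "ost"),
}


def classify_client(data: bytes) -> Optional[Tuple[str, str]]:
    """Look up wMagicClient (2 bytes at offset 8, little-endian)."""
    if len(data) < 10:
        return None
    w_magic_client = struct.unpack_from("<H", data, 8)[0]
    return CLIENT_TYPES.get(w_magic_client)
```

An unknown wMagicClient yields None, which corresponds to none of the
three sub-classification lines matching.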
> 
> After applying the above-mentioned modifications with the patch
> file-5.41-windows-pab.diff, the Outlook files are described
> in more detail and the misidentifications vanish. This now looks like:
> 
> OL2003Password.pst:             Microsoft Outlook Personal Storage
> 				(>=2003, Unicode, version 23),
> 				dwUnique=0x17, 271360 bytes,
> 				bCryptMethod=1, CRC32 0xfc6a0096
> OL2003Password2.pst:            Microsoft Outlook Personal Storage
> 				(>=2003, Unicode, version 23),
> 				dwUnique=0x15, 271360 bytes,
> 				bCryptMethod=2, CRC32 0x6ba5f580
> Outlook-hj.pst:                 Microsoft Outlook Personal Storage
> 				(>=2003, Unicode, version 23),
> 				dwUnique=0x10f31, 556680192 bytes,
> 				bCryptMethod=1, CRC32 0x5de74682
> example-64bit.pst:              Microsoft Outlook Personal Storage
> 				(>=2003, Unicode, version 23),
> 				dwUnique=0x1d, 271360 bytes,
> 				CRC32 0x89cb68c4
> mailbox.PAB:                    Microsoft Outlook Personal Address
> 				Book
> 				(<=2002, ANSI, version 14),
> 				bPlatformCreate=2, bPlatformAccess=2,
> 				dwUnique=0x5, 32768 bytes
> outlook.pst:                    Microsoft Outlook Personal Storage
> 				(<=2002, ANSI, version 14),
> 				bPlatformCreate=2, bPlatformAccess=2,
> 				dwReserved1=0x8361a034,
> 				dwReserved2=0x373263,
> 				dwUnique=0x82, 278528 bytes,
> 				bCryptMethod=1
> test-ost.ost:                   Microsoft Outlook Offline Storage
> 				(<=2002, ANSI, version 15),
> 				dwUnique=0x4c08, 2556928 bytes,
> 				bCryptMethod=1
> test-v15.pst:                   Microsoft Outlook Personal Storage
> 				(<=2002, ANSI, version 15),
> 				dwUnique=0x4c08, 2556928 bytes,
> 				bCryptMethod=1
> test-v16.pst:                   Microsoft Outlook Personal Storage
> 				( version 16)
> test-v37.pst:                   Microsoft Outlook Personal Storage
> 				(>=2003, Unicode, version 37),
> 				dwUnique=0x400, 9 bytes,
> 				bSentinel=0x83, CRC32 0x58585858
> x-fmt-248-signature-id-260.pst: data
> x-fmt-249-signature-id-261.pst: data
> x-fmt-75-signature-id-472.pab:  data
> 
> I hope my diff file can be applied in a future version of the file
> utility.
> 
> With best wishes
> Jörg Jenderek
> - --
> Jörg Jenderek
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
> 
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYp5lpwAKCRCv8rHJQhrU
> 1iFnAJoCAJt+1KUwdcjrnZO/MnXZhHJDVwCeIMgnGziW6W1BfxMWsPh0CK2yvzk=
> =lpzG
> -----END PGP SIGNATURE-----
> <droid-outlook.csv.gz><trid-v-outlook.txt.gz><file-5_41-windows-pab_diff.DEFANGED-445><file-5_41-windows-pab_diff_sig.DEFANGED-446>--
