[File] [PATCH] Magdir/ole2compounddocs for Microsoft PowerPoint Addin *.PPA and Wizard *.PWZ

Jörg Jenderek joerg.jen.der.ek at gmx.net
Sat May 28 23:23:24 UTC 2022


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

Some days ago I installed an old Microsoft 97. Out of interest I
checked PowerPoint files from that version. When running the file
command version 5.41 with the -e cdf option on some examples, I get
output like:

AutoContent Wizard.pwz: OLE 2 Compound Document, v3.62,
	    		SecID 0x1, 4 FAT sectors,
			Mini FAT start sector 0x3,
			2 Mini FAT sectors :
			UNKNOWN, clsid
			0xf04672810a72cf11871800aa0060263b
BSHPPT97.PPA:           OLE 2 Compound Document, v3.62,
			SecID 0x1,
			Mini FAT start sector 0x3,
			3 Mini FAT sectors :
			UNKNOWN, clsid
			0xf04672810a72cf11871800aa0060263b

Furthermore, only the generic mime type application/x-ole-storage is
shown with the -i and -e cdf options. With option --extension only
the 3 byte sequence ??? is shown.

When running the file command with -e soft or without any extra
option on the inspected examples, I get output like:

AutoContent Wizard.pwz: Composite Document File V2 Document,
	    		Little Endian, Os: Windows, Version 3.51,
			Code page: 1252, Title: No Slide Title,
			Author: Microsoft, Last Saved By: Microsoft,
			Revision Number: 1,
			Name of Creating Application:
			Microsoft PowerPoint, Total Editing Time:
			00:17, Create Time/Date: Mon Nov  4 13:00:18
			1996, Last Saved Time/Date: Mon Nov  4
			13:00:36 1996, Number of Words: 0
BSHPPT97.PPA:           Composite Document File V2 Document,
			Little Endian, Os: Windows, Version 3.51,
			Code page: 1252,
			Author: Microsoft, Last Saved By: Microsoft,
			Revision Number: 1,
			Name of Creating Application:
			Microsoft PowerPoint, Total Editing Time:
			00:06, Create Time/Date: Wed Oct 16 20:40:18
			1996, Last Saved Time/Date: Wed Oct 16
			20:40:24 1996, Number of Words: 0

For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). It also identifies
all examples with low priority as "Generic OLE2 / Multistream
Compound" by docfile.trid.xml. All examples are described as generic
"Microsoft PowerPoint document" by ppt.trid.xml. But it does not
recognize that they are an Addin or Wizard variant, so it shows the
wrong extensions PPS/PPT (see appended trid-v-ppa.txt.gz).

For comparison I also ran the file format identification
utility DROID (see https://sourceforge.net/projects/droid/). It
identifies the examples generically as "OLE2 Compound Document Format"
by the fmt/111 signature (see appended droid-ppa.csv.gz).

Luckily, with the shown information I found hints about "Wizard" on
the Microsoft PowerPoint page on Wikipedia and on the file extensions
web site. This information is expressed by comment lines inside
Magdir/ole2compounddocs like:

# URL:		https://www.file-extensions.org/ppa-file-extension
#		https://en.wikipedia.org/
#		wiki/Microsoft_PowerPoint#cite_note-231
# Reference:	http://fileformats.archiveteam.org/
#		wiki/Microsoft_Compound_File

The examples are recognized as "OLE 2 Compound Document"
by the starting bytes (\320\317\021\340\241\261\032\341) at the
beginning, inside Magdir/ole2compounddocs. Obviously no code
fragment exists to do sub class identification, so the examples are
described as "UNKNOWN". Furthermore, the examples have a registered
Root storage object CLSID. That value is shown as
0xf04672810a72cf11871800aa0060263b or, expressed in the standard curly
braces notation, as {817246F0-720A-11CF-8718-00AA0060263B}.
That means that lines must be added in the branch handling non null
CLSID GUIDs. The most similar existing entry is Microsoft PowerPoint
97-2003 presentation or template (ppt/pps/pot), so I add lines for
my inspected examples after it. That looks like:

 >>88 	ubequad	0x871800aa0060263b	: Microsoft
 >>>80 	ubequad	0xf04672810a72cf11	PowerPoint Addin or Wizard
 !:mime	application/vnd.ms-powerpoint
 !:ext	ppa/pwz
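
The relation between the raw CLSID bytes shown by file and the curly
braces notation can be checked with a short Python sketch. It is only
an illustration and not part of the patch; the helper names
raw_clsid_to_guid and guid_to_magic_quads are made up here. It relies
on the first three GUID fields being stored little-endian on disk,
which is what the two values quoted above imply.

```python
import uuid


def raw_clsid_to_guid(raw_hex: str) -> str:
    """Convert the 16 on-disk CLSID bytes (as hex) to curly-brace form."""
    if raw_hex.startswith("0x"):
        raw_hex = raw_hex[2:]
    raw = bytes.fromhex(raw_hex)
    # bytes_le swaps the first three GUID fields from little-endian order.
    return "{" + str(uuid.UUID(bytes_le=raw)).upper() + "}"


def guid_to_magic_quads(guid: str) -> tuple:
    """Return the two big-endian quad values tested at offsets 80 and 88."""
    raw = uuid.UUID(guid).bytes_le
    return "0x" + raw[:8].hex(), "0x" + raw[8:].hex()


if __name__ == "__main__":
    print(raw_clsid_to_guid("0xf04672810a72cf11871800aa0060263b"))
    print(guid_to_magic_quads("{817246F0-720A-11CF-8718-00AA0060263B}"))
```

The second helper goes the other way: starting from a CLSID found in
the registry, it gives the two ubequad values in the form used by the
lines above.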

Instead of the generic application/x-ole-storage these get the mime
type used by many other PowerPoint samples. The extension PPA is used
for the Addin variant, for example BSHPPT97.PPA, and PWZ is used for
the wizard variant, for example "AutoContent Wizard.pwz". According
to the file extensions web site, PWZ files are structurally identical
to PPA files; only the extensions differ. So I do not know how to
distinguish them. For both, the second, third and fourth directory
entries have names like VBA, PROJECT or PROJECTwm.
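
To look at the directory entry names and the root CLSID without the
file command, something like the following Python sketch could be
used. It is a minimal illustration, not how file itself does it. The
function name first_directory_entries is made up, and the layout used
(sector shift at header offset 30, first directory sector at offset
48, 128 byte entries with the name length at 64 and the CLSID at 80)
is my reading of the compound file format. It only reads the first
directory sector and does not follow the FAT chain.

```python
import struct

# Same bytes as \320\317\021\340\241\261\032\341 in the magic file.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def first_directory_entries(data: bytes):
    """Yield (name, raw CLSID hex) for entries in the first directory sector."""
    if data[:8] != OLE2_MAGIC:
        raise ValueError("not an OLE 2 Compound Document")
    (sector_shift,) = struct.unpack_from("<H", data, 30)  # 9 -> 512 byte sectors
    (dir_secid,) = struct.unpack_from("<I", data, 48)     # first directory sector
    sector_size = 1 << sector_shift
    start = (dir_secid + 1) * sector_size                 # sector 0 follows the header
    for off in range(start, start + sector_size, 128):    # 128 bytes per entry
        entry = data[off:off + 128]
        if len(entry) < 128:
            break
        (name_len,) = struct.unpack_from("<H", entry, 64)  # bytes, incl. terminator
        if name_len < 2 or name_len > 64:
            continue                                       # unused or damaged entry
        name = entry[:name_len - 2].decode("utf-16-le")
        yield name, entry[80:96].hex()


# Possible usage (the path is only an example):
#   with open("BSHPPT97.PPA", "rb") as fh:
#       for name, clsid in first_directory_entries(fh.read()):
#           print(name, clsid)
```

With 512 byte sectors the first directory sector holds four entries,
so the root storage and the three entries mentioned above fit in it.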

On my installation the type was registered as PowerPoint.Wizard.8.
Following the hints about wizards on Wikipedia, this type exists for
PowerPoint versions 4.0 to 11.0 (2004), but according to
file-extensions.org the addin variant exists for versions 97 to 2003.

After applying the above mentioned modifications with the patch
file-ole2compounddocs-ppa.diff to a newer master variant, all my
inspected examples are described with more details. With the -e cdf
option this now looks like:

AutoContent Wizard.pwz: OLE 2 Compound Document, v3.62,
	    		SecID 0x1, 4 FAT sectors,
			Mini FAT start sector 0x3,
			2 Mini FAT sectors :
			Microsoft PowerPoint Addin or Wizard
BSHPPT97.PPA:           OLE 2 Compound Document, v3.62,
			SecID 0x1,
			Mini FAT start sector 0x3,
			3 Mini FAT sectors :
			Microsoft PowerPoint Addin or Wizard

I hope my diff file can be applied in a future version of the file
utility.

With best wishes,
Jörg Jenderek
- --
Jörg Jenderek




-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYpKu1AAKCRCv8rHJQhrU
1pN+AKC+BDV7iwx2I/CU8HAiGgTy+mi1VACfa6L3wdvRL15x0GYEAyfw6bK+p74=
=IgIU
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-ppa.csv.gz
Type: application/x-gzip
Size: 313 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220529/436707d8/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-ppa.txt.gz
Type: application/x-gzip
Size: 547 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220529/436707d8/attachment-0001.bin>
-------------- next part --------------
--- file-master/magic/Magdir/ole2compounddocs.old	2022-05-27 18:06:00.000000000 +0200
+++ file-master/magic/Magdir/ole2compounddocs	2022-05-28 23:56:10.197500800 +0200
@@ -436,12 +436,25 @@
 >>>80 	ubequad		0x108d81649b4fcf11	PowerPoint 97-2003 presentation or template
 !:mime	application/vnd.ms-powerpoint
 !:apple	????PPT3
 # /autostart/template
 !:ext	ppt/pps/pot
 # From:		Joerg Jenderek
+# URL:		https://www.file-extensions.org/ppa-file-extension
+#		https://en.wikipedia.org/wiki/Microsoft_PowerPoint#cite_note-231
+# Reference:	http://fileformats.archiveteam.org/wiki/Microsoft_Compound_File
+>>88 	ubequad		0x871800aa0060263b	: Microsoft
+# only version 8 (97) tested; PowerPoint 4.0 to 11.0 (2004) (Wikipedia); 97 to 2003 (file-extensions.org)
+>>>80 	ubequad		0xf04672810a72cf11	PowerPoint Addin or Wizard
+# second, third and fourth directory entry name like VBA PROJECT PROJECTwm
+# http://extension.nirsoft.net/pwz
+!:mime	application/vnd.ms-powerpoint
+# like: BSHPPT97.PPA "AutoContent Wizard.pwz"
+!:ext	ppa/pwz
+#
+# From:		Joerg Jenderek
 # URL:		http://fileformats.archiveteam.org/wiki/AWD_(At_Work_Document)
 # Reference:	http://mark0.net/download/triddefs_xml.7z/defs/a/awd-fax.trid.xml
 # Note:		called "Microsoft At Work Fax document" by TrID
 >>88 	ubequad		0xb29400dd010f2bf9	: Microsoft
 >>>80 	ubequad		0x801cb0023de01a10	At Work fax Document
 #!:mime	application/x-ole-storage
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-ole2compounddocs-ppa.diff.sig
Type: application/octet-stream
Size: 928 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220529/436707d8/attachment.obj>

