[File] [PATCH] Magdir/ole2compounddocs for Microsoft PowerPoint Addin *.PPA and Wizard *.PWZ
Jörg Jenderek
joerg.jen.der.ek at gmx.net
Sat May 28 23:23:24 UTC 2022
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello,
Some days ago I installed an old Microsoft Office 97. Just out of
interest, I checked PowerPoint files from that version. When running
the file command version 5.41 with the -e cdf option on some examples,
I get output like:
AutoContent Wizard.pwz: OLE 2 Compound Document, v3.62,
SecID 0x1, 4 FAT sectors,
Mini FAT start sector 0x3,
2 Mini FAT sectors :
UNKNOWN, clsid
0xf04672810a72cf11871800aa0060263b
BSHPPT97.PPA: OLE 2 Compound Document, v3.62,
SecID 0x1,
Mini FAT start sector 0x3,
3 Mini FAT sectors :
UNKNOWN, clsid
0xf04672810a72cf11871800aa0060263b
Furthermore, only the generic MIME type application/x-ole-storage is
shown with the -i and -e cdf options. With the --extension option, only
the three-character sequence ??? is shown.
When running the file command with -e soft, or with no extra option,
on the inspected examples, I get output like:
AutoContent Wizard.pwz: Composite Document File V2 Document,
Little Endian, Os: Windows, Version 3.51,
Code page: 1252, Title: No Slide Title,
Author: Microsoft, Last Saved By: Microsoft,
Revision Number: 1,
Name of Creating Application:
Microsoft PowerPoint, Total Editing Time:
00:17, Create Time/Date: Mon Nov 4 13:00:18
1996, Last Saved Time/Date: Mon Nov 4
13:00:36 1996, Number of Words: 0
BSHPPT97.PPA: Composite Document File V2 Document,
Little Endian, Os: Windows, Version 3.51,
Code page: 1252,
Author: Microsoft, Last Saved By: Microsoft,
Revision Number: 1,
Name of Creating Application:
Microsoft PowerPoint, Total Editing Time:
00:06, Create Time/Date: Wed Oct 16 20:40:18
1996, Last Saved Time/Date: Wed Oct 16
20:40:24 1996, Number of Words: 0
For comparison, I ran the file format identification utility TrID
(see https://mark0.net/soft-trid-e.html). It also identifies all
examples, with low priority, as "Generic OLE2 / Multistream Compound"
via docfile.trid.xml, and describes all of them as generic "Microsoft
PowerPoint document" via ppt.trid.xml. However, it does not recognize
that they are the Addin or Wizard variant, so it suggests the wrong
extensions PPS/PPT (see the attached trid-v-ppa.txt.gz).
For comparison, I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/). It identifies the
examples generically as "OLE2 Compound Document Format" by the fmt/111
signature (see the attached droid-ppa.csv.gz).
Luckily, with the information shown, I found hints about "Wizard" on
the Microsoft PowerPoint page on Wikipedia and on a file extensions
web site. This information is recorded in comment lines inside
Magdir/ole2compounddocs like:
# URL: https://www.file-extensions.org/ppa-file-extension
# https://en.wikipedia.org/wiki/Microsoft_PowerPoint#cite_note-231
# Reference: http://fileformats.archiveteam.org/wiki/Microsoft_Compound_File
The examples are recognized as "OLE 2 Compound Document" by the
starting bytes (\320\317\021\340\241\261\032\341) at the beginning of
the file, inside Magdir/ole2compounddocs. Evidently no code fragment
exists for subclass identification, so the examples are described as
"UNKNOWN". Furthermore, the examples have a registered root storage
object CLSID. That value is shown as
0xf04672810a72cf11871800aa0060263b, or in the standard curly-brace
notation as {817246F0-720A-11CF-8718-00AA0060263B}.
This means that lines must be added to the branch handling non-null
CLSID GUIDs. The most similar entry was Microsoft PowerPoint 97-2003
presentation or template (ppt/pps/pot), so I added lines for my
inspected examples after it. They look like:
>>88 ubequad 0x871800aa0060263b : Microsoft
>>>80 ubequad 0xf04672810a72cf11 PowerPoint Addin or Wizard
!:mime application/vnd.ms-powerpoint
!:ext ppa/pwz
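To make the relation between the two CLSID notations and the two
ubequad tests explicit, here is a minimal Python sketch. It assumes
that the -e cdf hex dump lists the 16 CLSID bytes in the order they are
stored in the file. The curly-brace form reads the first three GUID
fields little-endian (which uuid.UUID(bytes_le=...) does), while the
magic lines read two 8-byte big-endian quads at offsets 80 and 88. The
helper names are hypothetical, for illustration only.

```python
import uuid

# Root storage CLSID as dumped by `file -e cdf` (bytes in file order).
RAW_HEX = "f04672810a72cf11871800aa0060263b"


def clsid_to_guid(raw: bytes) -> str:
    """Return the registry-style {XXXXXXXX-XXXX-...} form of a 16-byte CLSID.

    The first three GUID fields (4, 2 and 2 bytes) are stored little-endian
    and the last 8 bytes as-is; uuid.UUID(bytes_le=...) applies that order.
    """
    return "{" + str(uuid.UUID(bytes_le=raw)).upper() + "}"


def clsid_to_ubequads(raw: bytes) -> tuple[int, int]:
    """Split the CLSID into the two big-endian 64-bit values compared by
    the `>>>80 ubequad` and `>>88 ubequad` magic lines."""
    return int.from_bytes(raw[:8], "big"), int.from_bytes(raw[8:], "big")


if __name__ == "__main__":
    raw = bytes.fromhex(RAW_HEX)
    print(clsid_to_guid(raw))  # {817246F0-720A-11CF-8718-00AA0060263B}
    first, second = clsid_to_ubequads(raw)
    print(hex(first), hex(second))  # 0xf04672810a72cf11 0x871800aa0060263b
```

Splitting at byte 8 is why the magic entry needs two nested tests: one
ubequad covers only half of the 128-bit CLSID.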
Instead of the generic application/x-ole-storage, these now get the
MIME type used by many other PowerPoint samples. The extension PPA is
used for the Addin variant, for example BSHPPT97.PPA, and PWZ is used
for the Wizard variant, for example "AutoContent Wizard.pwz". According
to the file extensions web site, PWZ files are structurally identical
to PPA files; only the extensions differ. So I do not know how to
distinguish them. In both, the second, third and fourth directory
entries have names like VBA, PROJECT or PROJECTwm.
In my installation the file type is registered as PowerPoint.Wizard.8.
Following the hints about Wizard on Wikipedia, this type exists for
PowerPoint versions 4.0 to 11.0 (2004), but according to
file-extensions.org the Addin variant exists for versions 97 to 2003.
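To check the observation about the directory entry names on further
samples, a minimal Python sketch like the following could list the
entries of the first directory sector together with their CLSIDs. It
relies on the layout from Microsoft's [MS-CFB] specification: sector
shift at header offset 0x1E, first directory sector number at 0x30,
128-byte directory entries with a UTF-16LE name, its byte length at
0x40 and the CLSID at 0x50 (= 80), which appears to match the offsets
80 and 88 tested by the magic lines. Assumption: only the first
directory sector is read and the FAT chain is not followed; with
512-byte sectors (v3, as in these samples) that sector holds the first
four entries, i.e. the root entry plus the second to fourth entries.
The function name is hypothetical.

```python
import struct
import sys
import uuid

OLE2_SIGNATURE = b"\320\317\021\340\241\261\032\341"  # same bytes as the magic
DIR_ENTRY_SIZE = 128


def first_directory_sector(data: bytes) -> list[tuple[str, bytes]]:
    """Return (name, clsid) for every entry in the first directory sector.

    Simplification: further directory sectors (reached via the FAT) are
    not read.
    """
    if data[:8] != OLE2_SIGNATURE:
        raise ValueError("not an OLE 2 Compound Document")
    (sector_shift,) = struct.unpack_from("<H", data, 0x1E)
    (first_dir_sector,) = struct.unpack_from("<I", data, 0x30)
    sector_size = 1 << sector_shift
    start = (first_dir_sector + 1) * sector_size  # sector 0 follows the header
    entries = []
    for off in range(start, start + sector_size, DIR_ENTRY_SIZE):
        entry = data[off:off + DIR_ENTRY_SIZE]
        if len(entry) < DIR_ENTRY_SIZE:
            break  # truncated file
        # Name length is in bytes and includes the UTF-16 NUL terminator.
        (name_len,) = struct.unpack_from("<H", entry, 0x40)
        name_len = min(name_len, 64)
        name = entry[:max(name_len - 2, 0)].decode("utf-16-le", errors="replace")
        entries.append((name, entry[0x50:0x60]))
    return entries


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for index, (name, clsid) in enumerate(first_directory_sector(f.read())):
            guid = "{" + str(uuid.UUID(bytes_le=clsid)).upper() + "}"
            print(index, repr(name), guid)
```

Run it as, for example, `python3 cfbdir.py BSHPPT97.PPA` (the script
name is arbitrary). Index 0 is the root entry whose CLSID the magic
tests.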
After applying the modifications described above via the patch
file-ole2compounddocs-ppa.diff to the newer master version, all my
inspected examples are described in more detail. With the -e cdf
option the output now looks like:
AutoContent Wizard.pwz: OLE 2 Compound Document, v3.62,
SecID 0x1, 4 FAT sectors,
Mini FAT start sector 0x3,
2 Mini FAT sectors :
Microsoft PowerPoint Addin or Wizard
BSHPPT97.PPA: OLE 2 Compound Document, v3.62,
SecID 0x1,
Mini FAT start sector 0x3,
3 Mini FAT sectors :
Microsoft PowerPoint Addin or Wizard
I hope my diff can be applied in a future version of the file
utility.
With best wishes,
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYpKu1AAKCRCv8rHJQhrU
1pN+AKC+BDV7iwx2I/CU8HAiGgTy+mi1VACfa6L3wdvRL15x0GYEAyfw6bK+p74=
=IgIU
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-ppa.csv.gz
Type: application/x-gzip
Size: 313 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220529/436707d8/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-ppa.txt.gz
Type: application/x-gzip
Size: 547 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220529/436707d8/attachment-0001.bin>
-------------- next part --------------
--- file-master/magic/Magdir/ole2compounddocs.old 2022-05-27 18:06:00.000000000 +0200
+++ file-master/magic/Magdir/ole2compounddocs 2022-05-28 23:56:10.197500800 +0200
@@ -436,12 +436,25 @@
>>>80 ubequad 0x108d81649b4fcf11 PowerPoint 97-2003 presentation or template
!:mime application/vnd.ms-powerpoint
!:apple ????PPT3
# /autostart/template
!:ext ppt/pps/pot
# From: Joerg Jenderek
+# URL: https://www.file-extensions.org/ppa-file-extension
+# https://en.wikipedia.org/wiki/Microsoft_PowerPoint#cite_note-231
+# Reference: http://fileformats.archiveteam.org/wiki/Microsoft_Compound_File
+>>88 ubequad 0x871800aa0060263b : Microsoft
+# only version 8 (97) tested; PowerPoint 4.0 to 11.0 (2004) (Wikipedia); 97 to 2003 (file-extensions.org)
+>>>80 ubequad 0xf04672810a72cf11 PowerPoint Addin or Wizard
+# second, third and fourth directory entry name like VBA PROJECT PROJECTwm
+# http://extension.nirsoft.net/pwz
+!:mime application/vnd.ms-powerpoint
+# like: BSHPPT97.PPA "AutoContent Wizard.pwz"
+!:ext ppa/pwz
+#
+# From: Joerg Jenderek
# URL: http://fileformats.archiveteam.org/wiki/AWD_(At_Work_Document)
# Reference: http://mark0.net/download/triddefs_xml.7z/defs/a/awd-fax.trid.xml
# Note: called "Microsoft At Work Fax document" by TrID
>>88 ubequad 0xb29400dd010f2bf9 : Microsoft
>>>80 ubequad 0x801cb0023de01a10 At Work fax Document
#!:mime application/x-ole-storage
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-ole2compounddocs-ppa.diff.sig
Type: application/octet-stream
Size: 928 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220529/436707d8/attachment.obj>