[File] [PATCH] Magdir/ole2compounddocs for "newer" Adobe PageMaker

Jörg Jenderek joerg.jen.der.ek at gmx.net
Tue Jan 11 20:24:19 UTC 2022


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

a few days ago I sent a patch for "older" Aldus/Adobe PageMaker
documents, which was accepted and is now included in
Magdir/wordprocessors. Now I have checked "newer" Adobe PageMaker
documents. These documents and templates have file name extensions
such as PM6, P65, PMD, PT6, T65 and PMT.

When I run the file command version 5.41 with the -e cdf option on
such documents, I get output like:

02TEMPLT.T65:   OLE 2 Compound Document, v3.62, SecID 0,
		2 FAT sectors,
		0 Mini FAT sector :
		UNKNOWN with names PageMaker
Charset.pmt:    OLE 2 Compound Document, v3.62, SecID 0x66,
		0 Mini FAT sector :
		UNKNOWN with names PageMaker
MyPage6.PM6:    OLE 2 Compound Document, v3.62, SecID 0x1,
		0 Mini FAT sector :
		UNKNOWN with names PageMaker
brochus.pt6:    OLE 2 Compound Document, v3.62, SecID 0x1,
		0 Mini FAT sector :
		UNKNOWN with names PageMaker
pm-70.pmd:      OLE 2 Compound Document, v3.62, SecID 0,
		0 Mini FAT sector :
		UNKNOWN with names PageMaker
strategies.p65: OLE 2 Compound Document, v3.62, SecID 0,
		24 FAT sectors,
		Mini FAT start sector 0x2a,
		25 Mini FAT sectors :
		UNKNOWN with names PageMaker ObjectPool 1

Furthermore, with the -i option only the generic application/CDFV2 is
shown. With the -i and -e cdf options the MIME type
application/x-ole-storage is shown. With the --extension option only
the 3-byte sequence ??? is shown.

No official MIME type comes from Microsoft. Blame on them. But at
least according to the FreeDesktop.org shared MIME database,
"application/x-ole-storage" seems to be the most commonly used one.
This information can also be found on the reposcope.com website.
So I think the file command should also use this term, or at least
use the same term whether soft or cdf magic is used. So I changed
this MIME type in the current src/readcdf.c. Previously the code
looked like:
	} else if (ms->flags & MAGIC_MIME_TYPE) {
		if (file_printf(ms, "application/CDFV2") == -1)
			return -1;
	}
When I run the file command with -e soft, or with no extra option, I
get the same generic line for all examples:
Composite Document File V2 Document, Cannot read section info

For comparison I ran the file format identification utility TrID
(see https://mark0.net/soft-trid-e.html). It also identifies all
examples, with low priority, as "Generic OLE2 / Multistream
Compound" by docfile.trid.xml. Most examples are described as "Adobe
PageMaker document (generic)" with MIME type application/x-pagemaker
by pagemaker-generic.trid.xml. The examples are often also described
as "Adobe PageMaker document (v6)" by pagemaker-pm6.trid.xml, "Adobe
PageMaker document (v6.5)" by pagemaker-pm65.trid.xml and "Page Maker
7 Document" by pmd-pm7.trid.xml, without correct version
differentiation. So the 3 mentioned file name extensions PM6, P65 and
PMD are also not listed in the right order. Furthermore, the file
name extensions for templates (PT6, T65, PMT), which contain the
character T, are missing (see the appended
trid-v-pagemaker-new.txt.gz).

For comparison I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/). It identifies
all new PageMaker examples as "Pagemaker Document (Generic)" with
MIME type application/vnd.pagemaker by PUID fmt/876. But it only
shows the 2 extensions PMD and PMT (see the appended
DROID-pagemaker-new.csv.gz).

Luckily I also found a page about PageMaker on the File Formats
Archive Team web site. It covers both the "old" and the "new"
variants. This information is expressed by comment lines inside
Magdir/ole2compounddocs like:
# URL:		http://fileformats.archiveteam.org/wiki/PageMaker
# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/p
#		pagemaker-generic.trid.xml
#		pagemaker-pm6.trid.xml
#		pagemaker-pm65.trid.xml
#		pmd-pm7.trid.xml

The PageMaker documents are recognized as "OLE 2 Compound Document"
by the starting bytes (\320\317\021\340\241\261\032\341) at the
beginning of Magdir/ole2compounddocs. Obviously no code fragment
exists there to do subclass identification, so the examples are
described as "UNKNOWN". Furthermore, the examples have no registered
Root storage object CLSID, that is, this value is nil. If a CLSID
were set, the file command would display it afterwards with a phrase
like ", clsid 0xc0c7266eb98cd311a1c800c04f612452". That means the new
code must be added in the branch that handles CLSID GUID 0. The last
entry there was for SoftMaker Presentations or template (*.prd *.prv)
with pictures.
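
To check these fields outside of the file command, a minimal Python
sketch follows. It is not part of the patch. It assumes the header
and directory entry layout from Microsoft's compound file format
description (sector shift at header offset 30, first directory sector
at header offset 48, CLSID in bytes 80..95 of a directory entry). The
function names are hypothetical, and the output phrase only imitates
what file prints.

```python
import struct

# Signature tested by Magdir/ole2compounddocs:
# \320\317\021\340\241\261\032\341
OLE2_MAGIC = bytes([0o320, 0o317, 0o021, 0o340, 0o241, 0o261, 0o032, 0o341])


def directory_offset(data: bytes) -> int:
    """File offset of the first directory sector (assumed layout)."""
    if data[:8] != OLE2_MAGIC:
        raise ValueError("not an OLE 2 Compound Document")
    (sector_shift,) = struct.unpack_from("<H", data, 30)
    (first_dir_secid,) = struct.unpack_from("<I", data, 48)
    # Sector 0 starts directly after the header, which fills one sector.
    return (first_dir_secid + 1) << sector_shift


def root_clsid_phrase(data: bytes) -> str:
    """Imitate the ', clsid 0x...' phrase; empty string for a nil CLSID."""
    off = directory_offset(data)
    # Read the 16 CLSID bytes of the root entry as two big-endian quads.
    hi, lo = struct.unpack_from(">QQ", data, off + 80)
    if hi == 0 and lo == 0:
        return ""
    return f", clsid 0x{hi:016x}{lo:016x}"
```

For the inspected PageMaker examples the CLSID is nil, so such a
helper would return the empty string.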

So I added lines after it for my inspected examples. Luckily the
file command prints some directory entry names. In all examples this
is the word "PageMaker" encoded as UTF-16. This characteristic is
also found in the global string section of the TrID definition, in a
line like:
		<String>P'A'G'E'M'A'K'E'R</String>
When I extract this stream, for example with Michal Mutl's Structured
Storage Viewer, I get real PageMaker content in the "old" format.
This is also described in the documentation, and these parts are
recognised by Magdir/wordprocessors. So with the first additional
line I look for the second directory entry with the UTF-16 encoded
name PageMaker. That looks like:
 >>>> 128 	lestring16	PageMaker	:
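
The same test can be sketched in Python. This is only an
illustration under assumptions: directory entries are taken to be 128
bytes long, with the UTF-16LE name at the start of the entry and the
name length in bytes (including the terminating NUL) at entry offset
64. The function names are hypothetical. Unlike the magic line, which
compares only the beginning of the name, this sketch compares the
whole name.

```python
import struct


def entry_name(data: bytes, dir_offset: int, index: int) -> str:
    """Name of directory entry `index`, decoded from UTF-16LE."""
    entry = dir_offset + 128 * index
    # Bytes 64..65 of an entry: name length in bytes, NUL included.
    (name_len,) = struct.unpack_from("<H", data, entry + 64)
    raw = data[entry:entry + max(name_len - 2, 0)]
    return raw.decode("utf-16-le")


def has_pagemaker_stream(data: bytes, dir_offset: int) -> bool:
    """True if the second directory entry is named 'PageMaker'."""
    return entry_name(data, dir_offset, 1) == "PageMaker"
```

Here dir_offset is the file offset of the first directory sector,
which the caller has to compute from the header.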

In the second step I must jump to the stream part. Maybe more
efficient or better ways exist, but I do a brute-force search for the
start magic of "old" PageMaker with a line like:
 >>>>> 0	search/0xa900/s	\0\0\0\0\0\0\xff\x99
In the third step I handle this stream part with lines like:
 #>>>>>>&0	use		PageMaker
 >>>>>> &0	indirect	x
I first tried to call the subroutine PageMaker from
Magdir/wordprocessors directly, but then I got the wrong version.
Maybe this is a bug in the file command. When I use the indirect
directive instead, I get correct identifications. But I also get an
ugly side effect: afterwards an additional unexpected phrase
UNKNOWN0000000000000000 is displayed.
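
The brute-force search from the second step can be sketched in Python
as follows. This is a rough equivalent, not the implementation inside
file: it assumes that the range 0xa900 counts the candidate start
positions of a match, and the names are hypothetical. Because the
pattern begins with six NUL bytes, a hit in zero-padded data that is
not a PageMaker stream cannot be ruled out, so the result is only a
candidate offset.

```python
# Start magic of the "old" PageMaker format, as used in the search line.
OLD_PAGEMAKER_MAGIC = b"\x00\x00\x00\x00\x00\x00\xff\x99"
SEARCH_RANGE = 0xA900


def find_old_pagemaker(data: bytes, start: int = 0) -> int:
    """Offset of the first match that starts within
    [start, start + SEARCH_RANGE), or -1 if there is none."""
    # bytes.find() needs the whole match inside [start, end), so the
    # end is extended by the pattern length minus one.
    end = start + SEARCH_RANGE + len(OLD_PAGEMAKER_MAGIC) - 1
    return data.find(OLD_PAGEMAKER_MAGIC, start, end)
```

The returned offset plays the role of the match start from which the
following indirect line continues.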

This was triggered by the part for the remaining non-nil clsid. That
was done by lines like:
 >>88 	default		x		: UNKNOWN
 >>>80 	ubequad		!0		\b, clsid %#16.16llx
 >>>88 	ubequad		x		\b%16.16llx
This should not happen! I do not know what is wrong here. So I
check again for a non-nil GUID. This now becomes:
 >>88 	default		x
 >>>88 	ubequad		!0		: UNKNOWN
 >>>>80 ubequad		!0		\b, clsid %#16.16llx
 >>>>88 ubequad		x		\b%16.16llx

After applying the above-mentioned modifications with the patches
file-5.41-ole2compounddocs-pagemaker.diff and
file-5.41-readcdf-mime.diff, and using the newest
Magdir/wordprocessors, all my inspected "newer" PageMaker documents
are now described in more detail. With the -e cdf option this now
looks like:
02TEMPLT.T65:   OLE 2 Compound Document, v3.62, SecID 0,
		2 FAT sectors,
		0 Mini FAT sector :
		Adobe PageMaker document, little-endian, version 6.50
Charset.pmt:    OLE 2 Compound Document, v3.62, SecID 0x66,
		0 Mini FAT sector :
		Adobe PageMaker document, little-endian, version 6.50
MyPage6.PM6:    OLE 2 Compound Document, v3.62, SecID 0x1,
		0 Mini FAT sector :
		Adobe PageMaker document, little-endian, version 6
brochus.pt6:    OLE 2 Compound Document, v3.62, SecID 0x1,
		0 Mini FAT sector :
		Adobe PageMaker document, little-endian, version 6
pm-70.pmd:      OLE 2 Compound Document, v3.62, SecID 0,
		0 Mini FAT sector :
		Adobe PageMaker document, little-endian, version 6.50
strategies.p65: OLE 2 Compound Document, v3.62, SecID 0,
		24 FAT sectors,
		Mini FAT start sector 0x2a,
		25 Mini FAT sectors :
		Adobe PageMaker document, little-endian, version 6.50

With the -e cdf and --extension options this now looks like:
02TEMPLT.T65:   p65/t65/pmd/pmt
Charset.pmt:    p65/t65/pmd/pmt
MyPage6.PM6:    pm6/pt6
brochus.pt6:    pm6/pt6
pm-70.pmd:      p65/t65/pmd/pmt
strategies.p65: p65/t65/pmd/pmt

I hope my diff files can be applied in a future version of the file
utility. Unfortunately I found no described way to distinguish
templates, which have other file name extensions, from pure
PageMaker publications. I also found no way to distinguish version
6.5 (*.P65 *.T65) from version 7 (*.PMD *.PMT).

Check the facts as far as you can. Listen to what scientists and
the experts of the departments recommend. Accordingly, the vaccine
is the most suitable measure against Corona. Anyone who believes in
fake news also storms the Capitol as a mob, mocks science and
terrorizes the silent majority of the population. Stay healthy.

Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYd3nXQAKCRCv8rHJQhrU
1nJVAKDWay4r61LNcGvLo/8tNO2b8R/SvgCeIelamPiKS+QVYX0dR78c8xiXUBg=
=JjJT
-----END PGP SIGNATURE-----
-------------- next part --------------
--- file-5.41/src/readcdf.c.old	2019-09-30 17:42:50.000000000 +0200
+++ file-5.41/src/readcdf.c	2022-01-09 22:17:06.936913800 +0100
@@ -675,5 +675,6 @@
 				return -1;
 	} else if (ms->flags & MAGIC_MIME_TYPE) {
-		if (file_printf(ms, "application/CDFV2") == -1)
+		/* https://reposcope.com/mimetype/application/x-ole-storage */
+		if (file_printf(ms, "application/x-ole-storage") == -1)
 			return -1;
 	}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.41-readcdf-mime.diff.sig
Type: application/octet-stream
Size: 412 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220111/7ef2d03c/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DROID-pagemaker-new.csv.gz
Type: application/x-gzip
Size: 475 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220111/7ef2d03c/attachment.bin>
-------------- next part --------------
--- file-5.41/magic/Magdir/ole2compounddocs.old	2021-09-07 09:39:31 +0000
+++ file-5.41/magic/Magdir/ole2compounddocs	2022-01-11 19:53:37 +0000
@@ -260,2 +260,20 @@
 >>>>>>128 	lestring16	Pictures		with pictures
+#
+# URL:		http://fileformats.archiveteam.org/wiki/PageMaker
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/p
+#		pagemaker-generic.trid.xml
+#		pagemaker-pm6.trid.xml
+#		pagemaker-pm65.trid.xml
+#		pmd-pm7.trid.xml
+# From:		Joerg Jenderek
+# Note:		since version 6 embedd as stream with PageMaker name the "old" format handled by ./wordprocessors
+#		verified by Michal Mutl Structured Storage Viewer `SSView.exe brochus.pt6`
+# Second directory entry name PageMaker
+>>>>128 	lestring16	PageMaker		:
+# look for magic of "old" PageMaker like in 02TEMPLT.T65
+>>>>>0	search/0xa900/s	\0\0\0\0\0\0\xff\x99
+# GRR: jump to PageMaker stream and inspect it by sub routine PageMaker of ./wordprocessors failed with wrong version!
+#>>>>>>&0	use		PageMaker
+# THIS WORKS PARTLY!
+>>>>>>&0	indirect	x
 #	remaining null clsid
@@ -269,2 +287,5 @@
 !:mime	application/x-ole-storage
+# according to file version 5.41 with -e soft option
+#!:mime	application/CDFV2
+#!:ext	???
 #	look for known clsid GUID
@@ -563,6 +584,12 @@
 # remaining non null clsid
->>88 	default		x			: UNKNOWN
+>>88 	default		x
+# GRR: check again for non null clsid because wrong when called by indirect directive
+>>>88 	ubequad		!0			: UNKNOWN
+# https://reposcope.com/mimetype/application/x-ole-storage
 !:mime	application/x-ole-storage
->>>80 	ubequad		!0			\b, clsid %#16.16llx
->>>88 	ubequad		x			\b%16.16llx
+# according to file version 5.41 with -e soft option
+#!:mime	application/CDFV2
+#!:ext	???
+>>>>80 	ubequad		!0			\b, clsid %#16.16llx
+>>>>88 	ubequad		x			\b%16.16llx
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.41-ole2compounddocs-pagemaker.diff.sig
Type: application/octet-stream
Size: 1044 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220111/7ef2d03c/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-pagemaker-new.txt.gz
Type: application/x-gzip
Size: 843 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220111/7ef2d03c/attachment-0001.bin>

