[File] [PATCH] Magdir/ole2compounddocs for "newer" Adobe PageMaker
Christos Zoulas
christos at zoulas.com
Mon Jan 17 17:02:47 UTC 2022
Committed, thanks!
christos
> On Jan 11, 2022, at 3:24 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello,
>
> Some days ago I sent a patch for "older" Aldus/Adobe PageMaker
> documents, which was accepted and is now included in
> Magdir/wordprocessors. Now I have checked "newer" Adobe PageMaker
> documents. These documents and templates are files with file name
> extensions like PM6, P65, PMD, PT6, T65 and PMT.
>
> When running the file command, version 5.41, with the -e cdf option
> on such documents, I get output like:
>
> 02TEMPLT.T65: OLE 2 Compound Document, v3.62, SecID 0,
> 2 FAT sectors,
> 0 Mini FAT sector :
> UNKNOWN with names PageMaker
> Charset.pmt: OLE 2 Compound Document, v3.62, SecID 0x66,
> 0 Mini FAT sector :
> UNKNOWN with names PageMaker
> MyPage6.PM6: OLE 2 Compound Document, v3.62, SecID 0x1,
> 0 Mini FAT sector :
> UNKNOWN with names PageMaker
> brochus.pt6: OLE 2 Compound Document, v3.62, SecID 0x1,
> 0 Mini FAT sector :
> UNKNOWN with names PageMaker
> pm-70.pmd: OLE 2 Compound Document, v3.62, SecID 0,
> 0 Mini FAT sector :
> UNKNOWN with names PageMaker
> strategies.p65: OLE 2 Compound Document, v3.62, SecID 0,
> 24 FAT sectors,
> Mini FAT start sector 0x2a,
> 25 Mini FAT sectors :
> UNKNOWN with names PageMaker ObjectPool 1
>
> Furthermore, with the -i option only the generic application/CDFV2
> is shown. With the -i and -e cdf options the MIME type
> application/x-ole-storage is shown. With the --extension option only
> the 3 byte sequence ??? is shown.
>
> No official MIME type comes from Microsoft. Blame them. But at
> least according to the FreeDesktop.org shared MIME database,
> "application/x-ole-storage" seems to be the most commonly used one.
> This information can also be found on the reposcope.com website.
> So I think the file command should also use this term, or at least
> use the same term whether soft or cdf magic is used. So I changed
> this MIME type in the current src/readcdf.c. It looked like:
> 	} else if (ms->flags & MAGIC_MIME_TYPE) {
> 		if (file_printf(ms, "application/CDFV2") == -1)
> 			return -1;
> 	}
> When running the file command with -e soft, or with no extra option,
> I get a generic line like this for all examples:
> Composite Document File V2 Document, Cannot read section info
>
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). It also identifies
> all examples, with low priority, as "Generic OLE2 / Multistream
> Compound" by docfile.trid.xml. Most examples are described as "Adobe
> PageMaker document (generic)" with MIME type application/x-pagemaker
> by pagemaker-generic.trid.xml. The examples are often also described
> as "Adobe PageMaker document (v6)" by pagemaker-pm6.trid.xml, "Adobe
> PageMaker document (v6.5)" by pagemaker-pm65.trid.xml and "Page Maker
> 7 Document" by pmd-pm7.trid.xml, without correct version
> differentiation. So the 3 file name extensions mentioned there (PM6,
> P65 and PMD) are not in the right order either. Furthermore, the file
> name extensions for templates (PT6, T65, PMT), with the character T,
> are missing (see the appended trid-v-pagemaker-new.txt.gz).
>
> For comparison I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). It identifies
> all new PageMaker examples as "Pagemaker Document (Generic)" with
> MIME type application/vnd.pagemaker by PUID fmt/876. But it only
> shows 2 extensions, PMD and PMT (see the appended
> DROID-pagemaker-new.csv.gz).
>
> Luckily I also found a page about PageMaker on the Archive Team
> file formats web site. It covers both the "old" and the "new"
> variants. This information is expressed by comment lines inside
> Magdir/ole2compounddocs like:
> # URL: http://fileformats.archiveteam.org/wiki/PageMaker
> # Reference: http://mark0.net/download/triddefs_xml.7z/defs/p
> # pagemaker-generic.trid.xml
> # pagemaker-pm6.trid.xml
> # pagemaker-pm65.trid.xml
> # pmd-pm7.trid.xml
>
> The PageMaker documents are recognized as "OLE 2 Compound Document"
> by the starting bytes (\320\317\021\340\241\261\032\341) at the
> beginning, inside Magdir/ole2compounddocs. Obviously no code fragment
> exists to do subclass identification, so the examples are
> described as "UNKNOWN". Furthermore, the examples have no registered
> root storage object CLSID, or this value is nil. Otherwise the file
> command would display this information afterwards by a phrase like
> ", clsid 0xc0c7266eb98cd311a1c800c04f612452". That means that code
> must be added in the branch handling CLSID GUID 0. The last entry
> there was for SoftMaker Presentations or templates (*.prd *.prv)
> with pictures.
>
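The signature and CLSID checks described above can be sketched outside of magic syntax. This is a minimal Python illustration, not code from the patch: the function names are made up for the example, and the header offsets used (sector shift at 0x1E, first directory SecID at 0x30, CLSID at +80 in the 128-byte root entry) are taken from the published Compound File layout rather than from this mail. The formatting is simplified: a CLSID is either shown in full or not at all.

```python
import struct

# Same 8 bytes as \320\317\021\340\241\261\032\341 (octal) in the magic file.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def root_clsid(data: bytes):
    """Return the 16-byte root storage CLSID, or None if data is not OLE 2."""
    if data[:8] != OLE2_MAGIC:
        return None
    # Sector size is 2**shift; the first directory sector ID is at 0x30.
    (shift,) = struct.unpack_from("<H", data, 0x1E)
    (dir_secid,) = struct.unpack_from("<I", data, 0x30)
    # Sector 0 starts right after the header sector.
    dir_offset = (dir_secid + 1) << shift
    # The root entry is the first directory entry; its CLSID sits at +80.
    return data[dir_offset + 80 : dir_offset + 96]


def describe_clsid(clsid: bytes) -> str:
    """Mimic the ', clsid 0x...' phrase; empty string for a nil CLSID."""
    if clsid == bytes(16):
        return ""
    return ", clsid 0x" + clsid.hex()
```

For the PageMaker samples discussed here, `describe_clsid` would return an empty string, which is why the magic has to fall back on something other than the CLSID.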
> So I added lines after it for my inspected examples. Luckily the
> file command prints some directory entry names. In all examples this
> is the word "PageMaker" encoded as UTF-16. This characteristic is
> also found in the global string section of the TrID definition, in a
> line like:
> <String>P'A'G'E'M'A'K'E'R</String>
> When I extract this stream, for example with Michal Mutl's Structured
> Storage Viewer, I get real PageMaker content in the "old" format.
> This is also described in the documentation, and these parts are
> recognised by Magdir/wordprocessors. So with the first additional
> line I look for a second directory entry with the UTF-16 encoded
> name PageMaker. That looks like:
>>>>> 128 lestring16 PageMaker :
>
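To make the offset 128 in that line concrete: directory entries are 128 bytes each, so offset 128 from the start of the directory is the name field of the second entry. A minimal Python sketch of the same test follows. The function names are hypothetical, and the layout details (name length in bytes at +64, including the terminator) are assumptions taken from the Compound File layout, not from the patch.

```python
import struct


def second_entry_name(data: bytes) -> str:
    """Decode the name of the second directory entry of an OLE 2 file."""
    (shift,) = struct.unpack_from("<H", data, 0x1E)      # sector shift
    (dir_secid,) = struct.unpack_from("<I", data, 0x30)  # first directory sector
    # Directory start plus one 128-byte entry gives the second entry.
    entry = ((dir_secid + 1) << shift) + 128
    # Name length in bytes, including the 2-byte UTF-16 terminator.
    (name_len,) = struct.unpack_from("<H", data, entry + 64)
    name_len = min(name_len, 64)  # the name field itself is 64 bytes
    raw = data[entry : entry + max(name_len - 2, 0)]
    return raw.decode("utf-16-le", errors="replace")


def looks_like_new_pagemaker(data: bytes) -> bool:
    """True when the second directory entry is named 'PageMaker'."""
    return second_entry_name(data) == "PageMaker"
```

Unlike the magic line, this sketch uses the stored name length, so it compares the whole name rather than a prefix.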
> In the second step I must jump to the stream part. Maybe more
> efficient or better ways exist, but I use a brute-force search for
> the start magic of "old" PageMaker with a line like:
>>>>>> 0 search/0xa900/s \0\0\0\0\0\0\xff\x99
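As a rough illustration of that brute-force step, the search can be written as a bounded byte search. This is a sketch with made-up names, and it only approximates how file evaluates search/0xa900/s: here a match counts if it begins within the first 0xa900 bytes, and the returned value is the offset of the start of the match.

```python
PM_MAGIC = b"\x00\x00\x00\x00\x00\x00\xff\x99"  # start magic of "old" PageMaker
SEARCH_RANGE = 0xA900                             # range from the magic line


def find_embedded_pagemaker(data: bytes) -> int:
    """Return the offset of the first match starting below SEARCH_RANGE, or -1."""
    # bytes.find() needs the whole pattern to fit before `end`, so extend
    # the limit by the pattern length minus one.
    return data.find(PM_MAGIC, 0, SEARCH_RANGE + len(PM_MAGIC) - 1)
```

A more targeted approach would follow the stream's starting sector from its directory entry, but that needs FAT chain handling, which is presumably why the mail settles for a search.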
> In the third step I handle this stream part with lines like:
> #>>>>>>&0 use PageMaker
>>>>>>> &0 indirect x
> I first tried to call the subroutine PageMaker from
> Magdir/wordprocessors directly, but then I get the wrong version.
> Maybe this is a bug in the file command. When I use the indirect
> directive instead, I get correct identifications. But I also get an
> ugly side effect: afterwards an additional unexpected phrase,
> UNKNOWN0000000000000000, is displayed.
>
> This was triggered by the part for the remaining non-nil CLSIDs.
> That was done by lines like:
>>> 88 default x : UNKNOWN
>>>> 80 ubequad !0 \b, clsid %#16.16llx
>>>> 88 ubequad x \b%16.16llx
> This should not happen! I do not know what is wrong here. So I
> check again for a non-nil GUID. This now becomes:
>>> 88 default x
>>>> 88 ubequad !0 : UNKNOWN
>>>>> 80 ubequad !0 \b, clsid %#16.16llx
>>>>> 88 ubequad x \b%16.16llx
>
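The difference between the two variants can be modelled as plain string building. The following Python sketch only reproduces the text each variant would print once its branch is reached, given the two big-endian quads at offsets 80 and 88 of the root entry. It makes no claim about why file reaches the default branch after the indirect call, and the function names are invented for the example.

```python
import struct


def old_suffix(entry: bytes) -> str:
    """Text printed by the original lines: UNKNOWN is unconditional."""
    hi, lo = struct.unpack_from(">QQ", entry, 80)
    out = ": UNKNOWN"
    if hi != 0:
        out += f", clsid 0x{hi:016x}"   # like %#16.16llx for a non-zero value
    return out + f"{lo:016x}"           # second quad is always appended


def new_suffix(entry: bytes) -> str:
    """Text printed by the changed lines: nothing when the quad at 88 is 0."""
    hi, lo = struct.unpack_from(">QQ", entry, 80)
    if lo == 0:
        return ""
    out = ": UNKNOWN"
    if hi != 0:
        out += f", clsid 0x{hi:016x}"
    return out + f"{lo:016x}"
```

With an all-zero CLSID, `old_suffix` yields ": UNKNOWN0000000000000000", matching the unexpected phrase reported above, while `new_suffix` yields nothing.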
> After applying the above-mentioned modifications with the patches
> file-5.41-ole2compounddocs-pagemaker.diff and
> file-5.41-readcdf-mime.diff, and using the newest
> Magdir/wordprocessors, all my inspected "newer" PageMaker documents
> are now described in more detail. With the -e cdf option this now
> looks like:
> 02TEMPLT.T65: OLE 2 Compound Document, v3.62, SecID 0,
> 2 FAT sectors,
> 0 Mini FAT sector :
> Adobe PageMaker document, little-endian, version 6.50
> Charset.pmt: OLE 2 Compound Document, v3.62, SecID 0x66,
> 0 Mini FAT sector :
> Adobe PageMaker document, little-endian, version 6.50
> MyPage6.PM6: OLE 2 Compound Document, v3.62, SecID 0x1,
> 0 Mini FAT sector :
> Adobe PageMaker document, little-endian, version 6
> brochus.pt6: OLE 2 Compound Document, v3.62, SecID 0x1,
> 0 Mini FAT sector :
> Adobe PageMaker document, little-endian, version 6
> pm-70.pmd: OLE 2 Compound Document, v3.62, SecID 0,
> 0 Mini FAT sector :
> Adobe PageMaker document, little-endian, version 6.50
> strategies.p65: OLE 2 Compound Document, v3.62, SecID 0,
> 24 FAT sectors,
> Mini FAT start sector 0x2a,
> 25 Mini FAT sectors :
> Adobe PageMaker document, little-endian, version 6.50
>
> With the -e cdf and --extension options this now looks like:
> 02TEMPLT.T65: p65/t65/pmd/pmt
> Charset.pmt: p65/t65/pmd/pmt
> MyPage6.PM6: pm6/pt6
> brochus.pt6: pm6/pt6
> pm-70.pmd: p65/t65/pmd/pmt
> strategies.p65: p65/t65/pmd/pmt
>
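The two listings above imply a simple mapping from the reported version string to the extension list. A small Python sketch of that mapping, using only the values visible in the sample output, is given here. The table and function name are illustrative, and "???" stands for the placeholder shown when no extension is known.

```python
# Version strings and extension lists exactly as they appear in the
# sample output; anything else falls back to the "???" placeholder.
EXTENSIONS = {
    "6": "pm6/pt6",
    "6.50": "p65/t65/pmd/pmt",
}


def extensions_for(version: str) -> str:
    """Return the extension list reported for a PageMaker version string."""
    return EXTENSIONS.get(version, "???")
```

Versions 6.5 and 7 share one entry here because both are reported as version 6.50 in the output.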
> I hope my diff files can be applied in a future version of the file
> utility. Unfortunately I neither found nor saw described any way to
> distinguish templates, with their other file name extensions, from
> pure PageMaker publications. I also found no way to distinguish
> version 6.5 (*.P65 *.T65) from version 7 (*.PMD *.PMT).
>
> Check the facts as far as you can. Listen to what scientists and
> the experts of the departments recommend. Accordingly, the vaccine
> is the most suitable measure against Corona. Anyone who believes in
> fake news also storms the Capitol as a mob, mocks science and
> terrorizes the silent majority of the population. Stay healthy.
>
> Jörg Jenderek
> - --
> Jörg Jenderek
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYd3nXQAKCRCv8rHJQhrU
> 1nJVAKDWay4r61LNcGvLo/8tNO2b8R/SvgCeIelamPiKS+QVYX0dR78c8xiXUBg=
> =JjJT
> -----END PGP SIGNATURE-----