[File] [PATCH] of Magdir/wordprocessors Corel DrawPerfect: Unknown filetype 10+15+16
Christos Zoulas
christos at zoulas.com
Sun Jan 1 16:49:34 UTC 2023
Committed, thanks and Happy New Year.
christos
> On Dec 31, 2022, at 11:22 AM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello,
>
> some days ago I sent a patch to recognize Microsoft Windows Installer
> transform scripts. These have the file name suffix MST. Unfortunately
> this suffix, like SHW and PRT, is also used for Corel Presentations.
> The newer variants are based on OLE 2 Compound Documents. These
> contain a stream named PerfectOffice_MAIN. With suitable tools like
> Michal Mutl's Structured Storage Viewer you can extract and save
> these parts (*-stream.bin). These parts look like the old variants.
> When running the file command (version 5.44) on such Corel examples
> with option -e cdf I get output like:
>
> Architektur-stream.bin: Corel DrawPerfect: Unknown filetype 16
> , v2.0
> Architektur.mst: OLE 2 Compound Document, v3.62,
> SecID 0x1,
> 10 FAT sectors, Mini FAT start sector 0x2
> : WordPerfect 7-X3
> presentations Master, Document or Graphic
> Coin Money-stream.bin: Corel DrawPerfect: Unknown filetype 16
> , v2.0
> Coin Money.mst: OLE 2 Compound Document, v3.62,
> SecID 0x1,
> 6 FAT sectors, Mini FAT start sector 0x2
> : WordPerfect 7-X3
> presentation
> FIG_ANIM.SHW: Corel DrawPerfect: Unknown filetype 15
> , v1.0
> WELCOME-stream.bin: Corel DrawPerfect: Unknown filetype 15
> , v2.0
> WELCOME.SHW: OLE 2 Compound Document, v3.62,
> SecID 0x1,
> 7 FAT sectors, Mini FAT start sector 0x2
> : WordPerfect 7-X3
> presentation
> WP-7-x3-conture-stream.bin: WordPerfect document, v2.0
> WP-7-x3-conture.wpd: OLE 2 Compound Document, v3.62,
> SecID 0x2,
> 0 Mini FAT sector
> : WordPerfect 7-X3
> presentations Master, Document or Graphic
> WP-7-x3-grafic-stream.bin: Corel DrawPerfect: Unknown filetype 16
> , v2.0
> WP-7-x3-grafic.wpg: OLE 2 Compound Document, v3.62,
> SecID 0x2,
> 0 Mini FAT sector
> : WordPerfect 7-X3
> presentations Master, Document or Graphic
> chartbar.shw: Corel DrawPerfect: Unknown filetype 10
> , v2.0
> fig-demo.shw: Corel DrawPerfect: Unknown filetype 10
> , v2.0
>
> With the additional --extension option the following is displayed:
> Architektur-stream.bin: ???
> Architektur.mst: mst/wpd/wpg
> Coin Money-stream.bin: ???
> Coin Money.mst: shw
> FIG_ANIM.SHW: ???
> WELCOME-stream.bin: ???
> WELCOME.SHW: shw
> WP-7-x3-conture-stream.bin: wpd/wpt/wkb/icr/tut/sty/tst/crs
> WP-7-x3-conture.wpd: mst/wpd/wpg
> WP-7-x3-grafic-stream.bin: ???
> WP-7-x3-grafic.wpg: mst/wpd/wpg
> chartbar.shw: ???
> fig-demo.shw: ???
>
> With the additional -i option the following is displayed:
> Architektur-stream.bin: application/octet-stream
> Architektur.mst: application/vnd.wordperfect
> Coin Money-stream.bin: application/octet-stream
> Coin Money.mst: application/x-corelpresentations
> FIG_ANIM.SHW: application/octet-stream
> WELCOME-stream.bin: application/octet-stream
> WELCOME.SHW: application/x-corelpresentations
> WP-7-x3-conture-stream.bin: application/vnd.wordperfect
> WP-7-x3-conture.wpd: application/vnd.wordperfect
> WP-7-x3-grafic-stream.bin: application/octet-stream
> WP-7-x3-grafic.wpg: application/vnd.wordperfect
> chartbar.shw: application/octet-stream
> fig-demo.shw: application/octet-stream
>
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). The samples (like
> chartbar.shw and fig-demo.shw) which the file command describes
> with the additional phrase "Unknown filetype 10" are described with
> the highest rate as "WordPerfect Presentations (v2)" by
> shw-wp-2.trid.xml.
> The samples (like FIG_ANIM.SHW) which the file command describes
> with the additional phrase "Unknown filetype 15" are described with
> the highest rate as "WordPerfect/Corel Presentations (v3)" by
> shw-wp-3.trid.xml.
> The extracted streams, which the file command often describes with
> the additional phrase "Unknown filetype 16", are described with the
> highest rate as "WordPerfect Document (generic)" by
> wpd-doc-gen.trid.xml or as "WordPerfect (generic)" by
> wp-generic.trid.xml.
> The samples (like Architektur.mst, WP-7-x3-conture.wpd,
> WP-7-x3-grafic.wpg, WELCOME.SHW and "Coin Money.mst") which the file
> command describes as "OLE 2 Compound Document" are described here
> with a low rate as "Generic OLE2 / Multistream Compound" by
> docfile.trid.xml. With a higher rate these samples are described as
> "WordPerfect Document (OLE2 Multistream Compound)" by
> wpd-docfile.trid.xml or, like the example WELCOME.SHW, as
> "WordPerfect Slide Show" by shw.trid.xml (see the attached
> trid-v-wp-mst.txt.gz).
>
> For comparison I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). It identifies
> all OLE 2 Compound Documents only generically as "OLE2 Compound
> Document" by PUID fmt/111.
> The samples (like FIG_ANIM.SHW) which the file command describes
> with the additional phrase "Unknown filetype 15" are described as
> "Corel Presentation" with version "3" by PUID fmt/878.
> The remaining samples are described as "Corel Presentation" with
> version "7-8-9" by PUID fmt/877 via extension
> (see the attached droid-wp-mst.csv.gz).
>
> TrID lists the file name extensions used and, with the -v option,
> often the related URL pointing to some information. This is now
> expressed by comment lines inside Magdir/wordprocessors like:
> # URL: http://fileformats.archiveteam.org/
> # wiki/Corel_Presentations
> # Reference: http://mark0.net/download/triddefs_xml.7z
> # defs/s/shw-wp-2.trid.xml
> # defs/s/shw-wp-3.trid.xml
>
> The description inside Magdir/wordprocessors starts like:
> 0 string \xffWPC
> So the first 4 bytes are the generic magic for all WordPerfect
> samples. Sub-classification is done by the bytes at offsets 8 and 9.
>
> For my Corel Presentations examples this is currently done by
> lines like
> >8 byte 15
> >>9 default x
> >>>9 byte x Corel DrawPerfect: Unknown filetype %d
> So a second-level classification exists, but a third-level test (the
> byte at offset 9) is missing, and then the default clause for that
> branch is shown. So I only have to insert entries for filetypes 10,
> 15 and 16 before that default clause.
>
> For type 10 these lines look like:
>
> >>9 byte 10 WordPerfect Presentation
> !:mime application/x-drawperfect-shw
> !:ext shw
> >>>4 ulelong !0x10 \b, at %#x document area
> >>>12 ulelong !0 \b, at 0xC %#x
>
> At offset 4 the pointer to the document area is normally stored.
> According to the TrID definition, in this variant this value seems
> to be 10h and the value at offset 12 is zero, which probably means
> not encrypted. But I do not know whether this is always true, so
> values are shown here only for unexpected cases. Instead of the
> generic mime type application/octet-stream I show a user-defined
> one.
>
> For type 15 these lines look like:
> >>9 byte 15 Corel DrawPerfect
> !:mime application/x-drawperfect-shw
> !:ext shw
> >>>4 ulelong !0x1a \b, at %#x document area
> >>>12 ulelong !0 \b, at 0xC %#x
> >>>0x14 ulelong x \b, %u bytes
> At offset 4 the pointer to the document area is normally stored.
> According to the TrID definition, in this variant this value seems
> to be 1Ah and the value at offset 12 is zero, which probably means
> not encrypted. But I do not know whether this is always true, so
> values are shown here only for unexpected cases. At offset 20 the
> file size, not including pad characters at EOF, is stored as a
> 4-byte integer. Instead of the generic mime type
> application/octet-stream I show a user-defined one.
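
As a rough illustration of what the type 15 lines report, the following Python sketch reads the same three little-endian 32-bit fields (offsets 4, 12 and 20) and prints a value only when it differs from the expected one, as the magic tests do. The function name describe_type15 is hypothetical, and the expected values 0x1a and 0 are the ones assumed in this mail, not taken from a format specification.

```python
import struct


def describe_type15(header: bytes) -> list:
    """Return the extra phrases for a type 15 header, mirroring the magic lines."""
    if len(header) < 24:
        raise ValueError("need at least 24 header bytes")
    # "<I" is an unsigned little-endian 32-bit integer (ulelong in magic terms).
    (doc_area,) = struct.unpack_from("<I", header, 4)
    (at_0xc,) = struct.unpack_from("<I", header, 12)
    (size,) = struct.unpack_from("<I", header, 20)
    parts = []
    if doc_area != 0x1A:  # shown only in the unexpected case
        parts.append(f"at {doc_area:#x} document area")
    if at_0xc != 0:  # shown only in the unexpected case
        parts.append(f"at 0xC {at_0xc:#x}")
    parts.append(f"{size} bytes")  # file size is always shown
    return parts
```

For a header with the expected values at offsets 4 and 12, only the size phrase is returned, which corresponds to output such as ", 360289 bytes" in the listing of patched results.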
>
> For type 16 (the variant embedded inside a Compound Document) these
> lines look similar to those for type 15:
> >>9 byte 16 Corel Presentation (embedded)
> !:mime application/x-corelpresentations
> !:ext /
> >>>4 ulelong !0x1a \b, at %#x document area
> >>>12 ulelong !0 \b, at 0xC %#x
> >>>16 ulelong !0x3 \b, at 0x10 %#x
> Instead of the generic mime type application/octet-stream I show
> another user-defined one here. Because the stream name is
> PerfectOffice_MAIN, no name suffix is displayed here.
>
> After applying the modifications described above with the patch
> file-5.44-wordprocessors-shw.diff I get more precise output
> like:
>
> Architektur-stream.bin: Corel Presentation (embedded)
> , 629138 bytes
> , v2.0
> Architektur.mst: OLE 2 Compound Document, v3.62,
> SecID 0x1,
> 10 FAT sectors, Mini FAT start sector 0x2
> : WordPerfect 7-X3
> presentations Master, Document or Graphic
> Coin Money-stream.bin: Corel Presentation (embedded)
> , 377474 bytes
> , v2.0
> Coin Money.mst: OLE 2 Compound Document, v3.62,
> SecID 0x1,
> 6 FAT sectors, Mini FAT start sector 0x2
> : WordPerfect 7-X3
> presentation
> FIG_ANIM.SHW: Corel Presentation
> , 360289 bytes
> , v1.0
> WELCOME-stream.bin: Corel Presentation
> , 424882 bytes
> , v2.0
> WELCOME.SHW: OLE 2 Compound Document, v3.62,
> SecID 0x1,
> 7 FAT sectors, Mini FAT start sector 0x2
> : WordPerfect 7-X3
> presentation
> WP-7-x3-conture-stream.bin: WordPerfect document
> , v2.0
> WP-7-x3-conture.wpd: OLE 2 Compound Document, v3.62,
> SecID 0x2,
> 0 Mini FAT sector
> : WordPerfect 7-X3
> presentations Master, Document or Graphic
> WP-7-x3-grafic-stream.bin: Corel Presentation (embedded)
> , 629138 bytes
> , v2.0
> WP-7-x3-grafic.wpg: OLE 2 Compound Document, v3.62,
> SecID 0x2,
> 0 Mini FAT sector
> : WordPerfect 7-X3
> presentations Master, Document or Graphic
> chartbar.shw: WordPerfect Presentation
> , v2.0
> fig-demo.shw: WordPerfect Presentation
> , v2.0
>
> With the additional -i option I now get output like:
> Architektur-stream.bin: application/x-corelpresentations
> Architektur.mst: application/vnd.wordperfect
> Coin Money-stream.bin: application/x-corelpresentations
> Coin Money.mst: application/x-corelpresentations
> FIG_ANIM.SHW: application/x-drawperfect-shw
> WELCOME-stream.bin: application/x-drawperfect-shw
> WELCOME.SHW: application/x-corelpresentations
> WP-7-x3-conture-stream.bin: application/vnd.wordperfect
> WP-7-x3-conture.wpd: application/vnd.wordperfect
> WP-7-x3-grafic-stream.bin: application/x-corelpresentations
> WP-7-x3-grafic.wpg: application/vnd.wordperfect
> chartbar.shw: application/x-drawperfect-shw
> fig-demo.shw: application/x-drawperfect-shw
>
> With the additional --extension option I now get:
> Architektur-stream.bin: /
> Architektur.mst: mst/wpd/wpg
> Coin Money-stream.bin: /
> Coin Money.mst: shw
> FIG_ANIM.SHW: shw
> WELCOME-stream.bin: shw
> WELCOME.SHW: shw
> WP-7-x3-conture-stream.bin: wpd/wpt/wkb/icr/tut/sty/tst/crs
> WP-7-x3-conture.wpd: mst/wpd/wpg
> WP-7-x3-grafic-stream.bin: /
> WP-7-x3-grafic.wpg: mst/wpd/wpg
> chartbar.shw: shw
> fig-demo.shw: shw
>
> I hope my diff file can be applied in a future version of the
> file utility.
>
> I also have a feature request. It would be nice if, similar to the
> mime type, a line for DROID identification could be added inside a
> definition. That might look like:
> !:PUID fmt/867
> This information should then be shown by the file command. Even
> nicer would be if it were expanded to a URL like:
> https://www.nationalarchives.gov.uk/PRONOM/fmt/867
> Similarly, it would be nice if a line for TrID identification could
> be added inside a definition. That might look like:
> !:TrID shw-wp-3.trid.xml
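
The requested expansion of a PUID to a URL could be sketched as below. This is a hypothetical helper, not an existing feature of file: the !:PUID annotation is only proposed in this mail, and the URL pattern is assumed from the single example given above. The check for an optional "x-" prefix is a further assumption about how PUIDs are written.

```python
import re

# Base URL assumed from the example URL in this mail.
PRONOM_BASE = "https://www.nationalarchives.gov.uk/PRONOM/"

# Assumption: a PUID looks like "fmt/<number>" or "x-fmt/<number>".
PUID_PATTERN = re.compile(r"^(x-)?fmt/\d+$")


def puid_url(puid: str) -> str:
    """Expand a PUID such as 'fmt/867' to a PRONOM URL."""
    if not PUID_PATTERN.match(puid):
        raise ValueError(f"not a recognised PUID: {puid!r}")
    return PRONOM_BASE + puid


print(puid_url("fmt/867"))
```

Validating the identifier before building the URL keeps a typo in a definition from silently producing a dead link.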
>
> Why do I think this is important? A similar situation exists for
> antivirus software. Every company uses its own naming scheme. So
> when you have trouble with a suspicious sample file, you may have to
> consult a dozen different antivirus vendors' pages to get a solid
> understanding of the inspected file. File identification tools have
> the same problem. I use 3 different tools:
> file
> trid
> droid
> Each of these tools has advantages and disadvantages. In other
> words, none of them is the single universal source of truth.
>
> With best wishes
> Jörg Jenderek
> - --
> Jörg Jenderek
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY7BhvgAKCRCv8rHJQhrU
> 1gHeAKCHo7Pr7Cx5cHMhYoRJltDq0AfnLQCdGD2m1CRcjjSsl5xCBT2KFBVW1/o=
> =Ty6g
> -----END PGP SIGNATURE-----
> <trid-v-wp-mst.txt.gz><droid-wp-mst.csv.gz><file-5_44-wordprocessors-shw_diff.DEFANGED-0><file-5_44-wordprocessors-shw_diff_sig.DEFANGED-1>--
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file
> <sanitizer.log>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 235 bytes
Desc: Message signed with OpenPGP
URL: <https://mailman.astron.com/pipermail/file/attachments/20230101/d5d3798c/attachment.asc>