[File] [PATCH] of Magdir/wordprocessors Corel DrawPerfect: Unknown filetype 10+15+16

Christos Zoulas christos at zoulas.com
Sun Jan 1 16:49:34 UTC 2023


Committed, thanks and Happy New Year.

christos

> On Dec 31, 2022, at 11:22 AM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hello,
> 
> A few days ago I sent a patch to recognize Microsoft Windows Installer
> transform scripts. These have the file name suffix MST. Unfortunately
> this suffix, like SHW and PRT, is also used by Corel Presentations. The
> newer variants are based on OLE 2 Compound Documents and contain a
> stream named PerfectOffice_MAIN. With suitable tools such as Michal
> Mutl's Structured Storage Viewer you can extract and save these streams
> (*-stream.bin). The extracted streams look like the old variants.
> When I run the file command (version 5.44) on such Corel examples with
> option -e cdf, I get output like:
> 
> Architektur-stream.bin:     Corel DrawPerfect: Unknown filetype 16
> 			    , v2.0
> Architektur.mst:            OLE 2 Compound Document, v3.62,
> 			    SecID 0x1,
> 			    10 FAT sectors, Mini FAT start sector 0x2
> 			    : WordPerfect 7-X3
> 			    presentations Master, Document or Graphic
> Coin Money-stream.bin:      Corel DrawPerfect: Unknown filetype 16
>     			    , v2.0
> Coin Money.mst:             OLE 2 Compound Document, v3.62,
>     			    SecID 0x1,
> 			    6 FAT sectors, Mini FAT start sector 0x2
> 			    : WordPerfect 7-X3
> 			    presentation
> FIG_ANIM.SHW:               Corel DrawPerfect: Unknown filetype 15
> 			    , v1.0
> WELCOME-stream.bin:         Corel DrawPerfect: Unknown filetype 15
> 			    , v2.0
> WELCOME.SHW:                OLE 2 Compound Document, v3.62,
> 			    SecID 0x1,
> 			    7 FAT sectors, Mini FAT start sector 0x2
> 			    : WordPerfect 7-X3
> 			    presentation
> WP-7-x3-conture-stream.bin: WordPerfect document, v2.0
> WP-7-x3-conture.wpd:        OLE 2 Compound Document, v3.62,
> 			    SecID 0x2,
> 			    0 Mini FAT sector
> 			    : WordPerfect 7-X3
> 			    presentations Master, Document or Graphic
> WP-7-x3-grafic-stream.bin:  Corel DrawPerfect: Unknown filetype 16
> 			    , v2.0
> WP-7-x3-grafic.wpg:         OLE 2 Compound Document, v3.62,
> 			    SecID 0x2,
> 			    0 Mini FAT sector
> 			    : WordPerfect 7-X3
> 			    presentations Master, Document or Graphic
> chartbar.shw:               Corel DrawPerfect: Unknown filetype 10
> 			    , v2.0
> fig-demo.shw:               Corel DrawPerfect: Unknown filetype 10
> 			    , v2.0
> 
> With the additional --extension option the following is displayed:
> Architektur-stream.bin:     ???
> Architektur.mst:            mst/wpd/wpg
> Coin Money-stream.bin:      ???
> Coin Money.mst:             shw
> FIG_ANIM.SHW:               ???
> WELCOME-stream.bin:         ???
> WELCOME.SHW:                shw
> WP-7-x3-conture-stream.bin: wpd/wpt/wkb/icr/tut/sty/tst/crs
> WP-7-x3-conture.wpd:        mst/wpd/wpg
> WP-7-x3-grafic-stream.bin:  ???
> WP-7-x3-grafic.wpg:         mst/wpd/wpg
> chartbar.shw:               ???
> fig-demo.shw:               ???
> 
> With the additional -i option the following is displayed:
> Architektur-stream.bin:     application/octet-stream
> Architektur.mst:            application/vnd.wordperfect
> Coin Money-stream.bin:      application/octet-stream
> Coin Money.mst:             application/x-corelpresentations
> FIG_ANIM.SHW:               application/octet-stream
> WELCOME-stream.bin:         application/octet-stream
> WELCOME.SHW:                application/x-corelpresentations
> WP-7-x3-conture-stream.bin: application/vnd.wordperfect
> WP-7-x3-conture.wpd:        application/vnd.wordperfect
> WP-7-x3-grafic-stream.bin:  application/octet-stream
> WP-7-x3-grafic.wpg:         application/vnd.wordperfect
> chartbar.shw:               application/octet-stream
> fig-demo.shw:               application/octet-stream
> 
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). The samples (like
> chartbar.shw and fig-demo.shw) that the file command describes with
> the phrase "Unknown filetype 10" are rated highest as
> "WordPerfect Presentations (v2)" by shw-wp-2.trid.xml.
> The samples (like FIG_ANIM.SHW) that the file command describes
> with the phrase "Unknown filetype 15" are rated highest as
> "WordPerfect/Corel Presentations (v3)" by shw-wp-3.trid.xml.
> The extracted streams, which the file command often describes with
> the phrase "Unknown filetype 16", are rated highest as
> "WordPerfect Document (generic)" by wpd-doc-gen.trid.xml or
> as "WordPerfect (generic)" by wp-generic.trid.xml.
> The samples (like Architektur.mst, WP-7-x3-conture.wpd,
> WP-7-x3-grafic.wpg, WELCOME.SHW and "Coin Money.mst") that the file
> command describes as "OLE 2 Compound Document" get a low rating
> as "Generic OLE2 / Multistream Compound" by docfile.trid.xml.
> With a higher rating these samples are described as
> "WordPerfect Document (OLE2 Multistream Compound)" by
> wpd-docfile.trid.xml or, like the example WELCOME.SHW, as
> "WordPerfect Slide Show" by shw.trid.xml
> (see appended trid-v-wp-mst.txt.gz).
> 
> For comparison I also ran the file format identification
> utility DROID (see https://sourceforge.net/projects/droid/). It
> identifies all OLE 2 Compound Documents only generically as "OLE2
> Compound Document" by PUID fmt/111.
> The samples (like FIG_ANIM.SHW) that the file command describes
> with the phrase "Unknown filetype 15" are described as
> "Corel Presentation" with version "3" by PUID fmt/878.
> The remaining samples are described as "Corel Presentation" with
> version "7-8-9" by PUID fmt/877 via extension
> (see appended droid-wp-mst.csv.gz).
> 
> TrID lists the file name extensions used and, with the -v option,
> often a related URL pointing to further information. This is now
> expressed by comment lines inside Magdir/wordprocessors like:
> # URL:		http://fileformats.archiveteam.org/
> #		wiki/Corel_Presentations
> # Reference:	http://mark0.net/download/triddefs_xml.7z
> #		defs/s/shw-wp-2.trid.xml
> #		defs/s/shw-wp-3.trid.xml
> 
> The description inside Magdir/wordprocessors starts like:
> 0	string	\xffWPC
> So the first 4 bytes are the generic magic for all WordPerfect
> samples. Sub-classification is done by the bytes at offsets 8 and 9.
> 
> For my Corel presentation examples this is at the moment done by
> lines like
>> 8	byte	15
>>> 9	default	x
>>>> 9	byte	x	Corel DrawPerfect: Unknown filetype %d
> So a second-level classification exists, but the third level (byte
> at offset 9) is missing, and then the default clause for that branch
> is shown. So I only have to insert parts for file types 10, 15 and 16
> before that default clause.
> 
> For type 10 these lines look like:
> 
>>> 9	byte	10	WordPerfect Presentation
> !:mime		application/x-drawperfect-shw
> !:ext		shw
>>>> 4	ulelong	!0x10	\b, at %#x document area
>>>> 12	ulelong	!0	\b, at 0xC %#x
> 
> At offset 4 the pointer to the document area is normally stored.
> According to the TrID definition, in this variant this value seems to
> be 10h, and the value at offset 12 is nil, which probably means not
> encrypted. But I do not know if this is always true, so only
> unexpected values are shown here. Instead of the generic MIME type
> application/octet-stream I show a user-defined one.
> 
> For type 15 these lines look like:
>>> 9	byte	15	Corel DrawPerfect
> !:mime		application/x-drawperfect-shw
> !:ext		shw
>>>> 4	ulelong	!0x1a	\b, at %#x document area
>>>> 12	ulelong	!0	\b, at 0xC %#x
>>>> 0x14	ulelong x	\b, %u bytes
> At offset 4 the pointer to the document area is normally stored.
> According to the TrID definition, in this variant this value seems to
> be 1Ah, and the value at offset 12 is nil, which probably means not
> encrypted. But I do not know if this is always true, so only
> unexpected values are shown here. At offset 20 the file size, not
> including pad characters at EOF, is stored as a 4-byte integer.
> Instead of the generic MIME type application/octet-stream I show a
> user-defined one.
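
As a rough illustration of what the type 15 lines test, the same fields can be read in Python with the struct module. This is a sketch under assumptions: the function name describe_type15 is hypothetical, the header is assumed to be at least 24 bytes long, and the output format only approximates what file prints.

```python
import struct


def describe_type15(header: bytes) -> str:
    """Mirror the proposed type-15 magic lines on the first 24 header bytes."""
    if len(header) < 24:
        raise ValueError("need at least 24 header bytes")
    # ulelong in magic(5) is an unsigned 32-bit little-endian value: "<I".
    (doc_area,) = struct.unpack_from("<I", header, 4)    # pointer to document area
    (field_0c,) = struct.unpack_from("<I", header, 12)   # nil in the known samples
    (file_size,) = struct.unpack_from("<I", header, 20)  # size without EOF padding
    parts = ["Corel DrawPerfect"]
    if doc_area != 0x1A:   # only unexpected values are shown
        parts.append(f"at {doc_area:#x} document area")
    if field_0c != 0:      # only unexpected values are shown
        parts.append(f"at 0xC {field_0c:#x}")
    parts.append(f"{file_size} bytes")
    return ", ".join(parts)
```

Showing the fields at offsets 4 and 12 only when they differ from the expected values keeps the normal output short, while unusual samples still stand out.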
> 
> For type 16 (the variant embedded inside a Compound Document) these
> lines look similar to those for type 15:
>>> 9	byte	16	Corel Presentation (embedded)
> !:mime		application/x-corelpresentations
> !:ext		/
>>>> 4	ulelong	!0x1a	\b, at %#x document area
>>>> 12	ulelong	!0	\b, at 0xC %#x
>>>> 16	ulelong	!0x3	\b, at 0x10 %#x
> Instead of the generic MIME type application/octet-stream I show
> another user-defined one here. Because the stream name is
> PerfectOffice_MAIN, no name suffix is displayed here.
> 
> After applying the above-mentioned modifications with the patch
> file-5.44-wordprocessors-shw.diff I get more precise output
> like:
> 
> Architektur-stream.bin:     Corel Presentation (embedded)
> 			    , 629138 bytes
> 			    , v2.0
> Architektur.mst:            OLE 2 Compound Document, v3.62,
> 			    SecID 0x1,
> 			    10 FAT sectors, Mini FAT start sector 0x2
> 			    : WordPerfect 7-X3
> 			    presentations Master, Document or Graphic
> Coin Money-stream.bin:      Corel Presentation (embedded)
>     			    , 377474 bytes
> 			    , v2.0
> Coin Money.mst:             OLE 2 Compound Document, v3.62,
>     			    SecID 0x1,
> 			    6 FAT sectors, Mini FAT start sector 0x2
> 			    : WordPerfect 7-X3
> 			    presentation
> FIG_ANIM.SHW:               Corel Presentation
> 			    , 360289 bytes
> 			    , v1.0
> WELCOME-stream.bin:         Corel Presentation
> 			    , 424882 bytes
> 			    , v2.0
> WELCOME.SHW:                OLE 2 Compound Document, v3.62,
> 			    SecID 0x1,
> 			    7 FAT sectors, Mini FAT start sector 0x2
> 			    : WordPerfect 7-X3
> 			    presentation
> WP-7-x3-conture-stream.bin: WordPerfect document
> 			    , v2.0
> WP-7-x3-conture.wpd:        OLE 2 Compound Document, v3.62,
> 			    SecID 0x2,
> 			    0 Mini FAT sector
> 			    : WordPerfect 7-X3
> 			    presentations Master, Document or Graphic
> WP-7-x3-grafic-stream.bin:  Corel Presentation (embedded)
> 			    , 629138 bytes
> 			    , v2.0
> WP-7-x3-grafic.wpg:         OLE 2 Compound Document, v3.62,
> 			    SecID 0x2,
> 			    0 Mini FAT sector
> 			    : WordPerfect 7-X3
> 			    presentations Master, Document or Graphic
> chartbar.shw:               WordPerfect Presentation
> 			    , v2.0
> fig-demo.shw:               WordPerfect Presentation
> 			    , v2.0
> 
> With the additional -i option I now get output like:
> Architektur-stream.bin:     application/x-corelpresentations
> Architektur.mst:            application/vnd.wordperfect
> Coin Money-stream.bin:      application/x-corelpresentations
> Coin Money.mst:             application/x-corelpresentations
> FIG_ANIM.SHW:               application/x-drawperfect-shw
> WELCOME-stream.bin:         application/x-drawperfect-shw
> WELCOME.SHW:                application/x-corelpresentations
> WP-7-x3-conture-stream.bin: application/vnd.wordperfect
> WP-7-x3-conture.wpd:        application/vnd.wordperfect
> WP-7-x3-grafic-stream.bin:  application/x-corelpresentations
> WP-7-x3-grafic.wpg:         application/vnd.wordperfect
> chartbar.shw:               application/x-drawperfect-shw
> fig-demo.shw:               application/x-drawperfect-shw
> 
> With the additional --extension option I now get:
> Architektur-stream.bin:     /
> Architektur.mst:            mst/wpd/wpg
> Coin Money-stream.bin:      /
> Coin Money.mst:             shw
> FIG_ANIM.SHW:               shw
> WELCOME-stream.bin:         shw
> WELCOME.SHW:                shw
> WP-7-x3-conture-stream.bin: wpd/wpt/wkb/icr/tut/sty/tst/crs
> WP-7-x3-conture.wpd:        mst/wpd/wpg
> WP-7-x3-grafic-stream.bin:  /
> WP-7-x3-grafic.wpg:         mst/wpd/wpg
> chartbar.shw:               shw
> fig-demo.shw:               shw
> 
> I hope my diff file can be applied in a future version of the
> file utility.
> 
> I also have a feature request. It would be nice if, similar to the
> MIME type, a line for DROID identification could be added inside a
> definition. That might look like:
> !:PUID	fmt/867
> This information should then be shown by the file command. Even
> nicer would be if it were expanded to a URL like:
> 	https://www.nationalarchives.gov.uk/PRONOM/fmt/867
> Similarly, it would be nice if a line for TrID identification
> could be added inside a definition. That might look like:
> !:TrID	shw-wp-3.trid.xml
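
To make the request concrete, here is a small Python sketch of how such annotation lines could be collected and a PUID expanded to a PRONOM URL. Note that !:PUID and !:TrID are only the directives proposed in this mail, not existing file(1) features, and the names parse_annotations and puid_url are hypothetical.

```python
PRONOM_BASE = "https://www.nationalarchives.gov.uk/PRONOM/"


def parse_annotations(fragment: str) -> dict:
    """Collect '!:key value' lines of a magic fragment into a dict."""
    found = {}
    for line in fragment.splitlines():
        if not line.startswith("!:"):
            continue
        parts = line[2:].split(None, 1)  # key, then the rest of the line
        if len(parts) == 2:
            found[parts[0]] = parts[1].strip()
    return found


def puid_url(puid: str) -> str:
    """Expand a PUID such as 'fmt/867' to the URL form shown above."""
    return PRONOM_BASE + puid


fragment = "!:mime\tapplication/x-drawperfect-shw\n!:ext\tshw\n!:PUID\tfmt/867\n"
notes = parse_annotations(fragment)
print(puid_url(notes["PUID"]))
```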
> 
> Why do I think this is important? A similar situation exists for
> antivirus software. Every company uses its own naming scheme. So
> when you have trouble with some suspicious file, you may have to
> consult a dozen different antivirus vendors' pages to get a solid
> idea of the inspected file format. For file identification tools
> we have the same problem. I use 3 different tools:
> 	file
> 	trid
> 	droid
> Each of these tools has advantages and disadvantages. In other
> words, none of them shows the global and universal truth.
> 
> With best wishes
> Jörg Jenderek
> - --
> Jörg Jenderek
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
> 
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY7BhvgAKCRCv8rHJQhrU
> 1gHeAKCHo7Pr7Cx5cHMhYoRJltDq0AfnLQCdGD2m1CRcjjSsl5xCBT2KFBVW1/o=
> =Ty6g
> -----END PGP SIGNATURE-----
> <trid-v-wp-mst.txt.gz><droid-wp-mst.csv.gz><file-5_44-wordprocessors-shw_diff.DEFANGED-0><file-5_44-wordprocessors-shw_diff_sig.DEFANGED-1>--
