[File] [PATCH] of Magdir/wordprocessors Corel DrawPerfect: Unknown filetype 10+15+16

Jörg Jenderek joerg.jen.der.ek at gmx.net
Sat Dec 31 16:22:23 UTC 2022


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

some days ago I sent a patch to recognize Microsoft Windows Installer
transform scripts. These have the file name suffix MST. Unfortunately
this suffix, as well as SHW and PRT, is also used for Corel
Presentations. The newer variants are based on OLE 2 Compound
Document. They contain a stream named PerfectOffice_MAIN. With
suitable tools like Michal Mutl's Structured Storage Viewer you can
extract and save these parts (*-stream.bin). The extracted parts look
like the old variants.
When running the file command (version 5.44) on such Corel examples
with option -e cdf I get output like:

Architektur-stream.bin:     Corel DrawPerfect: Unknown filetype 16
			    , v2.0
Architektur.mst:            OLE 2 Compound Document, v3.62,
			    SecID 0x1,
			    10 FAT sectors, Mini FAT start sector 0x2
			    : WordPerfect 7-X3
			    presentations Master, Document or Graphic
Coin Money-stream.bin:      Corel DrawPerfect: Unknown filetype 16
     			    , v2.0
Coin Money.mst:             OLE 2 Compound Document, v3.62,
     			    SecID 0x1,
			    6 FAT sectors, Mini FAT start sector 0x2
			    : WordPerfect 7-X3
			    presentation
FIG_ANIM.SHW:               Corel DrawPerfect: Unknown filetype 15
			    , v1.0
WELCOME-stream.bin:         Corel DrawPerfect: Unknown filetype 15
			    , v2.0
WELCOME.SHW:                OLE 2 Compound Document, v3.62,
			    SecID 0x1,
			    7 FAT sectors, Mini FAT start sector 0x2
			    : WordPerfect 7-X3
			    presentation
WP-7-x3-conture-stream.bin: WordPerfect document, v2.0
WP-7-x3-conture.wpd:        OLE 2 Compound Document, v3.62,
			    SecID 0x2,
			    0 Mini FAT sector
			    : WordPerfect 7-X3
			    presentations Master, Document or Graphic
WP-7-x3-grafic-stream.bin:  Corel DrawPerfect: Unknown filetype 16
			    , v2.0
WP-7-x3-grafic.wpg:         OLE 2 Compound Document, v3.62,
			    SecID 0x2,
			    0 Mini FAT sector
			    : WordPerfect 7-X3
			    presentations Master, Document or Graphic
chartbar.shw:               Corel DrawPerfect: Unknown filetype 10
			    , v2.0
fig-demo.shw:               Corel DrawPerfect: Unknown filetype 10
			    , v2.0

With the additional --extension option the following is displayed:
Architektur-stream.bin:     ???
Architektur.mst:            mst/wpd/wpg
Coin Money-stream.bin:      ???
Coin Money.mst:             shw
FIG_ANIM.SHW:               ???
WELCOME-stream.bin:         ???
WELCOME.SHW:                shw
WP-7-x3-conture-stream.bin: wpd/wpt/wkb/icr/tut/sty/tst/crs
WP-7-x3-conture.wpd:        mst/wpd/wpg
WP-7-x3-grafic-stream.bin:  ???
WP-7-x3-grafic.wpg:         mst/wpd/wpg
chartbar.shw:               ???
fig-demo.shw:               ???

With the additional -i option the following is displayed:
Architektur-stream.bin:     application/octet-stream
Architektur.mst:            application/vnd.wordperfect
Coin Money-stream.bin:      application/octet-stream
Coin Money.mst:             application/x-corelpresentations
FIG_ANIM.SHW:               application/octet-stream
WELCOME-stream.bin:         application/octet-stream
WELCOME.SHW:                application/x-corelpresentations
WP-7-x3-conture-stream.bin: application/vnd.wordperfect
WP-7-x3-conture.wpd:        application/vnd.wordperfect
WP-7-x3-grafic-stream.bin:  application/octet-stream
WP-7-x3-grafic.wpg:         application/vnd.wordperfect
chartbar.shw:               application/octet-stream
fig-demo.shw:               application/octet-stream
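
As a rough sketch (not part of the patch), the three invocations above
could be scripted in Python as follows. The helper names file_commands
and run_all are made up for illustration; only the options -e cdf,
--extension and -i come from this mail.

```python
import shutil
import subprocess


def file_commands(paths):
    """Build the three file(1) command lines used in this mail."""
    base = ["file", "-e", "cdf"]
    paths = list(paths)
    return {
        "description": base + paths,
        "extension": base + ["--extension"] + paths,
        "mime": base + ["-i"] + paths,
    }


def run_all(paths):
    """Run each command line and return its stdout, keyed by mode."""
    if shutil.which("file") is None:
        raise RuntimeError("file(1) not found in PATH")
    results = {}
    for mode, argv in file_commands(paths).items():
        done = subprocess.run(argv, capture_output=True, text=True, check=False)
        results[mode] = done.stdout
    return results
```

Calling run_all(["chartbar.shw", "FIG_ANIM.SHW"]) would then give the
three listings side by side, assuming the samples are in the current
directory.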

For comparison I ran the file format identification utility TrID
(see https://mark0.net/soft-trid-e.html). The samples (like
chartbar.shw and fig-demo.shw) which the file command describes with
the additional phrase "Unknown filetype 10" are described with the
highest rate as "WordPerfect Presentations (v2)" by
shw-wp-2.trid.xml.
The samples (like FIG_ANIM.SHW) which the file command describes
with the additional phrase "Unknown filetype 15" are described with
the highest rate as "WordPerfect/Corel Presentations (v3)" by
shw-wp-3.trid.xml.
The extracted streams, which the file command often describes with
the additional phrase "Unknown filetype 16", are described with the
highest rate as "WordPerfect Document (generic)" by
wpd-doc-gen.trid.xml or as "WordPerfect (generic)" by
wp-generic.trid.xml.
The samples (like Architektur.mst, WP-7-x3-conture.wpd,
WP-7-x3-grafic.wpg, WELCOME.SHW and "Coin Money.mst") which the file
command describes as "OLE 2 Compound Document" are described here
with a low rate as "Generic OLE2 / Multistream Compound" by
docfile.trid.xml. With a higher rate these samples are described as
"WordPerfect Document (OLE2 Multistream Compound)" by
wpd-docfile.trid.xml or, as for example WELCOME.SHW, as "WordPerfect
Slide Show" by shw.trid.xml (see appended trid-v-wp-mst.txt.gz).

For comparison I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/). It identifies
all OLE 2 Compound Documents only generically as "OLE2 Compound
Document" by PUID fmt/111.
The samples (like FIG_ANIM.SHW) which the file command describes
with the additional phrase "Unknown filetype 15" are described as
"Corel Presentation" with version "3" by PUID fmt/878.
The remaining samples are described as "Corel Presentation" with
version "7-8-9" by PUID fmt/877 via extension
(see appended droid-wp-mst.csv.gz).

TrID lists the file name extensions used and, with the -v option,
often a related URL pointing to some information. This is now
expressed by comment lines inside Magdir/wordprocessors like:
# URL:		http://fileformats.archiveteam.org/
#		wiki/Corel_Presentations
# Reference:	http://mark0.net/download/triddefs_xml.7z
#		defs/s/shw-wp-2.trid.xml
#		defs/s/shw-wp-3.trid.xml

The description inside Magdir/wordprocessors starts with a line like:
 0	string	\xffWPC
So the first 4 bytes are the generic magic for all WordPerfect
samples. Sub-classification is done by the bytes at offset 8 and 9.
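
To make the layout concrete, here is a minimal Python sketch of that
first step: check the 4 magic bytes and pick out the two bytes used for
sub-classification. The function name wpc_class_bytes is only for
illustration and does not exist in file or libmagic.

```python
WPC_MAGIC = b"\xffWPC"  # generic magic for all WordPerfect samples


def wpc_class_bytes(data):
    """Return (byte at offset 8, byte at offset 9), or None without the magic."""
    if len(data) < 10 or data[:4] != WPC_MAGIC:
        return None
    return data[8], data[9]
```

For the Corel presentation samples discussed here the first value is
15 and the second is 10, 15 or 16.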

For my Corel presentation examples this is at the moment done by
lines like
 >8	byte	15
 >>9	default	x
 >>>9	byte	x	Corel DrawPerfect: Unknown filetype %d
So there exists a second level classification, but the third level
(byte at offset 9) is missing, and then the default clause for that
branch is shown. So I only have to insert parts for filetype 10, 15
and 16 before that default clause.

For type 10 these lines look like:

 >>9	byte	10	WordPerfect Presentation
 !:mime		application/x-drawperfect-shw
 !:ext		shw
 >>>4	ulelong	!0x10	\b, at %#x document area
 >>>12	ulelong	!0	\b, at 0xC %#x

At offset 4 normally the pointer to the document area is stored.
According to the TrID definition this value seems to be 10h in this
variant, and the value at offset 12 is nil, which probably means not
encrypted. But I do not know if this is always true. So only
unexpected values are shown here. Instead of the generic mime type
application/octet-stream I show a user defined one.

For type 15 these lines look like:
 >>9	byte	15	Corel Presentation
 !:mime		application/x-drawperfect-shw
 !:ext		shw
 >>>4	ulelong	!0x1a	\b, at %#x document area
 >>>12	ulelong	!0	\b, at 0xC %#x
 >>>0x14	ulelong x	\b, %u bytes
At offset 4 normally the pointer to the document area is stored.
According to the TrID definition this value seems to be 1Ah in this
variant, and the value at offset 12 is nil, which probably means not
encrypted. But I do not know if this is always true. So only
unexpected values are shown here. At offset 20 the file size, not
including pad characters at EOF, is stored as a 4 byte integer.
Instead of the generic mime type application/octet-stream I show a
user defined one.

For type 16 (the variant embedded inside a Compound Document) these
lines look similar to type 15:
 >>9	byte	16	Corel Presentation (embedded)
 !:mime		application/x-corelpresentations
 !:ext		/
 >>>4	ulelong	!0x1a	\b, at %#x document area
 >>>12	ulelong	!0	\b, at 0xC %#x
 >>>16	ulelong	!0x3	\b, at 0x10 %#x
Instead of the generic mime type application/octet-stream I show
another user defined one here. Because the stream name is
PerfectOffice_MAIN, no name suffix is displayed here.
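
The logic of the three new branches can be sketched in Python like
this. It is only an illustration of what the magic lines in the
attached diff test, not how libmagic evaluates them; the names NAMES,
EXPECTED_DOC_AREA and describe are invented for this example, and the
exact text may differ from real file output in corner cases.

```python
import struct

NAMES = {
    10: "WordPerfect Presentation",
    15: "Corel Presentation",
    16: "Corel Presentation (embedded)",
}
# Usual pointer to the document area per file type (10h or 1Ah).
EXPECTED_DOC_AREA = {10: 0x10, 15: 0x1A, 16: 0x1A}


def describe(data):
    """Describe a \\xffWPC header whose byte at offset 8 is 15."""
    if len(data) < 16 or data[:4] != b"\xffWPC" or data[8] != 15:
        return None
    file_type = data[9]
    if file_type in NAMES:
        parts = [NAMES[file_type]]
        (doc_area,) = struct.unpack_from("<I", data, 4)
        (at_0c,) = struct.unpack_from("<I", data, 12)
        # Only unexpected values are reported.
        if doc_area != EXPECTED_DOC_AREA[file_type]:
            parts.append("at %#x document area" % doc_area)
        if at_0c != 0:
            parts.append("at 0xC %#x" % at_0c)
        if file_type in (15, 16) and len(data) >= 24:
            at_10, size = struct.unpack_from("<2I", data, 16)
            if at_10 != 3:
                parts.append("at 0x10 %#x" % at_10)
            parts.append("%d bytes" % size)
    else:
        # Fallback corresponding to the existing default clause.
        parts = ["Corel DrawPerfect: Unknown filetype %d" % file_type]
    major, minor = data[10], data[11]
    # The existing magic prints "v5." when the major version byte is 0.
    parts.append("v%d.%d" % (major if major else 5, minor))
    return ", ".join(parts)
```

Feeding it the first 24 bytes of an extracted PerfectOffice_MAIN
stream should give a line in the style of the patched output shown
further down.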

After applying the above mentioned modifications with the patch
file-5.44-wordprocessors-shw.diff I get a more precise output like:

Architektur-stream.bin:     Corel Presentation (embedded)
			    , 629138 bytes
			    , v2.0
Architektur.mst:            OLE 2 Compound Document, v3.62,
			    SecID 0x1,
			    10 FAT sectors, Mini FAT start sector 0x2
			    : WordPerfect 7-X3
			    presentations Master, Document or Graphic
Coin Money-stream.bin:      Corel Presentation (embedded)
     			    , 377474 bytes
			    , v2.0
Coin Money.mst:             OLE 2 Compound Document, v3.62,
     			    SecID 0x1,
			    6 FAT sectors, Mini FAT start sector 0x2
			    : WordPerfect 7-X3
			    presentation
FIG_ANIM.SHW:               Corel Presentation
			    , 360289 bytes
			    , v1.0
WELCOME-stream.bin:         Corel Presentation
			    , 424882 bytes
			    , v2.0
WELCOME.SHW:                OLE 2 Compound Document, v3.62,
			    SecID 0x1,
			    7 FAT sectors, Mini FAT start sector 0x2
			    : WordPerfect 7-X3
			    presentation
WP-7-x3-conture-stream.bin: WordPerfect document
			    , v2.0
WP-7-x3-conture.wpd:        OLE 2 Compound Document, v3.62,
			    SecID 0x2,
			    0 Mini FAT sector
			    : WordPerfect 7-X3
			    presentations Master, Document or Graphic
WP-7-x3-grafic-stream.bin:  Corel Presentation (embedded)
			    , 629138 bytes
			    , v2.0
WP-7-x3-grafic.wpg:         OLE 2 Compound Document, v3.62,
			    SecID 0x2,
			    0 Mini FAT sector
			    : WordPerfect 7-X3
			    presentations Master, Document or Graphic
chartbar.shw:               WordPerfect Presentation
			    , v2.0
fig-demo.shw:               WordPerfect Presentation
			    , v2.0

With the additional -i option I now get output like:
Architektur-stream.bin:     application/x-corelpresentations
Architektur.mst:            application/vnd.wordperfect
Coin Money-stream.bin:      application/x-corelpresentations
Coin Money.mst:             application/x-corelpresentations
FIG_ANIM.SHW:               application/x-drawperfect-shw
WELCOME-stream.bin:         application/x-drawperfect-shw
WELCOME.SHW:                application/x-corelpresentations
WP-7-x3-conture-stream.bin: application/vnd.wordperfect
WP-7-x3-conture.wpd:        application/vnd.wordperfect
WP-7-x3-grafic-stream.bin:  application/x-corelpresentations
WP-7-x3-grafic.wpg:         application/vnd.wordperfect
chartbar.shw:               application/x-drawperfect-shw
fig-demo.shw:               application/x-drawperfect-shw

With the additional --extension option I now get:
Architektur-stream.bin:     /
Architektur.mst:            mst/wpd/wpg
Coin Money-stream.bin:      /
Coin Money.mst:             shw
FIG_ANIM.SHW:               shw
WELCOME-stream.bin:         shw
WELCOME.SHW:                shw
WP-7-x3-conture-stream.bin: wpd/wpt/wkb/icr/tut/sty/tst/crs
WP-7-x3-conture.wpd:        mst/wpd/wpg
WP-7-x3-grafic-stream.bin:  /
WP-7-x3-grafic.wpg:         mst/wpd/wpg
chartbar.shw:               shw
fig-demo.shw:               shw

I hope my diff file can be applied in a future version of the
file utility.

I also have a feature request. It would be nice if, similar to the
mime type, a line for the DROID identification could be added inside
a definition. That may look like:
!:PUID	fmt/867
Then this information should be shown by the file command. Even
nicer would be if this were expanded to a URL like:
	https://www.nationalarchives.gov.uk/PRONOM/fmt/867
It would also be nice if, similar to the mime type, a line for the
TrID identification could be added inside a definition. That may
look like:
!:TrID	shw-wp-3.trid.xml
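
The URL expansion I have in mind is trivial. A minimal Python sketch,
assuming the URL pattern shown above holds for every PUID (the helper
name puid_to_url is made up, and accepting the x-fmt prefix is my
assumption, only the fmt form appears in this mail):

```python
PRONOM_BASE = "https://www.nationalarchives.gov.uk/PRONOM/"


def puid_to_url(puid):
    """Expand a PUID such as 'fmt/867' to a PRONOM URL."""
    prefix, _, number = puid.partition("/")
    if prefix not in ("fmt", "x-fmt") or not number.isdigit():
        raise ValueError("not a PUID: %r" % puid)
    return PRONOM_BASE + puid
```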

Why do I think this is important? A similar situation exists for
anti virus software. Every company uses its own naming scheme. So
when you have trouble with some suspicious file, you may have to
consult a dozen different pages of anti virus vendors to get a
solid opinion about the inspected file. For file identifying tools
we have the same problem. I use 3 different tools:
	file
	trid
	droid
Each of these tools has advantages and disadvantages. In other
words, none is the global and universal truth showing tool.

With best wishes
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY7BhvgAKCRCv8rHJQhrU
1gHeAKCHo7Pr7Cx5cHMhYoRJltDq0AfnLQCdGD2m1CRcjjSsl5xCBT2KFBVW1/o=
=Ty6g
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-wp-mst.txt.gz
Type: application/x-gzip
Size: 1202 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221231/7426b200/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-wp-mst.csv.gz
Type: application/x-gzip
Size: 712 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221231/7426b200/attachment-0001.bin>
-------------- next part --------------
--- file-5.44/magic/Magdir/wordprocessors.old	2022-11-30 00:04:06.000000000 +0100
+++ file-5.44/magic/Magdir/wordprocessors	2022-12-31 13:59:56.269924200 +0100
@@ -290,3 +290,61 @@
 # Corel DrawPerfect
+# URL:		http://fileformats.archiveteam.org/wiki/Corel_Presentations
+# Update:	Joerg Jenderek
 >8	byte	15
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/s/shw-wp-2.trid.xml
+# Note:		called "WordPerfect Presentations (v2)" by TrID and
+#		"Corel Presentation" with version "7-8-9" by DROID via PUID fmt/877
+>>9	byte	10	WordPerfect Presentation
+#!:mime		application/octet-stream
+#!:mime		application/vnd.wordperfect
+!:mime		application/x-drawperfect-shw
+# like: BENEFITS.SHW chartbar.shw chartbul.shw chartgal.shw chartorg.shw fig-demo.shw figurgal.shw mastrgal.shw scuba.shw tutorial.shw
+!:ext		shw
+# pointer to document area like: 10h
+>>>4	ulelong	!0x10	\b, at %#x document area
+# according to TrID this is nil
+>>>12	ulelong	!0	\b, at 0xC %#x
+# search for embedded WP file like in tutorial.shw
+#>>>16	search/638/sb	\xffWPC	WPC_MAGIC_FOUND
+# GRR: indirect call leads to recursion! WHY?
+#>>>>&0	indirect	x	\b; contains
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/s/shw-wp-3.trid.xml
+# Note:		called "WordPerfect/Corel Presentations (v3)" by TrID and
+#		"Corel Presentation" with version "3" by DROID via PUID fmt/878
+>>9	byte	15	Corel Presentation
+#!:mime		application/octet-stream
+#!:mime		application/vnd.wordperfect
+!:mime		application/x-drawperfect-shw
+# like: FIG_ANIM.SHW presenta.shw
+!:ext		shw
+# pointer to document area like: 1ah
+>>>4	ulelong	!0x1a	\b, at %#x document area
+# according to TrID this is nil
+>>>12	ulelong	!0	\b, at 0xC %#x
+# reserved like: 3
+>>>16	ulelong	!0x3	\b, at 0x10 %#x
+# file size, not including pad characters at EOF
+>>>0x14	ulelong x	\b, %u bytes
+# search for embedded WP file like in foo
+#>>>24	search/638/sb	\xffWPC	WPC_MAGIC_FOUND
+# GRR: indirect call leads to recursion! WHY?
+#>>>>&0	indirect	x	\b; contains
+# embedded inside Compound Document variant handled by ./ole2compounddocs
+>>9	byte	16	Corel Presentation (embedded)
+#!:mime		application/octet-stream
+#!:mime		application/vnd.wordperfect
+!:mime		application/x-corelpresentations
+# like: PerfectOffice_MAIN
+!:ext		/
+# pointer to document area like: 1ah
+>>>4	ulelong	!0x1a	\b, at %#x document area
+>>>12	ulelong	!0	\b, at 0xC %#x
+# reserved like: 3
+>>>16	ulelong	!0x3	\b, at 0x10 %#x
+# file size, not including pad characters at EOF
+>>>0x14	ulelong x	\b, %u bytes
+# search for embedded WP file
+#>>>24	search/638/sb	\xffWPC	WPC_MAGIC_FOUND
+# GRR: indirect call leads to recursion! WHY?
+#>>>>&0	indirect	x	\b; contains
 >>9	default	x
@@ -380,3 +438,6 @@
 >10	byte	0	\b, v5.
+# version of WP file; 2.1~WP 8.0
+# major version of WP file like: 1 2
 >10	byte	!0	\b, v%d.
+# minor version of WP file like: 0 1
 >11	byte	x	\b%d
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.44-wordprocessors-shw.diff.sig
Type: application/octet-stream
Size: 1145 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221231/7426b200/attachment.obj>

