[File] [PATCH] of Magdir/ole2compounddocs for Microsoft Works portfolio *.WSB + WordPerfect document *.WPD

Jörg Jenderek joerg.jen.der.ek at gmx.net
Tue Aug 31 23:45:08 UTC 2021


Hello,
some months ago I sent patches to handle OLE 2 Compound Documents.
Some days ago I came across some Microsoft Works portfolio files with
file name extension WSB and a variant of WordPerfect document with WPD
extension. Both are based on the OLE 2 Compound Document format.
When running file command version 5.40 on such examples
with the keep-going option (-k) I get output like:

ole6-PerfectOffice_MAIN.wpd: WordPerfect document, v2.2
			     (Lepton 2.x), scale 54367-4096,
			     spot sensor temperature 0.000000,
			     unit celsius, color scheme 0,
			     minimum point enabled, calibration:
			     offset -0.000000, slope
			     0.000000
ole6.wpd:                    Composite Document File V2 Document,
			     Cannot read section info
			     OLE 2 Compound Document, v3.62,
			     SecID 0x2, 2 FAT sectors,
			     0 Mini FAT sector : UNKNOWN,
			     clsid 0xff739851ad2d200219370000929679cd
			     (Lepton 2.x), scale 54367-4096,
			     spot sensor temperature 0.000000,
			     unit celsius, color scheme 0,
			     minimum point enabled, calibration:
			     offset -0.000000, slope
			     -4412865511424.000000
Sammlung.wsb:                Composite Document File V2 Document,
			     Cannot read section info
			     OLE 2 Compound Document, v3.62,
			     SecID 0x1, 13 FAT sectors,
			     Mini FAT start sector 0x2 : UNKNOWN,
			     clsid 0xc0c7266eb98cd311a1c800c04f612452
wsbsamp.wsb:                 Composite Document File V2 Document,
			     Cannot read section info
			     OLE 2 Compound Document, v3.62,
			     SecID 0x1, 9 FAT sectors,
			     Mini FAT start sector 0x2 : UNKNOWN,
			     clsid 0xc0c7266eb98cd311a1c800c04f612452

The misidentification as "(Lepton 2.x)" comes from Magdir/measure,
which produces too many false hits.

Magdir/ole2compounddocs describes these examples only as "UNKNOWN".
So just the generic mime type application/x-ole-storage is shown and
no file name extension is displayed, but luckily the CLSID in use is
shown in hexadecimal form.

The online Windows GUID converter on the web site www.windowstricks.in
no longer works in "modern" browsers, so I used another GUID converter
on toolslick.com and replaced the relevant comment line with a new
one:
#	https://toolslick.com/conversion/data/guid
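
For readers without an online converter at hand, the same conversion
can be sketched in a few lines of Python. This is only an illustration
and not part of the patch. It assumes that the 32 hex digits printed
by file are the 16 raw CLSID bytes in on-disk order, where the first
three GUID fields are stored little-endian. The function name
clsid_hex_to_guid is made up for this example.

```python
import uuid


def clsid_hex_to_guid(clsid_hex: str) -> str:
    """Turn the raw CLSID hex digits shown by file into GUID text form."""
    # Accept the value with or without the leading "0x" that file prints.
    if clsid_hex.lower().startswith("0x"):
        clsid_hex = clsid_hex[2:]
    raw = bytes.fromhex(clsid_hex)
    if len(raw) != 16:
        raise ValueError("a CLSID is exactly 16 bytes (32 hex digits)")
    # bytes_le tells uuid that the first three fields are little-endian,
    # which is the assumed on-disk layout of a CLSID.
    return str(uuid.UUID(bytes_le=raw))


if __name__ == "__main__":
    # CLSID values taken from the file output quoted above.
    print(clsid_hex_to_guid("0xc0c7266eb98cd311a1c800c04f612452"))
    print(clsid_hex_to_guid("0xff739851ad2d200219370000929679cd"))
```

Under that assumption the two CLSID values from the output above
become 6e26c7c0-8cb9-11d3-a1c8-00c04f612452 and
519873ff-2dad-0220-1937-0000929679cd.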

With the CLSID in GUID form I searched the net for information, and
web sites like file-extension.net for the extension. I came to the
conclusion that examples like Sammlung.wsb and wsbsamp.wsb are
Microsoft Works portfolio files. The first one I later found in the
directory MSWorks/Common on the Works 2003 CD. That information is
recorded in comment lines like:
# URL:	https://en.wikipedia.org/wiki/Microsoft_Works
# Ref.:	fileformats.archiveteam.org/wiki/Microsoft_Compound_File
Portfolio is apparently the part of the Microsoft Works suite that
organizes pictures such as JPEG images. This can be seen from the
image content being stored in streams with the UTF-16 LE name __cf1,
while the corresponding file names like 001.JPG, 002.JPG, etc. are
stored in streams named __fname. I do not know which versions are
identified by this CLSID; probably version 6 and/or 7. So after the
Microsoft Works 5-6 document section inside Magdir/ole2compounddocs
I inserted describing magic lines like:

  >>88 	ubequad		0xa1c800c04f612452	: Microsoft
  >>>80 	ubequad		0xc0c7266eb98cd311	Works portfolio
!:mime	application/vnd.ms-works
!:ext	wsb
The above mime type is registered at IANA, but the registration
contains almost no information. When searching the net, often only
the 4 extensions wps, wcm, wdb and wks are mentioned, but I believe
the type is also suitable for WSB examples. If this is not true, the
generic mime type application/x-ole-storage should be used instead.

The example ole6.wpd with WPD file name extension is found in the test
directory of the sources of the WordPerfect to LaTeX converter
"wp2latex". This is expressed by a line like:
# https://fossies.org/linux/wp2latex/test/ole6.wpd

Information about the WordPerfect WPD format can be found, for
example, on the File Formats Archive Team web site, but that
information is not up to date. For the inspected example I found on
GitHub a repository with a document WPFF_DocumentStructure.htm about
the WordPerfect File Format. That information is now expressed by
lines like:
# URL:	http://fileformats.archiveteam.org/wiki/WordPerfect
# Ref.:	https://github.com/OneWingedShark/WordPerfect/blob/master/
#	doc/SDK_Help/FileFormats/WPFF_DocumentStructure.htm

The middle-aged WordPerfect documents (versions 5 and 6) are
characterised by the start pattern \xffWPC. Such examples are
described by Magdir/wordprocessors. According to the reference, since
version "7" (WP7) this format is embedded as a PerfectOffice_MAIN
stream inside a Microsoft OLE Compound File. This can be verified by
extracting that stream, for example with Michal Mutl's MiTeC
Structured Storage Viewer. So from ole6.wpd I got
ole6-PerfectOffice_MAIN.wpd. According to the reference, the
characteristic difference is the minor version raised from 1 to 2,
so the complete version is now, for example, 2.2.
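
To double-check the version of an extracted stream without the file
command, a small Python sketch like the following may help. It is
not part of the patch. It assumes the commonly documented header
layout (signature \xffWPC at offset 0, a 4-byte little-endian pointer
at offset 4, product and file type bytes at offsets 8 and 9, major
and minor version bytes at offsets 10 and 11). The function name
wpc_version is made up for this example.

```python
import struct

WPC_SIGNATURE = b"\xffWPC"


def wpc_version(data: bytes):
    """Return (major, minor) from a WordPerfect header, or None."""
    if len(data) < 12 or data[:4] != WPC_SIGNATURE:
        return None
    # Assumed layout after the 4-byte signature:
    #   uint32 LE document area pointer, product type, file type,
    #   major version, minor version (one byte each).
    _doc_ptr, _product, _ftype, major, minor = struct.unpack_from(
        "<IBBBB", data, 4
    )
    return major, minor


if __name__ == "__main__":
    # Synthetic header for illustration only, not taken from ole6.wpd.
    sample = WPC_SIGNATURE + struct.pack("<IBBBB", 0x200, 1, 0x0A, 2, 2)
    print(wpc_version(sample))
```

For a real check one would pass the first bytes of a stream such as
ole6-PerfectOffice_MAIN.wpd instead of the synthetic sample.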

So I put describing lines before the WordPerfect 7-X3 presentation
section inside Magdir/ole2compounddocs like:

  >>88 	ubequad		0x19370000929679cd	: WordPerfect 7
  >>>80 	ubequad		0xff739851ad2d2002	Document
!:mime	application/vnd.wordperfect
!:ext	wpd
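
The two ubequad constants in tests like these are simply the 16 raw
CLSID bytes split into two halves, the first half compared at offset
80 and the second at offset 88. The following Python sketch goes the
opposite way of a GUID converter and derives both constants from a
GUID in text form. It is an illustration only; the function name
guid_to_magic_constants is made up, and the little-endian storage of
the first three GUID fields is assumed.

```python
import uuid


def guid_to_magic_constants(guid: str):
    """Return the ubequad constants for offsets 80 and 88 as hex strings."""
    # bytes_le yields the 16 bytes in the assumed on-disk CLSID order.
    raw = uuid.UUID(guid).bytes_le
    at_80 = "0x" + raw[:8].hex()
    at_88 = "0x" + raw[8:].hex()
    return at_80, at_88


if __name__ == "__main__":
    for guid in ("519873ff-2dad-0220-1937-0000929679cd",
                 "6e26c7c0-8cb9-11d3-a1c8-00c04f612452"):
        first, second = guid_to_magic_constants(guid)
        print(f">>88 \tubequad\t\t{second}")
        print(f">>>80 \tubequad\t\t{first}")
```

The magic file tests offset 88 first and offset 80 second, so the
sketch prints the constants in that order.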

I tried to inspect the embedded WordPerfect document via
Magdir/wordprocessors, but the indirect method does not work
correctly. Maybe this is a bug in the file program. So I kept my
attempt as comment lines like:
#>>>>0	search/0xc01/s		\xffWPC		\b, WPC SIGNATURE
#>>>>>&0	indirect	x		\b; contains
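
What the commented search line is meant to do can be imitated outside
of file with a short Python sketch. It is only an approximation: it
assumes that a search range of 0xc01 means the signature may start at
any of the first 0xc01 byte positions, which may differ in detail from
the real behaviour of file. The function name find_wpc is made up.

```python
WPC_SIGNATURE = b"\xffWPC"


def find_wpc(data: bytes, search_range: int = 0xC01) -> int:
    """Return the offset of the first \\xffWPC signature, or -1.

    Only matches that start within the first search_range bytes count.
    """
    # bytes.find limits where a match may end, so the end bound is the
    # last allowed start position plus the signature length.
    end = search_range - 1 + len(WPC_SIGNATURE)
    return data.find(WPC_SIGNATURE, 0, end)


if __name__ == "__main__":
    # Synthetic data for illustration only, not a real compound file.
    blob = b"\x00" * 0x200 + WPC_SIGNATURE + b"\x00" * 16
    print(hex(find_wpc(blob)))
```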

After applying the above modifications with the patch
file-5.40-ole2compounddocs-wsb_wpd.diff, my examples are now described
correctly with the -e cdf option:
ole6-PerfectOffice_MAIN.wpd: WordPerfect document, v2.2
ole6.wpd:                    OLE 2 Compound Document, v3.62,
			     SecID 0x2, 2 FAT sectors,
			     0 Mini FAT sector :
			     WordPerfect 7 Document
Sammlung.wsb:                OLE 2 Compound Document, v3.62,
			     SecID 0x1, 13 FAT sectors,
			     Mini FAT start sector 0x2 :
			     Microsoft Works portfolio
wsbsamp.wsb:                 OLE 2 Compound Document, v3.62,
			     SecID 0x1, 9 FAT sectors,
			     Mini FAT start sector 0x2 :
			     Microsoft Works portfolio

The output with the -i option now also looks better:

ole6-PerfectOffice_MAIN.wpd: application/octet-stream
ole6.wpd:                    application/vnd.wordperfect
Sammlung.wsb:                application/vnd.ms-works
wsbsamp.wsb:                 application/vnd.ms-works

The same applies to the output with --extension:
ole6-PerfectOffice_MAIN.wpd: ???
ole6.wpd:                    wpd
Sammlung.wsb:                wsb
wsbsamp.wsb:                 wsb

I hope that my diff file can be applied in a future version of the
file utility.

With best wishes
Jörg Jenderek
--
Jörg Jenderek




-------------- next part --------------
--- file-5.40/magic/Magdir/ole2compounddocs.old	2021-02-22 23:51:10 +0000
+++ file-5.40/magic/Magdir/ole2compounddocs	2021-08-31 19:08:53 +0000
@@ -84,13 +84,13 @@
 #>640	lestring16	x \b, 6th %.10s
 # 7th
 #>768	lestring16	x \b, 7th %.10s
 #	https://wikileaks.org/ciav7p1/cms/page_13762814.html
 #	https://m.blog.naver.com/superman4u/40047693679
 #	https://misc.daniel-marschall.de/projects/guid_analysis/guid.txt
-#	http://www.windowstricks.in/online-windows-guid-converter
+#	https://toolslick.com/conversion/data/guid
 #>80 	ubequad		!0			\b, clsid 0x%16.16llx
 #>>88 	ubequad		x			\b%16.16llx
 # test for "Root Entry" inside directory by type 5 value
 >66 	ubyte		5
 # look for CLSID GUID 0
 >>88 	ubequad		0x0
@@ -377,12 +377,27 @@
 >>88 	ubequad		0xa40700c04fb932ba	: Microsoft
 # URL:	http://fileformats.archiveteam.org/wiki/Microsoft_Works_Word_Processor
 >>>80 	ubequad		0xb25aa40e0a9ed111	Works 5-6 document
 !:mime	application/vnd.ms-works
 !:apple	????AWWP
 !:ext	wps
+# From:		Joerg Jenderek
+# URL:		https://en.wikipedia.org/wiki/Microsoft_Works
+# Reference:	http://fileformats.archiveteam.org/wiki/Microsoft_Compound_File
+# Note:		probably version 6 and 7
+# organize pictures like JPEG images in streams __cf1 with names like
+# 001.JPG, 002.JPG ... in streams __fname
+>>88 	ubequad		0xa1c800c04f612452	: Microsoft
+>>>80 	ubequad		0xc0c7266eb98cd311	Works portfolio
+# 2nd directory entry name PfOrder, 3rd __LastID and 4th __SizeUsed
+#!:mime	application/x-ole-storage
+# https://www.iana.org/assignments/media-types/application/vnd.ms-works
+!:mime	application/vnd.ms-works
+# https://extension.nirsoft.net/wsb
+# like: wsbsamp.wsb WORKS2003_CD:\MSWorks\Common\Sammlung.wsb
+!:ext	wsb
 #??
 # URL:	http://fileformats.archiveteam.org/wiki/Microsoft_Publisher
 >>88 	ubequad		0x00c0000000000046	: Microsoft
 >>>80 	ubequad		0x0112020000000000	Publisher
 !:mime	application/vnd.ms-publisher
 !:ext	pub
@@ -407,12 +422,31 @@
 #??
 >>88 	ubequad		0xbe1100c04fb6faf1	: Microsoft
 >>>80 	ubequad		0x3a8fb774c8c8d111	Project
 !:mime	application/vnd.ms-project
 !:ext	mpp
 #
+# URL:		http://fileformats.archiveteam.org/wiki/WordPerfect
+# Reference:	http://fileformats.archiveteam.org/wiki/Microsoft_Compound_File
+#		https://github.com/OneWingedShark/WordPerfect/
+#		blob/master/doc/SDK_Help/FileFormats/WPFF_DocumentStructure.htm
+# From:		Joerg Jenderek
+# Note:		internal version x.2 or 2.2 like in embedded ole6-PerfectOffice_MAIN.wpd
+# 3rd directory entry name PerfectOffice_OBJECT and 2nd PerfectOffice_MAIN,
+# which contains WordPerfect document \xffWPC signature handled by ./wordprocessors
+>>88 	ubequad		0x19370000929679cd	: WordPerfect 7
+>>>80 	ubequad		0xff739851ad2d2002	Document
+!:mime	application/vnd.wordperfect
+#!:apple	????WPC?
+# https://fossies.org/linux/wp2latex/test/ole6.wpd
+!:ext	wpd
+#>>>>0	search/0xc01/s	\xffWPC			\b, WPC SIGNATURE
+# inspect embedded WordPerfect document by ./wordprocessors with 1 space at end
+#>>>>>&0	indirect	x	\b; contains 
+# GRR: the above expression does not work correctly 
+#
 # URL:	http://fileformats.archiveteam.org/wiki/SHW_(Corel)
 #???
 >>88 	ubequad		0x99ae04021c007002	: WordPerfect
 >>>80 	ubequad		0x62fe2e4099191b10	7-X3 presentation
 !:mime	application/x-corelpresentations
 #!:mime	application/x-shw-viewer
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.40-ole2compounddocs-wsb_wpd.diff.sig
Type: application/octet-stream
Size: 1683 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20210901/2574f0d9/attachment.obj>

