[File] [PATCH] Magdir/ole2compounddocs Greenstreet ART drawing misidentified as Microsoft

Jörg Jenderek joerg.jen.der.ek at gmx.net
Wed Oct 19 20:07:33 UTC 2022


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

a few days ago I handled some files with the ART file name extension.
Some of the samples are "newer" Greenstreet Art drawings.

When I run the file command version 5.41 with the -e cdf option on
such examples and related files, I get output like:

BCARD2.ART:          OLE 2 Compound Document, v3.62, SecID 0x2,
		     2 FAT sectors,
		     Mini FAT start sector 0x4 : Microsoft
CONTENTS-stream.art: data
POSTER2.ART:         OLE 2 Compound Document, v3.62, SecID 0x2,
		     3 FAT sectors,
		     Mini FAT start sector 0x4 : Microsoft
Preview-stream.dib:  Device independent bitmap graphic, 100 x 65 x 8,
		     image size 6500, resolution 1830 x 1830 px/m
SLIDE2.ART:          OLE 2 Compound Document, v3.62, SecID 0x2,
		     2 FAT sectors,
		     Mini FAT start sector 0x4 : Microsoft
Visio2002Test.vsd:   OLE 2 Compound Document, v3.62, SecID 0x2,
		     Mini FAT start sector 0x5 : Microsoft
		     Visio 2000-2002 Document, stencil or template

Furthermore only the generic mime type application/octet-stream is
shown with the -i and -e cdf options. With the --extension option
only the 3 byte sequence ??? is shown.

When I run the file command with -e soft or with no extra option on
my examples, I get output like:

BCARD2.ART:          Composite Document File V2 Document,
		     Little Endian, Os 0, Version: 3.95, Title:
		     greenstreet Draw Art
		     FileComposite Document File V2 Document,
		     Cannot read section info
CONTENTS-stream.art: data
POSTER2.ART:         Composite Document File V2 Document,
		     Little Endian, Os 0, Version: 3.95, Title:
		     greenstreet Draw Art
		     FileComposite Document File V2 Document,
		     Cannot read section info
Preview-stream.dib:  Device independent bitmap graphic, 100 x 65 x 8,
		     image size 6500, resolution 1830 x 1830 px/m
SLIDE2.ART:          Composite Document File V2 Document,
		     Little Endian, Os 0, Version: 3.95, Title:
		     greenstreet Draw Art
		     FileComposite Document File V2 Document,
		     Cannot read section info
Visio2002Test.vsd:   Composite Document File V2 Document,
		     Little Endian, Os: Windows, Version 6.1,
		     Code page: 1252, Author: dtobias,
		     Name of Creating Application: Microsoft Visio

For comparison I ran the file format identification utility TrID
(see https://mark0.net/soft-trid-e.html). It identifies "older"
examples like CONTENTS-stream.art as "Greenstreet Art drawing (old)"
by art-gst.trid.xml. The "newer" ART examples are described with low
priority as "Generic OLE2 / Multistream Compound" by
docfile.trid.xml. These samples are also described with a higher
rate as "Greenstreet Art drawing" by art-gst-docfile.trid.xml
(see appended trid-v-art.txt.gz).

For comparison I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/). It describes the
newer variants only generically as "OLE2 Compound Document Format"
by PUID fmt/111.

Luckily, with the information given by TrID I found a page about GST
ART on the Archive Team file formats web site. There it is written
that the "newer" version is based on the Microsoft Compound File
format. This information is now expressed inside
Magdir/ole2compounddocs by comment lines like:
# URL:		http://fileformats.archiveteam.org/wiki/GST_ART
# Reference:	http://mark0.net/download/triddefs_xml.7z
#		defs/a/art-gst-docfile.trid.xml

There, under the item "Software & Samples", the Greenstreet
Publishing Suite 99 is mentioned. On that CD-ROM image I found such
"new" variant samples.

The ART examples are recognized as "OLE 2 Compound Document" by the
test on their starting bytes (\320\317\021\340\241\261\032\341)
inside Magdir/ole2compounddocs.
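
As a rough illustration of that signature test (a sketch of my own,
not code taken from the file utility; the function name
looks_like_ole2 is only an assumption for this example), the eight
octal-escaped bytes can be compared against the start of a buffer
like this:

```python
# The signature, written with the same octal escapes as the magic line.
OLE2_MAGIC = b"\320\317\021\340\241\261\032\341"


def looks_like_ole2(data: bytes) -> bool:
    """Return True if the buffer starts with the OLE 2 signature."""
    return data[:8] == OLE2_MAGIC


if __name__ == "__main__":
    # Show the same eight bytes in hexadecimal notation.
    print(OLE2_MAGIC.hex())
```

The hexadecimal form is often easier to spot in a hex dump of a
sample than the octal escapes used in the magic file.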

If sub classification is not possible, then file normally prints, at
the end after the values section, a part starting with the phrase
": UNKNOWN". At the moment there exist 2 unknown branches. One is
for examples with non null clsid. That part looked like:
 >>88 	default		x
 >>>88 	ubequad		!0			: UNKNOWN
 !:mime	application/x-ole-storage
 >>>>80 	ubequad		!0		\b, clsid %#16.16llx
 >>>>88 	ubequad		x		\b%16.16llx

I put this part, which displays information about the CDF directory,
inside a sub routine that starts like:

0	name			ole2-unknown
> 80 	ubequad		x			: UNKNOWN
!:mime	application/x-ole-storage
> 80 	ubequad		!0			\b, clsid %#16.16llx
>> 88 ubequad		x			\b%16.16llx

To help the "common" user who does not know how to convert a GUID, I
also show the clsid, after the hexadecimal form, in "curly braces"
form. Furthermore I also show the names of the second to seventh
directory entries (often useful when inspecting unknown samples),
which follow the first entry "Root Entry", by additional lines like:
 >>80	guid		x			{%s}
 >128	lestring16	x with names %.20s
 >256	lestring16	x %.20s
 >384	lestring16	x %.25s
 >512	lestring16	x %.10s
 >640	lestring16	x %.10s
 >768	lestring16	x %.10s

So I can use this sub routine also in the second "unknown" branch,
where I replace the displaying part by a call of the sub routine.
This now becomes like:
 >>>>128 	default		x
 >>>>>0 	use		ole2-unknown
instead of lines like:
 >>>>128 	default		x			: UNKNOWN
 >>>>>128	lestring16	x with names %.20s
 >>>>>256	lestring16	x %.20s
 >>>>>384	lestring16	x %.20s

At this point we have won nothing, but obviously there exists a third
"unknown" branch. Many Microsoft products have a similar clsid (or
more precisely, the second part of the GUID is the same). So such
Microsoft products, from "Visio 2000-2002" to "PowerPoint 4.0", are
handled in a branch that looks like:
 >>88 	ubequad		0xc000000000000046	: Microsoft
 >>>80 	ubequad		0x131a020000000000	Visio 2000-2002
 !:ext	vsd/vss/vst
 ...
 >>>80 	ubequad		0x5148040000000000	PowerPoint 4.0
 !:ext	ppt
Unfortunately this shared part of the clsid is also found in "newer"
Greenstreet ART drawings. So I move the phrase ": Microsoft" to the
second level test lines and add at the end a default clause to catch
unrecognized examples in a third "unknown" branch. So this now
becomes like:
 >>88 	ubequad	0xc000000000000046
 >>>80 	ubequad	0x131a020000000000	: Microsoft Visio 2000-2002
 !:ext	vsd/vss/vst
 ...
 >>>80 	ubequad	0x5148040000000000	: Microsoft PowerPoint 4.0
 !:ext	ppt
 >>>80 	default		x
 >>>>0 	use		ole2-unknown

When I now run the file command on the "newer" ART samples, I get
output like:

BCARD2.ART:  OLE 2 Compound Document, v3.62, SecID 0x2,
	     2 FAT sectors, Mini FAT start sector 0x4 : UNKNOWN,
	     clsid 0x602c020000000000c000000000000046
	     {00022C60-0000-0000-C000-000000000046}
	     with names CONTENTS Preview.dib
	     \005SummaryInformation \377\377\3 ! A
POSTER2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
	     3 FAT sectors, Mini FAT start sector 0x4 : UNKNOWN,
	     clsid 0x602c020000000000c000000000000046
	     {00022C60-0000-0000-C000-000000000046}
	     with names CONTENTS Preview.dib
	     \005SummaryInformation \377\377\3 ! A
SLIDE2.ART:  OLE 2 Compound Document, v3.62, SecID 0x2,
	     2 FAT sectors, Mini FAT start sector 0x4 : UNKNOWN,
	     clsid 0x602c020000000000c000000000000046
	     {00022C60-0000-0000-C000-000000000046}
	     with names CONTENTS Preview.dib
	     \005SummaryInformation \377\377\3 ! A
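
The names in this output are read at the fixed offsets 128, 256 and
so on. The following Python sketch shows one way to read such names;
it assumes the usual Compound File directory layout (entries of 128
bytes, the name as UTF-16LE in the first 64 bytes, a 16 bit name
length in bytes at offset 64) and works on synthetic data, not on a
real sample. The helpers make_entry and entry_names are hypothetical.

```python
import struct
from typing import List

ENTRY_SIZE = 128  # assumed size of one directory entry


def make_entry(name: str) -> bytes:
    """Build a synthetic directory entry that holds only a name."""
    encoded = (name + "\0").encode("utf-16-le")
    entry = encoded.ljust(64, b"\0") + struct.pack("<H", len(encoded))
    return entry.ljust(ENTRY_SIZE, b"\0")


def entry_names(directory: bytes) -> List[str]:
    """Return the names of all entries with a plausible name length."""
    names = []
    for off in range(0, len(directory) - ENTRY_SIZE + 1, ENTRY_SIZE):
        (length,) = struct.unpack_from("<H", directory, off + 64)
        if not 2 <= length <= 64:
            continue  # unused slot or implausible length
        raw = directory[off:off + length - 2]  # drop the final NUL
        names.append(raw.decode("utf-16-le", errors="replace"))
    return names


sector = b"".join(
    make_entry(n)
    for n in ("Root Entry", "CONTENTS", "Preview.dib",
              "\x05SummaryInformation")
)
print(entry_names(sector))
```

Checking the length field is one possible way to skip slots that hold
no name. The magic lines do not check it, which might be the reason
for strings like \377\377\3 in the last positions.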

The information mentioned for the "older" variant also applies to the
"newer" variant. On the web page it is written that the "older" part
is now stored as the CONTENTS stream. That name is also shown by the
patched file command above. Because the ART samples are OLE2 Compound
containers, we can inspect such examples with suited tools, for
example Michal Mutl's Structured Storage Viewer. There we see that
stream, and we can save it, for example as CONTENTS-stream.art. This
is described by TrID as the "old" variant.

So to catch the ART samples I now only insert suited lines before the
default clause. These additional lines look like:
 >>>80 	ubequad	0x602c020000000000	: Greenstreet Art drawing
 !:mime	image/x-greenstreet-art
 !:ext	art
Furthermore I display a user defined mime type instead of the generic
application/x-ole-storage.
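
The resulting decision logic can be sketched in Python as follows.
This is only an illustration of the branch structure, not the way the
file utility evaluates magic lines: the table holds just the three
values quoted in this mail, the null clsid branch is left out, and
the names SHARED_TAIL, KNOWN_HEADS and describe are assumptions.

```python
# Second half of the clsid shared by the entries of this branch.
SHARED_TAIL = 0xC000000000000046

# First half of the clsid, as tested at offset 80 by ubequad.
KNOWN_HEADS = {
    0x131A020000000000: "Microsoft Visio 2000-2002",
    0x5148040000000000: "Microsoft PowerPoint 4.0",
    0x602C020000000000: "Greenstreet Art drawing",
}


def describe(clsid: bytes) -> str:
    """Describe a 16 byte clsid, falling back to an UNKNOWN text."""
    head = int.from_bytes(clsid[:8], "big")
    tail = int.from_bytes(clsid[8:16], "big")
    if tail == SHARED_TAIL and head in KNOWN_HEADS:
        return KNOWN_HEADS[head]
    # Counterpart of the default clause calling ole2-unknown.
    return "UNKNOWN, clsid 0x%016x%016x" % (head, tail)


if __name__ == "__main__":
    print(describe(bytes.fromhex("602c020000000000c000000000000046")))
```

Because the vendor name is now part of each table entry, a non
Microsoft product with the same second clsid half no longer gets the
": Microsoft" label.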

After applying the above mentioned modifications by patch
file-5.41-ole2compounddocs-art.diff, all my inspected "newer"
Greenstreet ART drawing examples are now described with more details.
With the -e cdf option this now looks like:
BCARD2.ART:          OLE 2 Compound Document, v3.62, SecID 0x2,
		     2 FAT sectors, Mini FAT start sector 0x4 :
		     Greenstreet Art drawing
CONTENTS-stream.art: data
POSTER2.ART:         OLE 2 Compound Document, v3.62, SecID 0x2,
		     3 FAT sectors, Mini FAT start sector 0x4 :
		     Greenstreet Art drawing
Preview-stream.dib:  Device independent bitmap graphic, 100 x 65 x 8,
		     image size 6500, resolution 1830 x 1830 px/m
SLIDE2.ART:          OLE 2 Compound Document, v3.62, SecID 0x2,
		     2 FAT sectors, Mini FAT start sector 0x4 :
		     Greenstreet Art drawing
Visio2002Test.vsd:   OLE 2 Compound Document, v3.62, SecID 0x2,
		     Mini FAT start sector 0x5 : Microsoft
		     Visio 2000-2002 Document, stencil or template

I hope my diff file can be applied in a future version of the file
utility.

There exist some older ART examples where another format is used. I
will try to handle these in a future session.

With best wishes,
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY1BZBAAKCRCv8rHJQhrU
1mxOAJ9lJVqAB+6fBkJY7HziTF7ac3nBUgCdG+5E+A9kYLehlTelom0MAHU3oWo=
=tzLX
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-art.txt.gz
Type: application/x-gzip
Size: 780 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221019/f0bc3034/attachment.bin>
-------------- next part --------------
--- file-5.43/magic/Magdir/ole2compounddocs.old	2022-09-13 20:05:40.000000000 +0200
+++ file-5.43/magic/Magdir/ole2compounddocs	2022-10-19 21:41:23.239179400 +0200
@@ -74,2 +74,3 @@
 #>68 	bequad		!0xffffffffffffffff	\b, DirIDs %llx
+# NEXT lines for DEBUGGING
 # second directory entry name like VisioDocument Control000 
@@ -307,13 +308,4 @@
 #	remaining null clsid
->>>>128 	default		x			: UNKNOWN
-# second directory entry name like VisioDocument Control000
->>>>>128	lestring16	x with names %.20s
-# third directory entry like WordDocument
->>>>>256	lestring16	x %.20s
-# forth
->>>>>384	lestring16	x %.20s
-!:mime	application/x-ole-storage
-# according to file version 5.41 with -e soft option
-#!:mime	application/CDFV2
-#!:ext	???
+>>>>128 	default		x
+>>>>>0 	use		ole2-unknown
 #	look for known clsid GUID
@@ -322,4 +314,4 @@
 #   Last update on 10/23/2006 by Lester Hightower, 07/20/2019 by Joerg Jenderek
->>88 	ubequad		0xc000000000000046	: Microsoft
->>>80 	ubequad		0x131a020000000000	Visio 2000-2002 Document, stencil or template
+>>88 	ubequad		0xc000000000000046
+>>>80 	ubequad		0x131a020000000000	: Microsoft Visio 2000-2002 Document, stencil or template
 !:mime	application/vnd.visio
@@ -327,3 +319,3 @@
 !:ext	vsd/vss/vst
->>>80 	ubequad		0x141a020000000000	Visio 2003-2010 Document, stencil or template
+>>>80 	ubequad		0x141a020000000000	: Microsoft Visio 2003-2010 Document, stencil or template
 !:mime	application/vnd.visio
@@ -332,3 +324,3 @@
 # URL:	http://fileformats.archiveteam.org/wiki/Windows_Installer
->>>80 	ubequad		0x84100c0000000000	Windows Installer Package
+>>>80 	ubequad		0x84100c0000000000	: Microsoft Windows Installer Package
 !:mime	application/x-msi
@@ -336,3 +328,3 @@
 !:ext	msi
->>>80 	ubequad		0x86100c0000000000	Windows Installer Patch
+>>>80 	ubequad		0x86100c0000000000	: Microsoft Windows Installer Patch
 # ??
@@ -343,3 +335,3 @@
 # URL:	http://fileformats.archiveteam.org/wiki/DOC
->>>80 	ubequad		0x0009020000000000	Word 6-95 document or template
+>>>80 	ubequad		0x0009020000000000	: Microsoft Word 6-95 document or template
 !:mime	application/msword
@@ -348,3 +340,3 @@
 !:ext	doc/dot
->>>80 	ubequad		0x0609020000000000	Word 97-2003 document or template
+>>>80 	ubequad		0x0609020000000000	: Microsoft Word 97-2003 document or template
 !:mime	application/msword
@@ -355,3 +347,3 @@
 # URL:	http://fileformats.archiveteam.org/wiki/Microsoft_Works_Word_Processor
->>>80 	ubequad		0x0213020000000000	Works 3-4 document or template
+>>>80 	ubequad		0x0213020000000000	: Microsoft Works 3-4 document or template
 !:mime	application/vnd.ms-works
@@ -362,3 +354,3 @@
 # URL:	http://fileformats.archiveteam.org/wiki/Microsoft_Works_Database
->>>80 	ubequad		0x0313020000000000	Works 3-4 database or template
+>>>80 	ubequad		0x0313020000000000	: Microsoft Works 3-4 database or template
 !:mime	application/vnd.ms-works-db
@@ -370,3 +362,3 @@
 # URL:	https://en.wikipedia.org/wiki/Microsoft_Excel
->>>80 	ubequad		0x1008020000000000	Excel 5-95 worksheet, addin or template
+>>>80 	ubequad		0x1008020000000000	: Microsoft Excel 5-95 worksheet, addin or template
 !:mime	application/vnd.ms-excel
@@ -377,3 +369,3 @@
 #
->>>80 	ubequad		0x2008020000000000	Excel 97-2003
+>>>80 	ubequad		0x2008020000000000	: Microsoft Excel 97-2003
 !:mime	application/vnd.ms-excel
@@ -393,4 +385,4 @@
 # URL:	http://fileformats.archiveteam.org/wiki/OLE2
->>>80 	ubequad		0x0b0d020000000000	Outlook 97-2003 item
-#>>>80 	ubequad		0x0b0d020000000000	Outlook 97-2003 Message
+>>>80 	ubequad		0x0b0d020000000000	: Microsoft Outlook 97-2003 item
+#>>>80 	ubequad		0x0b0d020000000000	: Microsoft Outlook 97-2003 Message
 #!:mime	application/vnd.ms-outlook
@@ -399,3 +391,3 @@
 # URL:	https://wiki.fileformat.com/email/oft/
->>>80 	ubequad		0x46f0060000000000	Outlook 97-2003 item template
+>>>80 	ubequad		0x46f0060000000000	: Microsoft Outlook 97-2003 item template
 #!:mime	application/vnd.ms-outlook
@@ -405,3 +397,3 @@
 # URL:	http://fileformats.archiveteam.org/wiki/PPT
->>>80 	ubequad		0x5148040000000000	PowerPoint 4.0 presentation
+>>>80 	ubequad		0x5148040000000000	: Microsoft PowerPoint 4.0 presentation
 !:mime	application/vnd.ms-powerpoint
@@ -410,2 +402,15 @@
 !:ext	ppt
+# Summary:	"newer" Greenstreet Art drawing
+# From:		Joerg Jenderek
+# URL:		http://fileformats.archiveteam.org/wiki/GST_ART
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/a/art-gst-docfile.trid.xml
+# Note:		called like "Greenstreet Art drawing" by TrID
+# Note:		CONTENT stream contains binary part of older versions with phrase GST:ART at offset 16
+#		verified by Michal Mutl Structured Storage Viewer `SSView.exe BCARD2.ART`
+>>>80 	ubequad		0x602c020000000000	: Greenstreet Art drawing
+#!:mime	application/x-ole-storage
+!:mime	image/x-greenstreet-art
+!:ext	art
+>>>80 	default		x
+>>>>0 	use		ole2-unknown
 #??
@@ -663,4 +668,6 @@
 >>88 	default		x
-# GRR: check again for non null clsid because wrong when called by indirect directive
->>>88 	ubequad		!0			: UNKNOWN
+>>>0 	use		ole2-unknown
+# display information about directory for not detected CDF files
+0	name			ole2-unknown
+>80 	ubequad		x			: UNKNOWN
 # https://reposcope.com/mimetype/application/x-ole-storage
@@ -670,4 +677,17 @@
 #!:ext	???
->>>>80 	ubequad		!0			\b, clsid %#16.16llx
->>>>88 	ubequad		x			\b%16.16llx
-
+>80 	ubequad		!0			\b, clsid %#16.16llx
+>>88 ubequad		x			\b%16.16llx
+# converted hexadecimal format to standard GUUID notation
+>>80	guid		x			{%s}
+# second directory entry name like VisioDocument Control000
+>128	lestring16	x with names %.20s
+# third directory entry like WordDocument Preview.dib
+>256	lestring16	x %.20s
+# forth like \005SummaryInformation
+>384	lestring16	x %.25s
+# 5th
+>512	lestring16	x %.10s
+# 6th
+>640	lestring16	x %.10s
+# 7th
+>768	lestring16	x %.10s
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.43-ole2compounddocs-art.diff.sig
Type: application/octet-stream
Size: 2010 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221019/f0bc3034/attachment.obj>

