[File] [PATCH] Magdir/ole2compounddocs Greenstreet ART drawing misidentified as Microsoft

Christos Zoulas christos at zoulas.com
Wed Oct 19 20:16:50 UTC 2022


Committed, thanks!

christos

> On Oct 19, 2022, at 4:07 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hello,
> 
> Some days ago I handled some files with the ART file name extension.
> Some of the samples are "newer" Greenstreet Art drawings.
> 
> When running the file command version 5.41 with the -e cdf option on
> such examples and related files, I get output like:
> 
> BCARD2.ART:          OLE 2 Compound Document, v3.62, SecID 0x2,
> 		     2 FAT sectors,
> 		     Mini FAT start sector 0x4 : Microsoft
> CONTENTS-stream.art: data
> POSTER2.ART:         OLE 2 Compound Document, v3.62, SecID 0x2,
> 		     3 FAT sectors,
> 		     Mini FAT start sector 0x4 : Microsoft
> Preview-stream.dib:  Device independent bitmap graphic, 100 x 65 x 8,
> 		     image size 6500, resolution 1830 x 1830 px/m
> SLIDE2.ART:          OLE 2 Compound Document, v3.62, SecID 0x2,
> 		     2 FAT sectors,
> 		     Mini FAT start sector 0x4 : Microsoft
> Visio2002Test.vsd:   OLE 2 Compound Document, v3.62, SecID 0x2,
> 		     Mini FAT start sector 0x5 : Microsoft
> 		     Visio 2000-2002 Document, stencil or template
> 
> Furthermore, only the generic MIME type application/octet-stream is
> shown with the -i and -e cdf options. With the --extension option,
> only the 3-byte sequence ??? is shown.
> 
> When running the file command with -e soft or with no extra option on
> my examples, I get output like:
> 
> BCARD2.ART:          Composite Document File V2 Document,
> 		     Little Endian, Os 0, Version: 3.95, Title:
> 		     greenstreet Draw Art
> 		     FileComposite Document File V2 Document,
> 		     Cannot read section info
> CONTENTS-stream.art: data
> POSTER2.ART:         Composite Document File V2 Document,
> 		     Little Endian, Os 0, Version: 3.95, Title:
> 		     greenstreet Draw Art
> 		     FileComposite Document File V2 Document,
> 		     Cannot read section info
> Preview-stream.dib:  Device independent bitmap graphic, 100 x 65 x 8,
> 		     image size 6500, resolution 1830 x 1830 px/m
> SLIDE2.ART:          Composite Document File V2 Document,
> 		     Little Endian, Os 0, Version: 3.95, Title:
> 		     greenstreet Draw Art
> 		     FileComposite Document File V2 Document,
> 		     Cannot read section info
> Visio2002Test.vsd:   Composite Document File V2 Document,
> 		     Little Endian, Os: Windows, Version 6.1,
> 		     Code page: 1252, Author: dtobias,
> 		     Name of Creating Application: Microsoft Visio
> 
> For comparison, I ran the file format identification utility TrID
> (see https://mark0.net/soft-trid-e.html). It identifies "older"
> examples like CONTENTS-stream.art as "Greenstreet Art drawing (old)"
> via art-gst.trid.xml. The "newer" ART examples are described with low
> priority as "Generic OLE2 / Multistream Compound" via docfile.trid.xml.
> With a higher rate, these samples are also described as "Greenstreet
> Art drawing" via art-gst-docfile.trid.xml (see the attached
> trid-v-art.txt.gz).
> 
> For comparison, I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). It describes the
> newer variants generically as "OLE2 Compound Document Format" with
> PUID fmt/111.
> 
> Luckily, with the information given by TrID, I found a page about GST
> ART on the Archive Team file formats web site. It states that the
> "newer" version is based on the Microsoft Compound File format. This
> information is now expressed inside Magdir/ole2compounddocs by comment
> lines like:
> # URL:		http://fileformats.archiveteam.org/wiki/GST_ART
> # Reference:	http://mark0.net/download/triddefs_xml.7z
> #		defs/a/art-gst-docfile.trid.xml
> 
> There, under the item "Software & Samples", the Greenstreet Publishing
> Suite 99 is mentioned. On that CD-ROM image I found samples of the
> "new" variant.
> 
> The ART examples are recognized as "OLE 2 Compound Document" inside
> Magdir/ole2compounddocs by the starting bytes
> (\320\317\021\340\241\261\032\341) at the beginning of the file.
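Written in hex, those octal escapes are the usual Compound File signature D0 CF 11 E0 A1 B1 1A E1. A minimal Python sketch of the same leading-bytes test (an illustration only, not part of the patch; the name is_ole2 is hypothetical):

```python
# Octal escapes from Magdir/ole2compounddocs, written as Python octal literals.
OLE2_MAGIC = bytes([0o320, 0o317, 0o021, 0o340, 0o241, 0o261, 0o032, 0o341])


def is_ole2(data: bytes) -> bool:
    """Return True if data starts with the OLE2 / Compound File signature."""
    return data.startswith(OLE2_MAGIC)


if __name__ == "__main__":
    print(OLE2_MAGIC.hex())                  # d0cf11e0a1b11ae1
    print(is_ole2(OLE2_MAGIC + bytes(504)))  # True
    print(is_ole2(b"PK\x03\x04"))            # False
```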
> 
> If a subclassification is not possible, then it normally prints, at
> the end after the values section, a part starting with the phrase
> ": UNKNOWN". At the moment there are 2 unknown branches. One is for
> examples with a non-null clsid. That part looked like:
>>> 88 	default		x
>>>> 88 	ubequad		!0			: UNKNOWN
> !:mime	application/x-ole-storage
>>>>> 80 	ubequad		!0		\b, clsid %#16.16llx
>>>>> 88 	ubequad		x		\b%16.16llx
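The two ubequad lines read the 16-byte CLSID as two big-endian 64-bit values and print them back to back. A hedged Python sketch of that rendering, assuming (as in the Compound File directory entry layout) that the CLSID sits at bytes 80 to 95 of a 128-byte directory entry and that the magic offsets are relative to the start of the root entry; clsid_hex is a hypothetical name:

```python
import struct
from typing import Optional


def clsid_hex(entry: bytes) -> Optional[str]:
    """Mimic '80 ubequad !0 \\b, clsid %#16.16llx' + '88 ubequad x \\b%16.16llx'.

    entry is assumed to be a 128-byte CFB directory entry whose CLSID
    occupies offsets 80..95. Returns None when the first quad is zero,
    because the magic line only fires for a non-null value (!0).
    """
    hi, lo = struct.unpack_from(">QQ", entry, 80)  # two big-endian quads
    if hi == 0:
        return None
    # %#16.16llx on a non-zero value gives "0x" plus 16 hex digits.
    return "clsid 0x%016x%016x" % (hi, lo)
```

With the CLSID bytes of the ART samples placed at offset 80, this yields `clsid 0x602c020000000000c000000000000046`, the same string the patched magic prints.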
> 
> I put this part that displays the CDF directory inside a subroutine
> that starts like:
> 
> 0	name			ole2-unknown
>> 80 	ubequad		x			: UNKNOWN
> !:mime	application/x-ole-storage
>> 80 	ubequad		!0			\b, clsid %#16.16llx
>>> 88 ubequad		x			\b%16.16llx
> 
> To help a "common" user who does not know how to convert a GUID, I
> also show the clsid in "curly braces" form after the hexadecimal form.
> Furthermore, I also show the second to seventh directory names (often
> useful when inspecting unknown samples) after the first entry "Root
> Entry" by additional lines like:
>>> 80	guid		x			{%s}
>> 128	lestring16	x with names %.20s
>> 256	lestring16	x %.20s
>> 384	lestring16	x %.25s
>> 512	lestring16	x %.10s
>> 640	lestring16	x %.10s
>> 768	lestring16	x %.10s
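The guid type prints the same 16 bytes in Microsoft's mixed-endian GUID layout (first three fields little-endian), and the lestring16 lines pick up the UTF-16LE name at the start of each following 128-byte directory entry. A minimal Python sketch of both, assuming a raw directory sector is available as bytes; the helper names are hypothetical, and unlike file it does not escape non-printable characters such as the \005 prefix of SummaryInformation:

```python
import uuid
from typing import List


def clsid_guid(entry: bytes) -> str:
    """Curly-brace GUID of the CLSID at offsets 80..95 of a directory entry."""
    # bytes_le: Data1, Data2 and Data3 are stored little-endian, Data4 as-is.
    return "{%s}" % str(uuid.UUID(bytes_le=bytes(entry[80:96]))).upper()


def directory_names(directory: bytes, count: int = 7) -> List[str]:
    """Names of the first `count` 128-byte directory entries.

    Each name is UTF-16LE in the first 64 bytes of its entry and ends at the
    first NUL character, roughly like the magic lestring16 type.
    """
    names = []
    for i in range(count):
        field = directory[i * 128 : i * 128 + 64]
        if len(field) < 64:  # ran past the end of the supplied data
            break
        text = bytes(field).decode("utf-16-le", errors="replace")
        names.append(text.split("\0", 1)[0])
    return names
```

For the ART CLSID bytes 60 2c 02 00 ... 46, clsid_guid gives {00022C60-0000-0000-C000-000000000046}, matching the braces form in the patched output.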
> 
> So I can also use this subroutine in the second "unknown" branch,
> where I replace the displaying part by a call to the subroutine, which
> now looks like:
>>>>> 128 	default		x
>>>>>> 0 	use		ole2-unknown
> instead of lines like:
>>>>> 128 	default		x			: UNKNOWN
>>>>>> 128	lestring16	x with names %.20s
>>>>>> 256	lestring16	x %.20s
>>>>>> 384	lestring16	x %.20s
> 
> At this point we have gained nothing, but there obviously exists a
> third "unknown" branch. Many Microsoft products have similar clsids
> (or, more precisely, the second part of the GUID is the same). So
> Microsoft products from "Visio 2000-2002" to "PowerPoint 4.0" are
> handled in a branch that looks like:
>>> 88 	ubequad		0xc000000000000046	: Microsoft
>>>> 80 	ubequad		0x131a020000000000	Visio 2000-2002
> !:ext	vsd/vss/vst
> ...
>>>> 80 	ubequad		0x5148040000000000	PowerPoint 4.0
> !:ext	ppt
> Unfortunately, this shared part of the clsid also matches "newer"
> Greenstreet ART drawings. So I move the phrase ": Microsoft" to the
> second-level test lines and add at the end a default clause that
> passes unrecognized examples to a third "unknown" branch. This now
> looks like:
>>> 88 	ubequad	0xc000000000000046
>>>> 80 	ubequad	0x131a020000000000	: Microsoft Visio 2000-2002
> !:ext	vsd/vss/vst
> ...
>>>> 80 	ubequad	0x5148040000000000	: Microsoft PowerPoint 4.0
> !:ext	ppt
>>>> 80 	default		x
>>>>> 0 	use		ole2-unknown
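The effect of moving ": Microsoft" down one level can be modelled as a lookup keyed on the first quad, guarded by the shared second quad. A simplified Python sketch (hypothetical names; only the two products quoted above are included, the real branch lists more) contrasting the old and new behaviour for the Greenstreet clsid:

```python
from typing import Optional

MS_SUFFIX = 0xC000000000000046  # second quad shared by many Microsoft CLSIDs

# Only the two entries quoted above; the real branch lists many more.
MS_PRODUCTS = {
    0x131A020000000000: "Visio 2000-2002",
    0x5148040000000000: "PowerPoint 4.0",
}


def label_old(hi: int, lo: int) -> Optional[str]:
    """Old layout: ': Microsoft' is printed as soon as the second quad matches."""
    if lo != MS_SUFFIX:
        return None
    product = MS_PRODUCTS.get(hi)
    return ": Microsoft" + (" " + product if product else "")


def label_new(hi: int, lo: int) -> Optional[str]:
    """New layout: the prefix moves into each product line; default -> unknown."""
    if lo != MS_SUFFIX:
        return None
    product = MS_PRODUCTS.get(hi)
    if product is None:
        return ": UNKNOWN"  # the default clause hands over to ole2-unknown
    return ": Microsoft " + product
```

For the ART clsid 0x602c020000000000, label_old reproduces the misleading ": Microsoft", while label_new falls through to ": UNKNOWN"; known products are labelled identically by both.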
> 
> When I now run the file command on "newer" ART samples, I get output
> like:
> 
> BCARD2.ART:  OLE 2 Compound Document, v3.62, SecID 0x2,
> 	     2 FAT sectors, Mini FAT start sector 0x4 : UNKNOWN,
> 	     clsid 0x602c020000000000c000000000000046
> 	     {00022C60-0000-0000-C000-000000000046}
> 	     with names CONTENTS Preview.dib
> 	     \005SummaryInformation \377\377\3 ! A
> POSTER2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
> 	     3 FAT sectors, Mini FAT start sector 0x4 : UNKNOWN,
> 	     clsid 0x602c020000000000c000000000000046
> 	     {00022C60-0000-0000-C000-000000000046}
> 	     with names CONTENTS Preview.dib
> 	     \005SummaryInformation \377\377\3 ! A
> SLIDE2.ART:  OLE 2 Compound Document, v3.62, SecID 0x2,
> 	     2 FAT sectors, Mini FAT start sector 0x4 : UNKNOWN,
> 	     clsid 0x602c020000000000c000000000000046
> 	     {00022C60-0000-0000-C000-000000000046}
> 	     with names CONTENTS Preview.dib
> 	     \005SummaryInformation \377\377\3 ! A
> 
> The information mentioned for the "older" variant also applies to the
> "newer" variant. The web page states that the "older" part is now
> stored as a content stream. That name (CONTENTS) is also shown by the
> patched file command above. Because the ART samples are OLE2 compound
> containers, we can inspect such examples with suitable tools, for
> example Michal Mutl's Structured Storage Viewer. There we see that
> stream and can save it, for example as CONTENTS-stream.art. TrID
> describes this extracted stream as the "old" variant.
> 
> So, to catch the ART samples, I now only insert suitable lines before
> the default clause. These additional lines look like:
>>>> 80 	ubequad	0x602c020000000000	: Greenstreet Art drawing
> !:mime	image/x-greenstreet-art
> !:ext	art
> Furthermore, I display a user-defined MIME type instead of the generic
> application/x-ole-storage.
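The new line pairs a description with its own !:mime and !:ext values, while anything left to the unknown branch keeps the generic type. A small Python sketch of that mapping, assuming the second quad has already matched 0xc000000000000046 (hypothetical names; the clsid, MIME type and extension are taken from the lines above):

```python
from typing import NamedTuple, Optional


class Match(NamedTuple):
    description: str
    mime: str
    ext: Optional[str]  # None: no !:ext hint for this match


GENERIC = Match(": UNKNOWN", "application/x-ole-storage", None)

# First CLSID quad -> match details, as in the inserted magic lines.
BY_FIRST_QUAD = {
    0x602C020000000000: Match(
        ": Greenstreet Art drawing", "image/x-greenstreet-art", "art"
    ),
}


def lookup(first_quad: int) -> Match:
    """Return the specific match if known, else the generic unknown result."""
    return BY_FIRST_QUAD.get(first_quad, GENERIC)
```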
> 
> After applying the above modifications with the patch
> file-5.41-ole2compounddocs-art.diff, all my inspected "newer"
> Greenstreet ART drawing examples are described in more detail. With
> the -e cdf option the output now looks like:
> BCARD2.ART:          OLE 2 Compound Document, v3.62, SecID 0x2,
> 		     2 FAT sectors, Mini FAT start sector 0x4 :
> 		     Greenstreet Art drawing
> CONTENTS-stream.art: data
> POSTER2.ART:         OLE 2 Compound Document, v3.62, SecID 0x2,
> 		     3 FAT sectors, Mini FAT start sector 0x4 :
> 		     Greenstreet Art drawing
> Preview-stream.dib:  Device independent bitmap graphic, 100 x 65 x 8,
> 		     image size 6500, resolution 1830 x 1830 px/m
> SLIDE2.ART:          OLE 2 Compound Document, v3.62, SecID 0x2,
> 		     2 FAT sectors, Mini FAT start sector 0x4 :
> 		     Greenstreet Art drawing
> Visio2002Test.vsd:   OLE 2 Compound Document, v3.62, SecID 0x2,
> 		     Mini FAT start sector 0x5 : Microsoft
> 		     Visio 2000-2002 Document, stencil or template
> 
> I hope my diff file can be applied in a future version of the file
> utility.
> 
> There exist some older ART examples that use another format. I will
> try to handle these in a future session.
> 
> With best wishes,
> Jörg Jenderek
> - --
> Jörg Jenderek
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
> 
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY1BZBAAKCRCv8rHJQhrU
> 1mxOAJ9lJVqAB+6fBkJY7HziTF7ac3nBUgCdG+5E+A9kYLehlTelom0MAHU3oWo=
> =tzLX
> -----END PGP SIGNATURE-----
> <trid-v-art.txt.gz><file-5_43-ole2compounddocs-art_diff.DEFANGED-0><file-5_43-ole2compounddocs-art_diff_sig.DEFANGED-1>--
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 235 bytes
Desc: Message signed with OpenPGP
URL: <https://mailman.astron.com/pipermail/file/attachments/20221019/a0902b40/attachment-0001.asc>

