[File] [PATCH] Magdir/ole2compounddocs Greenstreet ART drawing misidentified as Microsoft
Christos Zoulas
christos at zoulas.com
Wed Oct 19 20:16:50 UTC 2022
Committed, thanks!
christos
> On Oct 19, 2022, at 4:07 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
>
> Hello,
>
> Some days ago I handled some files with the ART file name extension.
> Some of the samples are "newer" Greenstreet Art drawings.
>
> When running the file command, version 5.41, with the -e cdf option on
> such examples and related files, I get output like:
>
> BCARD2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
> 2 FAT sectors,
> Mini FAT start sector 0x4 : Microsoft
> CONTENTS-stream.art: data
> POSTER2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
> 3 FAT sectors,
> Mini FAT start sector 0x4 : Microsoft
> Preview-stream.dib: Device independent bitmap graphic, 100 x 65 x 8,
> image size 6500, resolution 1830 x 1830 px/m
> SLIDE2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
> 2 FAT sectors,
> Mini FAT start sector 0x4 : Microsoft
> Visio2002Test.vsd: OLE 2 Compound Document, v3.62, SecID 0x2,
> Mini FAT start sector 0x5 : Microsoft
> Visio 2000-2002 Document, stencil or template
>
> Furthermore, only the generic MIME type application/octet-stream is
> shown with the -i and -e cdf options. With the --extension option only
> the 3-byte sequence ??? is shown.
>
> When running the file command with -e soft or with no extra option on
> my examples, I get output like:
>
> BCARD2.ART: Composite Document File V2 Document,
> Little Endian, Os 0, Version: 3.95, Title:
> greenstreet Draw Art
> FileComposite Document File V2 Document,
> Cannot read section info
> CONTENTS-stream.art: data
> POSTER2.ART: Composite Document File V2 Document,
> Little Endian, Os 0, Version: 3.95, Title:
> greenstreet Draw Art
> FileComposite Document File V2 Document,
> Cannot read section info
> Preview-stream.dib: Device independent bitmap graphic, 100 x 65 x 8,
> image size 6500, resolution 1830 x 1830 px/m
> SLIDE2.ART: Composite Document File V2 Document,
> Little Endian, Os 0, Version: 3.95, Title:
> greenstreet Draw Art
> FileComposite Document File V2 Document,
> Cannot read section info
> Visio2002Test.vsd: Composite Document File V2 Document,
> Little Endian, Os: Windows, Version 6.1,
> Code page: 1252, Author: dtobias,
> Name of Creating Application: Microsoft Visio
>
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). It identifies
> "older" examples like CONTENTS-stream.art as "Greenstreet Art drawing
> (old)" by art-gst.trid.xml. The "newer" ART examples are described,
> with low priority, as "Generic OLE2 / Multistream
> Compound" by docfile.trid.xml. These samples are also described, with
> a higher rate, as "Greenstreet Art drawing" by art-gst-docfile.trid.xml
> (see the appended trid-v-art.txt.gz).
>
> For comparison I also ran the file format identification
> utility DROID (see https://sourceforge.net/projects/droid/). It
> describes the newer variants only generically, as "OLE2 Compound
> Document Format" by PUID fmt/111.
>
> Luckily, with the information given by TrID I found a page about GST
> ART on the Archive Team file formats web site. It states that the
> "newer" version is based on the Microsoft Compound File format. This
> information is now expressed inside Magdir/ole2compounddocs
> by comment lines like:
> # URL: http://fileformats.archiveteam.org/wiki/GST_ART
> # Reference: http://mark0.net/download/triddefs_xml.7z
> # defs/a/art-gst-docfile.trid.xml
>
> There, under the item "Software & Samples", the Greenstreet Publishing
> Suite 99 is mentioned. On that CD-ROM image I found such "new"
> variant samples.
>
> The ART examples are recognized as "OLE 2 Compound Document"
> by their starting bytes (\320\317\021\340\241\261\032\341), which are
> tested at the beginning inside Magdir/ole2compounddocs.
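
A minimal sketch, in Python rather than magic syntax, of the signature test described above. The eight octal escapes are taken from the message; the names `OLE2_SIGNATURE` and `is_ole2` are hypothetical and only for illustration.

```python
# The eight starting bytes quoted above, written with the same octal values.
OLE2_SIGNATURE = bytes([0o320, 0o317, 0o021, 0o340, 0o241, 0o261, 0o032, 0o341])


def is_ole2(data: bytes) -> bool:
    """Return True if the buffer starts with the OLE2 compound document signature."""
    return data[:8] == OLE2_SIGNATURE


if __name__ == "__main__":
    # In hexadecimal the signature reads d0cf11e0a1b11ae1.
    print(OLE2_SIGNATURE.hex())
    print(is_ole2(OLE2_SIGNATURE + bytes(504)))
```

This only confirms the container type; the sub-classification discussed below needs the root directory entry.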
>
> If sub-classification is not possible, file normally prints at the
> end, after the values section, a part starting with the phrase
> ": UNKNOWN". At the moment there exist 2 unknown branches. One is for
> examples with a non-null clsid. That part looked like:
>>> 88 default x
>>>> 88 ubequad !0 : UNKNOWN
> !:mime application/x-ole-storage
>>>>> 80 ubequad !0 \b, clsid %#16.16llx
>>>>> 88 ubequad x \b%16.16llx
>
> I put this part, which displays information about the CDF directory,
> inside a subroutine that starts like:
>
> 0 name ole2-unknown
>> 80 ubequad x : UNKNOWN
> !:mime application/x-ole-storage
>> 80 ubequad !0 \b, clsid %#16.16llx
>>> 88 ubequad x \b%16.16llx
>
> To help "common" users who do not know how to convert a GUID, I also
> show the clsid, after its hexadecimal form, in "curly braces" form.
> Furthermore, I also show the second through seventh directory names
> (often useful when inspecting unknown samples) after the first entry
> "Root Entry" by additional lines like:
>>> 80 guid x {%s}
>> 128 lestring16 x with names %.20s
>> 256 lestring16 x %.20s
>> 384 lestring16 x %.25s
>> 512 lestring16 x %.10s
>> 640 lestring16 x %.10s
>> 768 lestring16 x %.10s
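
A minimal sketch, in Python rather than magic syntax, of what the `guid` and `lestring16` lines display. It assumes the usual 128-byte OLE2 directory entry layout (UTF-16LE name in the first 64 bytes, CLSID at offset 80), which matches the offsets used above. The helper names and the synthetic entries are hypothetical; the clsid value is the one reported for the ART samples later in this message.

```python
import uuid

ENTRY_SIZE = 128  # assumed size of one directory entry


def make_entry(name: str, clsid: bytes = bytes(16)) -> bytes:
    """Build a synthetic directory entry holding only a name and a CLSID."""
    entry = bytearray(ENTRY_SIZE)
    encoded = name.encode("utf-16-le")
    entry[: len(encoded)] = encoded
    entry[80:96] = clsid
    return bytes(entry)


def clsid_to_braces(entry: bytes) -> str:
    """Format the 16 CLSID bytes at offset 80 in the "curly braces" form."""
    # GUIDs are stored mixed-endian: the first three fields are little-endian.
    return "{" + str(uuid.UUID(bytes_le=entry[80:96])).upper() + "}"


def entry_name(entry: bytes) -> str:
    """Decode the NUL-terminated UTF-16LE name at the start of an entry."""
    return entry[:64].decode("utf-16-le", errors="replace").split("\x00", 1)[0]


if __name__ == "__main__":
    # The two ubequad values printed back to back are the raw bytes in file order.
    raw = bytes.fromhex("602c020000000000" "c000000000000046")
    directory = make_entry("Root Entry", raw) + make_entry("CONTENTS")
    root = directory[:ENTRY_SIZE]
    second = directory[ENTRY_SIZE : 2 * ENTRY_SIZE]
    print(clsid_to_braces(root))  # {00022C60-0000-0000-C000-000000000046}
    print(entry_name(second))     # CONTENTS
```

The byte swap in the first three GUID fields is why the hexadecimal form starts with 602c0200 while the braces form starts with 00022C60.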
>
> So I can also use this subroutine in the second "unknown" branch, where
> I replace the displaying part with a call to the subroutine. That
> branch now becomes:
>>>>> 128 default x
>>>>>> 0 use ole2-unknown
> instead of lines like:
>>>>> 128 default x : UNKNOWN
>>>>>> 128 lestring16 x with names %.20s
>>>>>> 256 lestring16 x %.20s
>>>>>> 384 lestring16 x %.20s
>
> At this point we have gained nothing, but obviously there exists a
> third "unknown" branch. Many Microsoft products have a similar clsid
> (or, more precisely, the second part of the GUID is the same). So such
> Microsoft products, from "Visio 2000-2002" to "PowerPoint 4.0", are
> handled in a branch that looks like:
>>> 88 ubequad 0xc000000000000046 : Microsoft
>>>> 80 ubequad 0x131a020000000000 Visio 2000-2002
> !:ext vsd/vss/vst
> ...
>>>> 80 ubequad 0x5148040000000000 PowerPoint 4.0
> !:ext ppt
> Unfortunately this shared part of the clsid also applies to "newer"
> Greenstreet ART drawings. So I move the phrase ": Microsoft" to the
> second-level test lines and add a default clause at the end, so that
> unrecognized examples fall into a third "unknown" branch. This now
> becomes:
>>> 88 ubequad 0xc000000000000046
>>>> 80 ubequad 0x131a020000000000 : Microsoft Visio 2000-2002
> !:ext vsd/vss/vst
> ...
>>>> 80 ubequad 0x5148040000000000 : Microsoft PowerPoint 4.0
> !:ext ppt
>>>> 80 default x
>>>>> 0 use ole2-unknown
>
> When now running the file command on "newer" ART samples, I get output
> like:
>
> BCARD2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
> 2 FAT sectors, Mini FAT start sector 0x4 : UNKNOWN,
> clsid 0x602c020000000000c000000000000046
> {00022C60-0000-0000-C000-000000000046}
> with names CONTENTS Preview.dib
> \005SummaryInformation \377\377\3 ! A
> POSTER2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
> 3 FAT sectors, Mini FAT start sector 0x4 : UNKNOWN,
> clsid 0x602c020000000000c000000000000046
> {00022C60-0000-0000-C000-000000000046}
> with names CONTENTS Preview.dib
> \005SummaryInformation \377\377\3 ! A
> SLIDE2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
> 2 FAT sectors, Mini FAT start sector 0x4 : UNKNOWN,
> clsid 0x602c020000000000c000000000000046
> {00022C60-0000-0000-C000-000000000046}
> with names CONTENTS Preview.dib
> \005SummaryInformation \377\377\3 ! A
>
> The information mentioned for the "older" variant also applies to the
> "newer" variant. The web page states that the "older" part is now
> stored as the CONTENTS stream. That name is also shown by the patched
> file command above. Because the ART samples are OLE2 Compound
> containers, we can inspect such examples with suitable tools, for
> example Michal Mutl's Structured Storage Viewer. There we see that
> stream and can save it, for example as CONTENTS-stream.art. TrID
> describes this file as the "old" variant.
>
> So to catch the ART samples I now only insert suitable lines before
> the default clause. These additional lines look like:
>>>> 80 ubequad 0x602c020000000000 : Greenstreet Art drawing
> !:mime image/x-greenstreet-art
> !:ext art
> Furthermore, I display a user-defined MIME type instead of the generic
> application/x-ole-storage.
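
A rough sketch, in Python rather than magic syntax, of the branch logic described above: test the shared second half of the clsid first, then look up the first half, and fall back to the "unknown" case. Only the three clsid values quoted in this message are listed; the entries elided by "..." are not filled in. The names `SHARED_SECOND_HALF`, `KNOWN_FIRST_HALVES` and `classify_clsid` are hypothetical, and branches for other second halves are left out.

```python
import struct

# Second half of the clsid shared by the entries in this branch.
SHARED_SECOND_HALF = 0xC000000000000046

# First-half values quoted in the message, read as big-endian 64-bit integers.
KNOWN_FIRST_HALVES = {
    0x131A020000000000: "Microsoft Visio 2000-2002",
    0x5148040000000000: "Microsoft PowerPoint 4.0",
    0x602C020000000000: "Greenstreet Art drawing",
}


def classify_clsid(clsid: bytes) -> str:
    """Classify the 16 raw CLSID bytes of the root directory entry."""
    # Mirrors the two ubequad tests at offsets 80 and 88.
    first, second = struct.unpack(">QQ", clsid)
    if second == SHARED_SECOND_HALF:
        # Mirrors the default clause: unmatched values end up as UNKNOWN.
        return KNOWN_FIRST_HALVES.get(first, "UNKNOWN")
    return "UNKNOWN"


if __name__ == "__main__":
    art = bytes.fromhex("602c020000000000c000000000000046")
    print(classify_clsid(art))
```

Keeping the vendor name in each table entry, rather than attaching it to the shared half, is what stops non-Microsoft formats from being labelled "Microsoft".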
>
> After applying the above-mentioned modifications with the patch
> file-5.41-ole2compounddocs-art.diff, all my inspected "newer"
> Greenstreet ART drawing examples are described in more detail.
> With the -e cdf option the output now looks like:
> BCARD2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
> 2 FAT sectors, Mini FAT start sector 0x4 :
> Greenstreet Art drawing
> CONTENTS-stream.art: data
> POSTER2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
> 3 FAT sectors, Mini FAT start sector 0x4 :
> Greenstreet Art drawing
> Preview-stream.dib: Device independent bitmap graphic, 100 x 65 x 8,
> image size 6500, resolution 1830 x 1830 px/m
> SLIDE2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
> 2 FAT sectors, Mini FAT start sector 0x4 :
> Greenstreet Art drawing
> Visio2002Test.vsd: OLE 2 Compound Document, v3.62, SecID 0x2,
> Mini FAT start sector 0x5 : Microsoft
> Visio 2000-2002 Document, stencil or template
>
> I hope my diff file can be applied in a future version of the file
> utility.
>
> There exist some older ART examples where another format is used. I
> will try to handle these in a future session.
>
> With best wishes,
> Jörg Jenderek
> <trid-v-art.txt.gz><file-5_43-ole2compounddocs-art_diff.DEFANGED-0><file-5_43-ole2compounddocs-art_diff_sig.DEFANGED-1>