[File] [PATCH] Magdir/ole2compounddocs Greenstreet ART drawing misidentified as Microsoft
Jörg Jenderek
joerg.jen.der.ek at gmx.net
Wed Oct 19 20:07:33 UTC 2022
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello,
some days ago I handled some files with the ART file name extension.
Some samples are "newer" Greenstreet Art drawings.
When running file command version 5.41 with the -e cdf option on such
examples and related files I get output like:
BCARD2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
2 FAT sectors,
Mini FAT start sector 0x4 : Microsoft
CONTENTS-stream.art: data
POSTER2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
3 FAT sectors,
Mini FAT start sector 0x4 : Microsoft
Preview-stream.dib: Device independent bitmap graphic, 100 x 65 x 8,
image size 6500, resolution 1830 x 1830 px/m
SLIDE2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
2 FAT sectors,
Mini FAT start sector 0x4 : Microsoft
Visio2002Test.vsd: OLE 2 Compound Document, v3.62, SecID 0x2,
Mini FAT start sector 0x5 : Microsoft
Visio 2000-2002 Document, stencil or template
Furthermore only the generic mime type application/octet-stream is
shown with the -i and -e cdf options. With option --extension only the
3 byte sequence ??? is shown.
When running the file command with -e soft or no extra option on my
examples I get output like:
BCARD2.ART: Composite Document File V2 Document,
Little Endian, Os 0, Version: 3.95, Title:
greenstreet Draw Art
FileComposite Document File V2 Document,
Cannot read section info
CONTENTS-stream.art: data
POSTER2.ART: Composite Document File V2 Document,
Little Endian, Os 0, Version: 3.95, Title:
greenstreet Draw Art
FileComposite Document File V2 Document,
Cannot read section info
Preview-stream.dib: Device independent bitmap graphic, 100 x 65 x 8,
image size 6500, resolution 1830 x 1830 px/m
SLIDE2.ART: Composite Document File V2 Document,
Little Endian, Os 0, Version: 3.95, Title:
greenstreet Draw Art
FileComposite Document File V2 Document,
Cannot read section info
Visio2002Test.vsd: Composite Document File V2 Document,
Little Endian, Os: Windows, Version 6.1,
Code page: 1252, Author: dtobias,
Name of Creating Application: Microsoft Visio
For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). This identifies
"older" examples like CONTENTS-stream.art as "Greenstreet Art drawing
(old)" by art-gst.trid.xml. The "newer" ART examples are described
with low priority as "Generic OLE2 / Multistream Compound" by
docfile.trid.xml. These samples are also described with a
higher rate as "Greenstreet Art drawing" by art-gst-docfile.trid.xml
(see appended trid-v-art.txt.gz).
For comparison I also ran the file format identification
utility DROID (see https://sourceforge.net/projects/droid/). This
describes the newer variants only generically as "OLE2 Compound
Document Format" by PUID fmt/111.
Luckily, with the information given by TrID I found a page about GST
ART on the File Formats Archive Team web site. There it is written
that the "newer" version is based on the Microsoft Compound File
format. This information is now expressed inside
Magdir/ole2compounddocs by comment lines like:
# URL: http://fileformats.archiveteam.org/wiki/GST_ART
# Reference: http://mark0.net/download/triddefs_xml.7z
# defs/a/art-gst-docfile.trid.xml
There, under the item "Software & Samples", the Greenstreet Publishing
Suite 99 is mentioned. On that CD-ROM image I found such "new"
variant samples.
The ART examples are recognized as "OLE 2 Compound Document"
by the starting bytes (\320\317\021\340\241\261\032\341) at the
beginning inside Magdir/ole2compounddocs.
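As a small illustration of that first test, here is a minimal Python sketch that checks the same eight starting bytes. The function name looks_like_ole2 is made up for this example and is not part of the file utility.

```python
# The eight starting bytes quoted above in octal, written here in hexadecimal.
OLE2_SIGNATURE = bytes([0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1])


def looks_like_ole2(data: bytes) -> bool:
    """Return True if data begins with the OLE 2 Compound Document signature.

    Hypothetical helper for illustration only; it tests nothing beyond
    the first eight bytes.
    """
    return data[:8] == OLE2_SIGNATURE
```

A match only says that the container is an OLE 2 Compound Document. Telling a Greenstreet drawing apart from a Visio document needs the clsid discussed further down.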
If sub classification is not possible, then it normally prints, at the
end after the values section, a part starting with the phrase
": UNKNOWN". At the moment there exist 2 unknown branches. One is for
examples with a non-null clsid. That part looked like:
>>88 default x
>>>88 ubequad !0 : UNKNOWN
!:mime application/x-ole-storage
>>>>80 ubequad !0 \b, clsid %#16.16llx
>>>>88 ubequad x \b%16.16llx
I put this part, which displays information about the CDF directory,
inside a sub routine that starts like:
0 name ole2-unknown
> 80 ubequad x : UNKNOWN
!:mime application/x-ole-storage
> 80 ubequad !0 \b, clsid %#16.16llx
>> 88 ubequad x \b%16.16llx
To help the "common" user who does not know how to convert a GUID, I
also show the clsid, after the hexadecimal form, in "curly braces"
form. Furthermore I also show the second through seventh directory
names (often useful when inspecting unknown samples) after the first
entry "Root Entry" by additional lines like:
>>80 guid x {%s}
>128 lestring16 x with names %.20s
>256 lestring16 x %.20s
>384 lestring16 x %.25s
>512 lestring16 x %.10s
>640 lestring16 x %.10s
>768 lestring16 x %.10s
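For readers who want to reproduce this display outside of file, the following Python sketch does roughly the same thing. It assumes that the argument holds the bytes beginning at the first directory entry ("Root Entry"), which is how the offsets 80, 128, 256 and so on appear to be used in the magic lines. The names ENTRY_SIZE, clsid_as_guid and entry_names are invented for this example.

```python
import uuid

ENTRY_SIZE = 128  # the magic lines step through the directory in 128 byte units


def clsid_as_guid(directory: bytes) -> str:
    """Format the 16 clsid bytes at offset 80 in "curly braces" form."""
    raw = directory[80:96]
    # bytes_le treats the first three GUID fields as little endian,
    # which is the usual on-disk layout of a Windows GUID.
    return "{" + str(uuid.UUID(bytes_le=raw)).upper() + "}"


def entry_names(directory: bytes, first: int = 1, last: int = 6) -> list:
    """Return the UTF-16LE names of directory entries first..last."""
    names = []
    for index in range(first, last + 1):
        start = index * ENTRY_SIZE
        field = directory[start:start + 64]
        if len(field) < 64:
            break  # the buffer ends before this entry
        text = field.decode("utf-16-le", errors="replace")
        names.append(text.split("\x00", 1)[0])  # cut at the terminating NUL
    return names
```

For the 16 bytes 60 2c 02 00 00 00 00 00 c0 00 00 00 00 00 00 46 this yields {00022C60-0000-0000-C000-000000000046}. Unlike the magic lines, the sketch does not limit the printed name length the way %.20s or %.10s do.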
So I can also use this sub routine in the second "unknown" branch,
where I replace the displaying part by a call to the sub routine. That
branch now becomes:
>>>>128 default x
>>>>>0 use ole2-unknown
instead of lines like:
>>>>128 default x : UNKNOWN
>>>>>128 lestring16 x with names %.20s
>>>>>256 lestring16 x %.20s
>>>>>384 lestring16 x %.20s
At this point nothing is gained yet, but obviously there exists a
third "unknown" branch. Many Microsoft products have a similar clsid
(or, more precisely, the second part of the GUID is the same). So such
Microsoft products, from "Visio 2000-2002" to "PowerPoint 4.0", are
handled in a branch that looks like:
>>88 ubequad 0xc000000000000046 : Microsoft
>>>80 ubequad 0x131a020000000000 Visio 2000-2002
!:ext vsd/vss/vst
...
>>>80 ubequad 0x5148040000000000 PowerPoint 4.0
!:ext ppt
Unfortunately this shared part of the clsid is also used by "newer"
Greenstreet ART drawings. So I moved the phrase ": Microsoft" to the
second-level test lines and added a default clause at the end, so that
unrecognized examples are matched by the third "unknown" branch. This
now becomes:
>>88 ubequad 0xc000000000000046
>>>80 ubequad 0x131a020000000000 : Microsoft Visio 2000-2002
!:ext vsd/vss/vst
...
>>>80 ubequad 0x5148040000000000 : Microsoft PowerPoint 4.0
!:ext ppt
>>>80 default x
>>>>0 use ole2-unknown
When now running the file command on "newer" ART samples I get output
like:
BCARD2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
2 FAT sectors, Mini FAT start sector 0x4 : UNKNOWN,
clsid 0x602c020000000000c000000000000046
{00022C60-0000-0000-C000-000000000046}
with names CONTENTS Preview.dib
\005SummaryInformation \377\377\3 ! A
POSTER2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
3 FAT sectors, Mini FAT start sector 0x4 : UNKNOWN,
clsid 0x602c020000000000c000000000000046
{00022C60-0000-0000-C000-000000000046}
with names CONTENTS Preview.dib
\005SummaryInformation \377\377\3 ! A
SLIDE2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
2 FAT sectors, Mini FAT start sector 0x4 : UNKNOWN,
clsid 0x602c020000000000c000000000000046
{00022C60-0000-0000-C000-000000000046}
with names CONTENTS Preview.dib
\005SummaryInformation \377\377\3 ! A
The information mentioned for the "older" variant also applies to the
"newer" variant. On the web page it is written that the "older" part
is now stored as the CONTENTS stream. That name is also shown by the
patched file command above. Because the ART samples are OLE2 Compound
containers, we can inspect such examples with suitable tools like
Michal Mutl's Structured Storage Viewer. There we see that stream and
can save it as CONTENTS-stream.art, for example. This is described by
TrID as the "old" variant.
So to catch the ART samples I now only need to insert suitable lines
before the default clause. These additional lines look like:
>>>80 ubequad 0x602c020000000000 : Greenstreet Art drawing
!:mime image/x-greenstreet-art
!:ext art
Furthermore I display a user defined mime type instead of the generic
application/x-ole-storage.
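The two-step test on the clsid (second half first, then first half, with a default as fallback) can be sketched in Python as follows. This is only an illustration of the branch logic. The names SHARED_TAIL, KNOWN_HEADS and classify_clsid are invented, the table holds only three of the values mentioned in this mail, and the other branches of Magdir/ole2compounddocs are not modelled.

```python
# Second half of the clsid shared by the products in this branch
# (offset 88 in the magic lines, read as a big endian quad).
SHARED_TAIL = 0xC000000000000046

# First half of the clsid (offset 80, big endian quad) -> (description, ext).
# Only values quoted in this mail are listed here.
KNOWN_HEADS = {
    0x131A020000000000: ("Microsoft Visio 2000-2002", "vsd/vss/vst"),
    0x5148040000000000: ("Microsoft PowerPoint 4.0", "ppt"),
    0x602C020000000000: ("Greenstreet Art drawing", "art"),
}


def classify_clsid(clsid: bytes) -> str:
    """Mimic the nested ubequad tests with a default clause as fallback."""
    head = int.from_bytes(clsid[:8], "big")
    tail = int.from_bytes(clsid[8:16], "big")
    if tail == SHARED_TAIL and head in KNOWN_HEADS:
        return KNOWN_HEADS[head][0]
    return "UNKNOWN"  # corresponds to the "use ole2-unknown" default
```

With the bytes 60 2c 02 00 00 00 00 00 c0 00 00 00 00 00 00 46 the function returns "Greenstreet Art drawing". A clsid with the same second half but an unlisted first half falls through to "UNKNOWN", which is the case the new default clause handles.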
After applying the above mentioned modifications by patch
file-5.41-ole2compounddocs-art.diff, all my inspected "newer"
Greenstreet ART drawing examples are described with more details.
With the -e cdf option this now looks like:
BCARD2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
2 FAT sectors, Mini FAT start sector 0x4 :
Greenstreet Art drawing
CONTENTS-stream.art: data
POSTER2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
3 FAT sectors, Mini FAT start sector 0x4 :
Greenstreet Art drawing
Preview-stream.dib: Device independent bitmap graphic, 100 x 65 x 8,
image size 6500, resolution 1830 x 1830 px/m
SLIDE2.ART: OLE 2 Compound Document, v3.62, SecID 0x2,
2 FAT sectors, Mini FAT start sector 0x4 :
Greenstreet Art drawing
Visio2002Test.vsd: OLE 2 Compound Document, v3.62, SecID 0x2,
Mini FAT start sector 0x5 : Microsoft
Visio 2000-2002 Document, stencil or template
I hope my diff file can be applied in a future version of the file
utility.
There exist some older ART examples where another format is used. I
will try to handle these in a future session.
With best wishes,
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY1BZBAAKCRCv8rHJQhrU
1mxOAJ9lJVqAB+6fBkJY7HziTF7ac3nBUgCdG+5E+A9kYLehlTelom0MAHU3oWo=
=tzLX
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-art.txt.gz
Type: application/x-gzip
Size: 780 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221019/f0bc3034/attachment.bin>
-------------- next part --------------
--- file-5.43/magic/Magdir/ole2compounddocs.old 2022-09-13 20:05:40.000000000 +0200
+++ file-5.43/magic/Magdir/ole2compounddocs 2022-10-19 21:41:23.239179400 +0200
@@ -74,2 +74,3 @@
#>68 bequad !0xffffffffffffffff \b, DirIDs %llx
+# NEXT lines for DEBUGGING
# second directory entry name like VisioDocument Control000
@@ -307,13 +308,4 @@
# remaining null clsid
->>>>128 default x : UNKNOWN
-# second directory entry name like VisioDocument Control000
->>>>>128 lestring16 x with names %.20s
-# third directory entry like WordDocument
->>>>>256 lestring16 x %.20s
-# forth
->>>>>384 lestring16 x %.20s
-!:mime application/x-ole-storage
-# according to file version 5.41 with -e soft option
-#!:mime application/CDFV2
-#!:ext ???
+>>>>128 default x
+>>>>>0 use ole2-unknown
# look for known clsid GUID
@@ -322,4 +314,4 @@
# Last update on 10/23/2006 by Lester Hightower, 07/20/2019 by Joerg Jenderek
->>88 ubequad 0xc000000000000046 : Microsoft
->>>80 ubequad 0x131a020000000000 Visio 2000-2002 Document, stencil or template
+>>88 ubequad 0xc000000000000046
+>>>80 ubequad 0x131a020000000000 : Microsoft Visio 2000-2002 Document, stencil or template
!:mime application/vnd.visio
@@ -327,3 +319,3 @@
!:ext vsd/vss/vst
->>>80 ubequad 0x141a020000000000 Visio 2003-2010 Document, stencil or template
+>>>80 ubequad 0x141a020000000000 : Microsoft Visio 2003-2010 Document, stencil or template
!:mime application/vnd.visio
@@ -332,3 +324,3 @@
# URL: http://fileformats.archiveteam.org/wiki/Windows_Installer
->>>80 ubequad 0x84100c0000000000 Windows Installer Package
+>>>80 ubequad 0x84100c0000000000 : Microsoft Windows Installer Package
!:mime application/x-msi
@@ -336,3 +328,3 @@
!:ext msi
->>>80 ubequad 0x86100c0000000000 Windows Installer Patch
+>>>80 ubequad 0x86100c0000000000 : Microsoft Windows Installer Patch
# ??
@@ -343,3 +335,3 @@
# URL: http://fileformats.archiveteam.org/wiki/DOC
->>>80 ubequad 0x0009020000000000 Word 6-95 document or template
+>>>80 ubequad 0x0009020000000000 : Microsoft Word 6-95 document or template
!:mime application/msword
@@ -348,3 +340,3 @@
!:ext doc/dot
->>>80 ubequad 0x0609020000000000 Word 97-2003 document or template
+>>>80 ubequad 0x0609020000000000 : Microsoft Word 97-2003 document or template
!:mime application/msword
@@ -355,3 +347,3 @@
# URL: http://fileformats.archiveteam.org/wiki/Microsoft_Works_Word_Processor
->>>80 ubequad 0x0213020000000000 Works 3-4 document or template
+>>>80 ubequad 0x0213020000000000 : Microsoft Works 3-4 document or template
!:mime application/vnd.ms-works
@@ -362,3 +354,3 @@
# URL: http://fileformats.archiveteam.org/wiki/Microsoft_Works_Database
->>>80 ubequad 0x0313020000000000 Works 3-4 database or template
+>>>80 ubequad 0x0313020000000000 : Microsoft Works 3-4 database or template
!:mime application/vnd.ms-works-db
@@ -370,3 +362,3 @@
# URL: https://en.wikipedia.org/wiki/Microsoft_Excel
->>>80 ubequad 0x1008020000000000 Excel 5-95 worksheet, addin or template
+>>>80 ubequad 0x1008020000000000 : Microsoft Excel 5-95 worksheet, addin or template
!:mime application/vnd.ms-excel
@@ -377,3 +369,3 @@
#
->>>80 ubequad 0x2008020000000000 Excel 97-2003
+>>>80 ubequad 0x2008020000000000 : Microsoft Excel 97-2003
!:mime application/vnd.ms-excel
@@ -393,4 +385,4 @@
# URL: http://fileformats.archiveteam.org/wiki/OLE2
->>>80 ubequad 0x0b0d020000000000 Outlook 97-2003 item
-#>>>80 ubequad 0x0b0d020000000000 Outlook 97-2003 Message
+>>>80 ubequad 0x0b0d020000000000 : Microsoft Outlook 97-2003 item
+#>>>80 ubequad 0x0b0d020000000000 : Microsoft Outlook 97-2003 Message
#!:mime application/vnd.ms-outlook
@@ -399,3 +391,3 @@
# URL: https://wiki.fileformat.com/email/oft/
->>>80 ubequad 0x46f0060000000000 Outlook 97-2003 item template
+>>>80 ubequad 0x46f0060000000000 : Microsoft Outlook 97-2003 item template
#!:mime application/vnd.ms-outlook
@@ -405,3 +397,3 @@
# URL: http://fileformats.archiveteam.org/wiki/PPT
->>>80 ubequad 0x5148040000000000 PowerPoint 4.0 presentation
+>>>80 ubequad 0x5148040000000000 : Microsoft PowerPoint 4.0 presentation
!:mime application/vnd.ms-powerpoint
@@ -410,2 +402,15 @@
!:ext ppt
+# Summary: "newer" Greenstreet Art drawing
+# From: Joerg Jenderek
+# URL: http://fileformats.archiveteam.org/wiki/GST_ART
+# Reference: http://mark0.net/download/triddefs_xml.7z/defs/a/art-gst-docfile.trid.xml
+# Note: called like "Greenstreet Art drawing" by TrID
+# Note: CONTENT stream contains binary part of older versions with phrase GST:ART at offset 16
+# verified by Michal Mutl Structured Storage Viewer `SSView.exe BCARD2.ART`
+>>>80 ubequad 0x602c020000000000 : Greenstreet Art drawing
+#!:mime application/x-ole-storage
+!:mime image/x-greenstreet-art
+!:ext art
+>>>80 default x
+>>>>0 use ole2-unknown
#??
@@ -663,4 +668,6 @@
>>88 default x
-# GRR: check again for non null clsid because wrong when called by indirect directive
->>>88 ubequad !0 : UNKNOWN
+>>>0 use ole2-unknown
+# display information about directory for not detected CDF files
+0 name ole2-unknown
+>80 ubequad x : UNKNOWN
# https://reposcope.com/mimetype/application/x-ole-storage
@@ -670,4 +677,17 @@
#!:ext ???
->>>>80 ubequad !0 \b, clsid %#16.16llx
->>>>88 ubequad x \b%16.16llx
-
+>80 ubequad !0 \b, clsid %#16.16llx
+>>88 ubequad x \b%16.16llx
+# converted hexadecimal format to standard GUUID notation
+>>80 guid x {%s}
+# second directory entry name like VisioDocument Control000
+>128 lestring16 x with names %.20s
+# third directory entry like WordDocument Preview.dib
+>256 lestring16 x %.20s
+# forth like \005SummaryInformation
+>384 lestring16 x %.25s
+# 5th
+>512 lestring16 x %.10s
+# 6th
+>640 lestring16 x %.10s
+# 7th
+>768 lestring16 x %.10s
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.43-ole2compounddocs-art.diff.sig
Type: application/octet-stream
Size: 2010 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221019/f0bc3034/attachment.obj>