[File] [PATCH] Magdir/msdos for Windows metafile; improvements

Jörg Jenderek joerg.jen.der.ek at gmx.net
Fri Jan 20 00:42:55 UTC 2023


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

Some days ago I sent a patch concerning the DOS EPS Binary entry
inside Magdir/printer. Such files sometimes contain a Windows
metafile, so I looked at such images, which usually have the file
name suffix WMF.

When running the file command version 5.44 on a few thousand WMF
images, I get output that looks good at first glance, like:

SPA_FLAG.wmf:                    Windows metafile
TestPalette.wmf:                 Windows metafile
abydos.wmf:                      Windows metafile
corel3wmf.wmf:                   Windows metafile
example.wmf:                     Windows metafile
exttextout-2.wmf:                Windows metafile
hardcopy-windows-meta.wmf:       Windows metafile
ofz35149-1.wmf:                  Windows metafile
test-type2.wmf:                  Windows metafile
x-fmt-119-signature-id-1228.wmf: Windows metafile
x-fmt-119-signature-id-609.wmf:  Windows metafile

With option -i the correct MIME type image/wmf is shown.
Furthermore, with option --extension the suffix wmf is displayed.

For comparison I ran the file format identification utility TrID
(see https://mark0.net/soft-trid-e.html). Surprisingly, it
differentiates a little more. Many examples are described with
highest priority as "Windows Metafile" by wmf.trid.xml. These
examples are also described with lower priority as "Aldus Placeable
Metafile" with the wrong suffix APM by apm.trid.xml.
But a few examples like x-fmt-119-signature-id-1228.wmf, example.wmf
and corel3wmf.wmf are described only as "Windows Metafile (old Win
3.x format)" by wmf-16.trid.xml. The type 2 sample test-type2.wmf is
not recognized and is described as "Unknown!". The invalid samples
exttextout-2.wmf and ofz35149-1.wmf (found in the source of
libreoffice-7.3.2.2) are, correctly, also not recognized and are
described as "Unknown!" (see appended trid-v-wmf.txt.gz).

For comparison I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/). Here most WMF
examples are described as "Windows Metafile Image" with MIME type
image/wmf by PUID x-fmt/119. What TrID describes as "Windows Metafile
(old Win 3.x format)" is described in detail list mode as "Windows
Metafile Image without Placeable File Header". The other variant
described by TrID is described here in detail list mode as "Windows
Metafile Image with Placeable File Header". The type 2 sample is not
recognized here either, and the invalid samples ofz35149-1.wmf and
exttextout-2.wmf (part of the test data inside libreoffice-7.3.2.2)
are also not described (see appended droid-wmf.csv.gz).

I also ran the command line tool of the XnView graphic tool with a
command line like:
	nconvert -info *.wmf
For the DROID samples like x-fmt-119-signature-id-1228.wmf and
x-fmt-119-signature-id-609.wmf this fails, because these files are
used by the DROID tool to recognize WMF images and therefore contain
only some leading bytes. It also fails, correctly, on samples like
exttextout-2.wmf and ofz35149-1.wmf (found in the source of
libreoffice-7.3.2.2). The real WMF samples described by TrID as
"Windows Metafile (old Win 3.x format)" are described here as Format
"Windows metafile" and name wmf. The other variant detected by TrID
is described here as Format "Windows Placeable metafile" and name wmf
(see appended nconvert-info-wmf.txt.gz).

I also ran the command line tool of the ImageMagick graphic tool with
a command line like:
	identify -verbose *.wmf
Here all real WMF images are described as WMF (Windows Meta File).
For the DROID samples like x-fmt-119-signature-id-1228.wmf and
x-fmt-119-signature-id-609.wmf this also fails. It also fails,
correctly, on the LibreOffice test samples exttextout-2.wmf and
ofz35149-1.wmf (see appended identify-verbose-wmf.txt.gz).

With the help of the TrID output I found the page on the File
Formats Archive Team web site. It also lists a link to the official
Microsoft document [MS-WMF].pdf about the Windows Metafile Format.
This information is now expressed inside Magdir/msdos by additional
comment lines like:
# URL: 	http://fileformats.archiveteam.org/wiki/Windows_Metafile
# Reference:	https://winprotocoldoc.blob.core.windows.net/
#		productionwindowsarchives/MS-WMF/%5bMS-WMF%5d.pdf
# Reference:	http://mark0.net/download/triddefs_xml.7z
#		defs/w/wmf.trid.xml
#		defs/w/wmf-16.trid.xml
There you will also find links to examples that can be downloaded.

The current description happens inside Magdir/msdos. There exist 3
similar entries for such WMF images.

The second and third entries consist of just 6 lines:
0	string/b	\002\000\011\000	Windows metafile
!:mime	image/wmf
!:ext	wmf
0	string/b	\001\000\011\000	Windows metafile
!:mime	image/wmf
!:ext	wmf

Because all 3 entries share some common parts, I put most of the
displaying part (type, size, objects) in a subroutine that starts
like:
 0	name		wmf-head
 >0	uleshort	!0x0001			\b, type %#x
 >2	uleshort*2	!18			\b, header size %u

According to the documentation, at offset 0 the MetafileType is
stored as a 2 byte little endian integer. Only 2 values are allowed.
The value 1 (MEMORYMETAFILE) means the metafile is stored in memory.
The value 2 (DISKMETAFILE) means the metafile is stored on disk.
Among the thousands of examples I inspected I found no type 2
example. Neither TrID nor DROID knows this type. I am no WMF graphic
expert, so I do not know whether files matching the second entry
exist in reality. They may be very unlikely, but I keep the second
entry. According to the documentation, at offset 2 the HeaderSize is
stored as a 2 byte little endian integer. That is the number of words
in the header record. It is not explicitly written, but according to
TrID and DROID this value seems to be always 9. Expressed another
way, the header always has a size of 18 bytes.

So I use this knowledge to check for content after the header, in
order to skip the DROID sample x-fmt-119-signature-id-1228.wmf in the
third variant (with type=1=MEMORYMETAFILE and valid HeaderSize 9).
This entry now becomes:
 0	string/b	\001\000\011\000
 >18	ulelong		>0			Windows metafile
 >>0	use		wmf-head
And the second entry for the "unlikely" type 2 now becomes:
 0	string/b	\002\000\011\000	Windows metafile
 >0	use		wmf-head
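
For readers less familiar with magic syntax, the logic of these two
entries can be paraphrased in Python. This is only a rough sketch and
not how file itself evaluates the lines; the function name
looks_like_bare_wmf is made up for illustration.

```python
import struct

def looks_like_bare_wmf(data: bytes) -> bool:
    """Paraphrase of the two entries for WMF without placeable header."""
    if len(data) < 4:
        return False
    # MetafileType and HeaderSize, both 2 byte little endian
    metafile_type, header_words = struct.unpack_from("<HH", data, 0)
    if header_words != 9:        # header must be 9 words = 18 bytes
        return False
    if metafile_type == 2:       # DISKMETAFILE: no further test
        return True
    if metafile_type == 1:       # MEMORYMETAFILE: need content after header
        if len(data) < 22:       # nothing readable at offset 18
            return False
        (after_header,) = struct.unpack_from("<I", data, 18)
        return after_header > 0
    return False
```

A file that ends right after its leading bytes, as described for the
DROID signature sample, is rejected by the length test, which
corresponds to the ulelong test at offset 18 finding nothing to read.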

According to the documentation, at offset 4 the MetafileVersion is
stored as a 2 byte little endian integer. Only 2 values are allowed.
The value 0100h (=METAVERSION100) means device-independent bitmaps
(DIBs) are not supported. The value 0300h (=METAVERSION300) means
DIBs are supported. In most cases I get the value 0300h, so only
other values are shown, by lines like:
 >4	uleshort	=0x0100		\b, DIBs not supported
 >4	uleshort	=0x0300
 >4	default		x		\b, version
 >>4	uleshort	x		%#x
In the LibreOffice test samples I get here the wrong values 2020h for
ofz35149-1.wmf and 3a02h for exttextout-2.wmf. The DROID and TrID
tools check this version information and thereby skip these "bad"
LibreOffice examples. So I use this as an additional test for the
first variant. There a META_PLACEABLE Record (22 bytes with Aldus
Placeable Metafile signature) comes before the META_HEADER Record,
so the version is found at offset 26 (=22+4). For this variant the
magic lines now become:
 0	string/b	\327\315\306\232
 >26	uleshort&0xFDff	=0x0100			Windows metafile
 >>22	use		wmf-head
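
The mask 0xFDff in the test above may not be obvious at first sight.
The following small Python sketch (the function name version_ok is
made up for illustration) shows why a single masked comparison
accepts both allowed versions and rejects the values seen in the
"bad" samples.

```python
def version_ok(version: int) -> bool:
    """Mirror of the test `uleshort&0xFDff =0x0100`."""
    # 0x0100 and 0x0300 differ only in bit 0x0200.
    # Masking with 0xFDFF clears exactly that bit, so both
    # allowed versions compare equal to 0x0100 afterwards.
    return (version & 0xFDFF) == 0x0100

for value in (0x0100, 0x0300, 0x2020, 0x3A02):
    print(f"{value:#06x} -> {version_ok(value)}")
```

The first two values are accepted and the last two, taken from
ofz35149-1.wmf and exttextout-2.wmf, are rejected.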

After the version field, the size of the WMF file is stored as a 4
byte little endian integer. This is the number of words in the entire
metafile. Multiplying this by 2 gives the size of the WMF in bytes.
This information is now shown by additional lines inside the
subroutine like:
 >6	ulelong		x		\b, size %u words
 !:mime	image/wmf
 !:ext	wmf
 #>6	ulelong*2	x		\b, size %u bytes

Afterwards the number of graphics objects is stored as a 2 byte
little endian integer NumberOfObjects. That information is shown by a
line like:
 >10	uleshort	x			\b, %u objects
Most real WMF images apparently contain only a few dozen objects.
The highest value I observed was 110 for the PERSGRID.WMF example.
For "bad" examples I get "unlikely high" values here (like 252
objects in exttextout-2.wmf and 8224 objects in ofz35149-1.wmf).
Interestingly, I also get the value 0 in the
hardcopy-windows-meta.wmf example. Apparently this contains no vector
graphic element and the graphic is stored as a kind of "bitmap".

Afterwards the size of the largest record in the metafile, in words,
is stored as a 4 byte little endian integer MaxRecord. That
information is shown by a line like:
 >12	ulelong		x		\b, largest record size %#x
For real examples I get values here like:
	78h b0h 1f4h 310h 63fh 1e0022h 3fcc21h
For "bad" samples I also get "unlikely" values here (like 0x20202020
for ofz35149-1.wmf or 0 for exttextout-2.wmf).

As the last field in the header, the NumberOfMembers is stored as a 2
byte little endian integer. That information is shown by a line like:
 >16	uleshort	!0			\b, %u members
According to Microsoft this value should be 0. But in a few
LibreOffice examples I get low positive values here (like 5 in
TestBitBltStretchBlt.wmf and 13 in TestPalette.wmf). This violates
the current specification, but the graphic tools XnView and
ImageMagick display are able to show these images, whereas IrfanView
fails here. And again for "bad" examples I get "unlikely high"
values here (like 4254 for bitcount-1.wmf, 8224 for ofz5942-1.wmf and
56832 for exttextout-2.wmf).
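
The seven header fields walked through above can be decoded in one
step. The following Python sketch only illustrates the layout; the
class and function names are invented here, and the field offsets are
the ones used by the magic lines.

```python
import struct
from typing import NamedTuple

class MetaHeader(NamedTuple):
    metafile_type: int   # offset 0, 1=MEMORYMETAFILE 2=DISKMETAFILE
    header_words: int    # offset 2, seems to be always 9
    version: int         # offset 4, 0x0100 or 0x0300
    size_words: int      # offset 6, size of whole metafile in words
    num_objects: int     # offset 10
    max_record: int      # offset 12, largest record in words
    num_members: int     # offset 16, should be 0

    @property
    def size_bytes(self) -> int:
        # one word is 2 bytes
        return self.size_words * 2

def parse_meta_header(data: bytes, offset: int = 0) -> MetaHeader:
    """Decode the 18 byte META_HEADER record found at offset."""
    # "<" selects little endian without padding: 2+2+2+4+2+4+2 = 18
    return MetaHeader(*struct.unpack_from("<HHHIHIH", data, offset))
```

For a file with placeable header the same function would be called
with offset 22. A header declaring 7175 words, as reported for
corel3wmf.wmf, yields a size_bytes of 14350.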

In the first variant the META_PLACEABLE Record contains some
additional information that can be useful. After the start magic and
the 2 byte resource handle HWmf, the BoundingBox is stored in 8 bytes
at offset 6. That is the rectangle in the playback context. That
information is shown by lines like:
 >>6	leshort		x			\b, bounding box (%d
 >>8	leshort		x			\b,%d
 >>10	leshort		x			/ %d
 >>12	leshort		x			\b,%d)
The dimensions shown here are also reported as Geometry by
ImageMagick. So for the abydos.wmf example I get bounding box (0,0 /
9999,7499). This information is shown by ImageMagick as Geometry
9999x7499+0+0. The XnView tool shows other values here. For example,
for hardcopy-windows-meta.wmf with bounding box (0,0 / 1280,1024) it
reports Width 1280 and Height 1023. Again, for "bad" samples I get
nonsense here, like bounding box (-21589,-21589 / -21589,-21589) for
x-fmt-119-signature-id-609.wmf.
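
The layout of the 22 byte META_PLACEABLE Record can be sketched in
Python as follows. The names are invented for illustration. The
checksum rule (XOR of the first 10 words) is an assumption based on a
reading of [MS-WMF]; it does reproduce the checksum 0x69e5 for the
abydos.wmf values bounding box (0,0 / 9999,7499) and dpi 1200.

```python
import struct
from functools import reduce
from operator import xor

# bytes D7 CD C6 9A (\327\315\306\232) read as little endian long
PLACEABLE_KEY = 0x9AC6CDD7

def parse_placeable(data: bytes) -> dict:
    """Decode the META_PLACEABLE record at the start of data."""
    if len(data) < 22:
        raise ValueError("need at least 22 bytes")
    (key, hwmf, left, top, right, bottom,
     inch, reserved, checksum) = struct.unpack_from("<IHhhhhHIH", data, 0)
    if key != PLACEABLE_KEY:
        raise ValueError("placeable signature not found")
    # Assumption: checksum is the XOR of the 10 words before it
    computed = reduce(xor, struct.unpack_from("<10H", data, 0))
    return {
        "hwmf": hwmf,                                # offset 4
        "bounding_box": (left, top, right, bottom),  # offsets 6..12, signed
        "inch": inch,                                # offset 14
        "reserved": reserved,                        # offset 16
        "checksum": checksum,                        # offset 20
        "checksum_ok": computed == checksum,
    }
```

Because the four coordinates are signed 16 bit values, a record
filled with bytes ABh decodes to -21589 for each of them.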

After applying the above-mentioned modifications with the patch
file-5.44-msdos-wmf.diff, all my inspected WMF images are still
described, but with more details, and the inspected "failed" examples
are no longer misidentified. If more "bad" WMF samples exist, these
can be skipped by additional tests for "unlikely" header field
values. The output now looks like:

SPA_FLAG.wmf:                    Windows metafile
				 , bounding box (-4,-4 / 2275,1714)
				 , dpi 1440, checksum 0x5ce0
				 , DIBs not supported
				 , size 7436 words, 10 objects
				 , largest record size 0x1f4
TestPalette.wmf:                 Windows metafile
				 , size 77 words, 4 objects
				 , largest record size 0x9
				 , 13 members
abydos.wmf:                      Windows metafile
				 , bounding box (0,0 / 9999,7499)
				 , dpi 1200, checksum 0x69e5
				 , size 4488 words, 4 objects
				 , largest record size 0x63f
corel3wmf.wmf:                   Windows metafile
				 , size 7175 words, 6 objects
				 , largest record size 0x310
example.wmf:                     Windows metafile
				 , size 8661824 words, 8 objects
				 , largest record size 0x3fcc21
exttextout-2.wmf:                data
hardcopy-windows-meta.wmf:       Windows metafile
				 , bounding box (0,0 / 1280,1024)
				 , dpi 96, checksum 0x5671
				 , size 1966140 words, 0 objects
				 , largest record size 0x1e0022
ofz35149-1.wmf:                  data
test-type2.wmf:                  Windows metafile
				 , type 0x2
				 , size 7175 words, 6 objects
				 , largest record size 0x310
x-fmt-119-signature-id-1228.wmf: data
x-fmt-119-signature-id-609.wmf:  data
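
As an illustration of what an additional test for "unlikely" header
field values could look like, here is a hypothetical Python sketch.
It is not part of the patch. It assumes that the smallest possible
record has 3 words and that no record can be larger than the whole
metafile; both assumptions would need checking against more samples.

```python
def record_size_plausible(size_words: int, max_record: int) -> bool:
    """Hypothetical extra test on the Size and MaxRecord fields."""
    # Assumption: a record has at least 3 words (size + function)
    # and cannot be larger than the entire metafile.
    return 3 <= max_record <= size_words

# values taken from the output listing above
print(record_size_plausible(77, 0x9))            # TestPalette.wmf
print(record_size_plausible(8661824, 0x3FCC21))  # example.wmf
```

Both calls return True, while a MaxRecord of 0, as seen in
exttextout-2.wmf, would be rejected for any size.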

I hope my diff file can be applied in a future version of the file
utility.

With best wishes
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY8njjgAKCRCv8rHJQhrU
1iGhAJ4oSODc71xGr6EwSgKgzbQUPYjm0gCgh1W/ucpJ6Zm0DrpnWDo1WK9AxUQ=
=5fyF
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-wmf.txt.gz
Type: application/x-gzip
Size: 709 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230120/426880e5/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-wmf.csv.gz
Type: application/x-gzip
Size: 586 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230120/426880e5/attachment-0005.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: identify-verbose-wmf.txt.gz
Type: application/x-gzip
Size: 3247 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230120/426880e5/attachment-0006.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nconvert-info-wmf.txt.gz
Type: application/x-gzip
Size: 718 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230120/426880e5/attachment-0007.bin>
-------------- next part --------------
--- file-5.44/magic/Magdir/msdos.old	2022-12-26 19:00:48.000000000 +0100
+++ file-5.44/magic/Magdir/msdos	2023-01-20 01:17:10.884011000 +0100
@@ -1484,13 +1484,80 @@
 
 # Windows Metafile .WMF
-0	string/b	\327\315\306\232	Windows metafile
-!:mime	image/wmf
-!:ext	wmf
+# URL: 		http://fileformats.archiveteam.org/wiki/Windows_Metafile
+#		http://en.wikipedia.org/wiki/Windows_Metafile
+# Reference:	https://winprotocoldoc.blob.core.windows.net/productionwindowsarchives/MS-WMF/%5bMS-WMF%5d.pdf
+#		http://mark0.net/download/triddefs_xml.7z/defs/w/wmf.trid.xml
+# Note:		called "Windows Metafile" by TrID and
+#		verified by ImageMagick `identify -verbose *.wmf` as WMF (Windows Meta File)
+# META_PLACEABLE Record (Aldus Placeable Metafile signature)
+0	string/b	\327\315\306\232
+# Note:		called "Windows Metafile Image with Placeable File Header" by DROID via PUID x-fmt/119
+#		and verified by XnView `nconvert -info abydos.wmf SPA_FLAG.wmf hardcopy-windows-meta.wmf` as "Windows Placeable metafile"
+# skip failed libreoffice-7.3.2.2 ofz35149-1.wmf with invalid version 2020h and exttextout-2.wmf with invalid version 3a02h
+# and x-fmt-119-signature-id-609.wmf without version instead of 0100h=METAVERSION100 or 0300h=METAVERSION300
+>26	uleshort&0xFDff	=0x0100			Windows metafile
+# HWmf; resource handle to the metafile; When the metafile is on disk, this field MUST contain 0
+# seems to be always true but in failed samples 2020h ofz35149-1.wmf 56f8h exttextout-2.wmf
+>>4	uleshort	!0			\b, resource handle %#x
+# BoundingBox; the rectangle in the playback context measured in logical units for displaying
+# sometimes useful like: hardcopy-windows-meta.wmf (0,0 / 1280,1024)
+# but garbage in x-fmt-119-signature-id-609.wmf (-21589,-21589 / -21589,-21589)
+#>>6	ubequad		x			\b, bounding box %#16.16llx
+# Left; x-coordinate of the upper-left corner of the rectangle
+>>6	leshort		x			\b, bounding box (%d
+# Top; y-coordinate upper-left corner
+>>8	leshort		x			\b,%d
+# Right; x-coordinate lower-right corner
+>>10	leshort		x			/ %d
+# Bottom; y-coordinate lower-right corner
+>>12	leshort		x			\b,%d)
+# Inch; number of logical units per inch like: 72 96 575 576 1000 1200 1439 1440 2540
+>>14	uleshort	x			\b, dpi %u
+# Reserved; field is not used and MUST be set to 0; but ababababh in x-fmt-119-signature-id-609.wmf
+>>16	ulelong		!0			\b, reserved %#x
+# Checksum; checksum for the previous 10 words
+>>20	uleshort	x			\b, checksum %#x
+# META_HEADER Record after META_PLACEABLE Record
+>>22	use		wmf-head
+# GRR:		no example for type 2 (DISKMETAFILE) variant found under few thousands WMF
 0	string/b	\002\000\011\000	Windows metafile
+>0	use		wmf-head
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/w/wmf-16.trid.xml
+# Note:		called "Windows Metafile (old Win 3.x format)" by TrID and
+#		"Windows Metafile Image without Placeable File Header" by DROID via PUID x-fmt/119
+#		verified by XnView `nconvert -info *.wmf` as Windows metafile
+# variant with type=1=MEMORYMETAFILE and valid HeaderSize 9
+0	string/b	\001\000\011\000
+# skip DROID x-fmt-119-signature-id-1228.wmf by looking for content after header (18 bytes=2*011)
+>18	ulelong		>0			Windows metafile
+# GRR: in version 5.44 unequal and not endian variant not working!
+#>18	ulelong		!0			THIS_SHOULD_NOT_HAPPEN
+#>18	long		!0			THIS_SHOULD_NOT_HAPPEN
+>>0	use		wmf-head
+#	display information of Windows metafile header (type, size, objects)
+0	name		wmf-head
+# MetafileType: 0001h=MEMORYMETAFILE~Metafile is stored in memory 0002h=DISKMETAFILE~Metafile is stored on disk
+>0	uleshort	!0x0001			\b, type %#x
+# HeaderSize; the number of WORDs in header record; seems to be always 9 (18 bytes)
+>2	uleshort*2	!18			\b, header size %u
+# MetafileVersion: 0100h=METAVERSION100~DIBs (device-independent bitmaps) not supported 0300h=METAVERSION300~DIBs are supported
+# but in failed samples 2020h ofz35149-1.wmf 3a02h exttextout-2.wmf
+>4	uleshort	=0x0100			\b, DIBs not supported 
+>4	uleshort	=0x0300
+#>4	uleshort	=0x0300			\b, DIBs supported
+# this should not happen!
+>4	default		x			\b, version
+>>4	uleshort	x			%#x
+# Size; the number of WORDs in the entire metafile
+>6	ulelong	x				\b, size %u words
+#>6	ulelong*2	x			\b, size %u bytes
 !:mime	image/wmf
 !:ext	wmf
-0	string/b	\001\000\011\000	Windows metafile
-!:mime	image/wmf
-!:ext	wmf
+# NumberOfObjects: the number of graphics objects like: 0 hardcopy-windows-meta.wmf 1 2 3 4 5 6 7 8 9 12 13 14 16 17 20 27 110 PERSGRID.WMF
+>10	uleshort	x			\b, %u objects
+# MaxRecord: the size of the largest record in the metafile in WORDs like: 78h b0h 1f4h 310h 63fh 1e0022h 3fcc21h
+>12	ulelong		x			\b, largest record size %#x
+# NumberOfMembers: It SHOULD be 0x0000, but 5 TestBitBltStretchBlt.wmf 13 TestPalette.wmf and in failed samples 4254 bitcount-1.wmf 8224 ofz5942-1.wmf 56832 exttextout-2.wmf
+>16	uleshort	!0			\b, %u members
 
 #tz3 files whatever that is (MS Works files)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.44-msdos-wmf.diff.sig
Type: application/octet-stream
Size: 2274 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230120/426880e5/attachment-0001.obj>

