[File] [PATCH] of Magdir/msdos,printer for DOS EPS Binary File; - duplicates + *.eps *.ept

Jörg Jenderek joerg.jen.der.ek at gmx.net
Sat Jan 14 01:43:24 UTC 2023


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

Some days ago I wanted to install an Intel-based WiFi card.
In the directory "c:\Program Files\Intel\WiFi\", in the sub directory
ProfileImporters, I found samples with the suffix EPI (like
MurocImp.epi, M100Imp.epi and SbrngImp.epi). For that suffix I
expected Encapsulated PostScript files.

When running the file command version 5.44 with the -k option on such
examples and some other test samples, I get output like:

M100Imp.epi:                  data
SOCCER.WMF:                   Windows metafile data
abydos.tiff:                  TIFF image data, little-endian,
			      direntries=17, height=600, bps=28946,
			      compression=deflate,
			      PhotometricInterpretation=RGB,
			      orientation=upper-left\012- , width=800
drawX8-ps2wmf.eps:            DOS EPS Binary File
			      Postscript starts at byte 30
			      length 37402
			      Metafile starts at byte 37432
			      length 452
			      DOS EPS Binary File
			      Postscript starts at byte 30
			      length 37402
			      Metafile starts at byte 37432
			      length 452
			      OpenPGP Secret Key
dreieck.ept:                  DOS EPS Binary File
			      Postscript starts at byte 30
			      length 6367
			      TIFF starts at byte 6397
			      length 12910
			      DOS EPS Binary File
			      Postscript starts at byte 30
			      length 6367
			      TIFF starts at byte 6397
			      length 12910
			      OpenPGP Secret Key
example.eps:                  DOS EPS Binary File
			      Postscript starts at byte 43350
			      length 263893
			      TIFF starts at byte 30
			      length 43320
			      DOS EPS Binary File
			      Postscript starts at byte 43350
			      length 263893
			      TIFF starts at byte 30
			      length 43320
			      OpenPGP Secret Key
fmt-122-signature-id-174.eps: DOS EPS Binary File
			      Postscript starts at byte 1397760293
			      length 1868841261
			      Metafile starts at byte 841835874
			      length 1159737390
			      TIFF starts at byte 759583568
			      length 221392433
			      DOS EPS Binary File
			      Postscript starts at byte 1397760293
			      length 1868841261
			      Metafile starts at byte 841835874
			      length 1159737390
			      TIFF starts at byte 759583568
			      length 221392433
			      OpenPGP Secret Key
fmt-123-signature-id-178.eps: DOS EPS Binary File
			      Postscript starts at byte 1397760293
			      length 1868841261
			      Metafile starts at byte 841835874
			      length 1159737390
			      TIFF starts at byte 759583568
			      length 221261362
			      DOS EPS Binary File
			      Postscript starts at byte 1397760293
			      length 1868841261
			      Metafile starts at byte 841835874
			      length 1159737390
			      TIFF starts at byte 759583568
			      length 221261362
			      OpenPGP Secret Key
fmt-124-signature-id-180.eps: DOS EPS Binary File
			      Postscript starts at byte 1397760293
			      length 1868841261
			      Metafile starts at byte 858613090
			      length 1159737390
			      TIFF starts at byte 759583568
			      length 221261363
			      DOS EPS Binary File
			      Postscript starts at byte 1397760293
			      length 1868841261
			      Metafile starts at byte 858613090
			      length 1159737390
			      TIFF starts at byte 759583568
			      length 221261363
			      OpenPGP Secret Key

Furthermore, with the -i option the expected image/x-eps is shown for
the DOS EPS Binary samples, but with --extension only ??? is
displayed for such samples.

For comparison I ran other utilities. The file identifier
tool TrID (see http://mark0.net/soft-trid-e.html) describes such
DOS EPS Binary examples with low priority as "Adobe Encapsulated
PostScript" by definition eps-adobe.trid.xml.
Most of the real DOS EPS samples (that is, excluding the DROID test
samples fmt-122-signature-id-174.eps, fmt-123-signature-id-178.eps
and fmt-124-signature-id-180.eps) are described with highest priority
as "Encapsulated PostScript binary (with TIFF preview)" by
eps-tiff.trid.xml. The few real DOS EPS samples not described by this
definition (like sample drawX8-ps2wmf.eps) are described with
highest rate as "Encapsulated PostScript binary (with WMF preview)"
by eps-wmf.trid.xml (see appended trid-v-DOS-EPS.txt.gz).

DROID (Digital Record and Object Identification) is a software tool
developed by The National Archives of the UK to perform automated
batch identification of file formats. See
	https://digital-preservation.github.io/droid/
That tool describes the samples as "Encapsulated
PostScript File Format" with mime type application/postscript. The
suffix EPS is accepted here, whereas EPT is not. The sub
classification with version "1.2" happens by PUID fmt/122, with
version "2.0" by PUID fmt/123 and with version "3" by PUID fmt/124
(see appended droid-DOS-EPS.csv.gz).

I also ran nconvert, the command line tool of the XnView graphic
tool, with a command line like:
	nconvert -info *.EP?
Here the real samples with TIFF images are described as format TIFF
with name epsp. For samples with WMF, like drawX8-ps2wmf.eps, it
failed (see appended nconvert-info-DOS-EPS.txt.gz).

I also ran identify, the command line tool of the ImageMagick
graphic tool, with a command line like:
	identify -verbose *
Here all real DOS EPS binary samples are described as EPT
(Encapsulated PostScript with TIFF preview), even the samples with
a WMF preview (see appended identify-verbose-DOS-EPS.txt.gz).

First we see that we get duplicate messages, because Magdir/msdos
and Magdir/printer contain in principle the same recognition lines,
starting with the line:
 0	belong		0xC5D0D3C6	DOS EPS Binary File

So first I delete the lines concerned inside Magdir/msdos with patch
file-5.44-msdos-eps.diff to remove the duplicate messages.

In Magdir/printer the mime type line is missing. In Magdir/msdos the
next lines look like:
!:mime	image/x-eps
> 4	long		>0		Postscript starts at byte %d
>> 8	long		>0		length %d
>>> 12	long		>0		Metafile starts at byte %d
>>>> 16	long		>0		length %d
>>> 20	long		>0		TIFF starts at byte %d
>>>> 24	long		>0		length %d
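
As a rough illustration of what these test lines read, here is a
minimal Python sketch of the header layout. It assumes the commonly
documented 30-byte DOS EPS header with little-endian fields; the
names DosEpsHeader and parse_dos_eps_header and the checksum field at
offset 28 are not taken from this message, and the sample bytes are
synthetic, built from the values printed for dreieck.ept.

```python
import struct
from typing import NamedTuple

DOS_EPS_MAGIC = b"\xC5\xD0\xD3\xC6"  # same value as the belong test 0xC5D0D3C6


class DosEpsHeader(NamedTuple):  # hypothetical helper type
    ps_offset: int
    ps_length: int
    wmf_offset: int
    wmf_length: int
    tiff_offset: int
    tiff_length: int
    checksum: int  # assumed 16-bit field at offset 28


def parse_dos_eps_header(data: bytes) -> DosEpsHeader:
    """Decode the fields at offsets 4..29, all assumed little-endian."""
    if len(data) < 30 or data[:4] != DOS_EPS_MAGIC:
        raise ValueError("not a DOS EPS binary header")
    return DosEpsHeader(*struct.unpack_from("<6IH", data, 4))


# Synthetic header using the values shown for dreieck.ept.
sample = DOS_EPS_MAGIC + struct.pack("<6IH", 30, 6367, 0, 0, 6397, 12910, 0xFFFF)
header = parse_dos_eps_header(sample)
print(header)
```

The six 32-bit fields correspond one to one to the offsets 4, 8, 12,
16, 20 and 24 used in the magic lines.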

Encapsulated PostScript can contain a TIFF preview. Such variants
are described by TrID as "Encapsulated PostScript binary (with TIFF
preview)" by eps-tiff.trid.xml. If the stored offset and length of
this embedded image are not zero, this information is printed with
the beginning phrase "TIFF starts". The printed values are not always
valid, because the sample can be corrupted. They are also wrong for
the DROID test samples fmt-122-signature-id-174.eps,
fmt-123-signature-id-178.eps and fmt-124-signature-id-180.eps. These
are used by the DROID tool to recognize Encapsulated PostScript
samples and contain just the header bytes. With the help of the
offset I can jump to that location and inspect that part again via
an indirect call by the file command. So the magic lines concerned
now become:
 >>>>20   long            >0              at byte %d
 !:ext	eps/ept
 >>>>>24  long            >0              length %d
 >>>>>>(20.l)	indirect		x
So for the DROID samples nothing is shown, whereas for real samples
additional information about the embedded TIFF is shown by
Magdir/images. For this variant the suffix EPT is also used instead
of the standard EPS.
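
The huge offsets and lengths printed for the DROID samples can be
explained with a small Python sketch. It assumes that the six header
fields are little-endian 32-bit values and simply packs the numbers
printed for fmt-122-signature-id-174.eps back into bytes.

```python
import struct

# Offsets and lengths printed for fmt-122-signature-id-174.eps
# (PostScript, Metafile, TIFF; start and length each).
reported = [1397760293, 1868841261, 841835874, 1159737390, 759583568, 221392433]

# Re-pack them as six little-endian 32-bit fields and look at the bytes.
raw = struct.pack("<6I", *reported)
print(raw)  # b'%!PS-Adobe-2.0 EPSF-1.2\r'
```

So the "header fields" of that sample are just the text of a
PostScript comment line that follows the four magic bytes directly,
which fits the statement that these samples are not real DOS EPS
files.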

If an Encapsulated PostScript file contains no TIFF preview, it
can instead contain a Windows Metafile (*.WMF) and the values for
TIFF are nil. Such variants are described by TrID as "Encapsulated
PostScript binary (with WMF preview)" by eps-wmf.trid.xml. If the
stored offset and length of this embedded image are not zero, this
information is printed with the beginning phrase "Metafile starts".
The printed values are not always valid, because the sample can be
corrupted. They are also wrong for the DROID test samples. These are
used by the DROID tool to recognize Encapsulated PostScript samples
and contain just the header bytes. With the help of the offset I can
jump to that location and inspect this part again via an indirect
call by the file command. So the magic lines concerned now become:
 >>>>12   long            >0              at byte %d
 !:ext	eps
 >>>>16  long            >0              length %d
 >>>>>(12.l)	indirect		x
So for the DROID samples nothing is shown, whereas for real samples
additional information about the embedded WMF is shown by
Magdir/msdos. For this variant apparently only the EPS suffix is used.
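
Corrupted or header-only samples can also be recognised by comparing
the stored values with the file size. The following Python sketch is
only an illustration of that idea and not part of the patch; the
function name section_in_bounds is hypothetical, and the file sizes
used are assumptions (example.eps is taken to end right after its
PostScript part, the DROID sample is taken to be 32 bytes long).

```python
def section_in_bounds(offset: int, length: int, file_size: int) -> bool:
    """Return True if a non-empty section lies entirely inside the file."""
    return offset > 0 and length > 0 and offset + length <= file_size


# example.eps: TIFF at byte 30 (43320 bytes), PostScript at 43350 (263893 bytes).
assumed_size = 43350 + 263893  # assumption: nothing follows the PostScript part
tiff_ok = section_in_bounds(30, 43320, assumed_size)
ps_ok = section_in_bounds(43350, 263893, assumed_size)

# DROID sample: values as printed, file size assumed to be 32 bytes.
droid_ok = section_in_bounds(1397760293, 1868841261, 32)

print(tiff_ok, ps_ok, droid_ok)  # True True False
```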

In the test lines "long" is used as integer type. This works for me
on my machines, which are all little endian, but I think the above
test lines fail when running the file command on big endian machines.
So I believe the right expression must use something like "lelong".
Unfortunately I have no big endian machine. So maybe somebody
can check this?
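
The concern can be illustrated without a big endian machine. The
following Python sketch decodes the same four stored bytes once as
little-endian and once as big-endian; it assumes the header field is
stored little-endian, as in the samples above where the offset 30 is
printed on little endian hosts.

```python
import struct

# PostScript offset 30 as it is stored in the file header.
field = bytes([0x1E, 0x00, 0x00, 0x00])

little = struct.unpack("<I", field)[0]  # what a little endian host reads
big = struct.unpack(">I", field)[0]     # what a big endian host would read

print(little, big)  # 30 503316480
```

A test with host byte order would therefore print a wrong offset on a
big endian host and the indirect jump would point far outside the
file, whereas an explicit little-endian type gives 30 everywhere.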

Then I do the same procedure for the embedded PostScript part, which
often comes directly after the header. In most cases (850 of 857
samples on my systems) this offset is 30 or 32, but I also found a
few samples with values like 2788 10644 43350 71828. So the
PostScript part now becomes:
 >>4      long            >0              at byte %d
 >>>8     long            >0              length %d
 >>>>(4.l)	indirect		x
Here, calling indirect of ./printer, I get a phrase like "length 263893
PostScript document text" when adding 1 space character after the
length value. In the TIFF parts I get a slightly "strange" phrase like
"length 43320\012- TIFF image data,". In the WMF parts I get a slightly
"strange" phrase like "length 452\012- Windows metafile". So maybe
this is a BUG in the file command.
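
What the indirect lines do can be pictured roughly as "jump to the
stored offset and identify again". The Python sketch below imitates
that on a synthetic container. The functions sniff and describe_part
are hypothetical, the signatures checked are only the well-known
leading bytes of PostScript, TIFF and WMF, and unlike the file
command the sketch also cuts the part at the stored length.

```python
def sniff(part: bytes) -> str:
    """Guess the type of an embedded part from its first bytes."""
    if part.startswith(b"%!PS"):
        return "PostScript"
    if part[:4] in (b"II*\x00", b"MM\x00*"):
        return "TIFF"
    # Placeable WMF key, or standard WMF header (type 1 or 2, header size 9).
    if part[:4] in (b"\xD7\xCD\xC6\x9A", b"\x01\x00\x09\x00", b"\x02\x00\x09\x00"):
        return "Windows metafile"
    return "unknown"


def describe_part(data: bytes, offset: int, length: int) -> str:
    """Slice one embedded part out of the container and classify it."""
    return sniff(data[offset:offset + length])


# Synthetic container: 30 filler bytes, a PostScript stub, a TIFF stub.
ps = b"%!PS-Adobe-3.0 EPSF-3.0\n"
tiff = b"MM\x00*\x00\x00\x00\x08"
container = bytes(30) + ps + tiff

ps_kind = describe_part(container, 30, len(ps))
tiff_kind = describe_part(container, 30 + len(ps), len(tiff))
print(ps_kind, tiff_kind)  # PostScript TIFF
```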

The DROID samples are not real Encapsulated PostScript. So I add an
additional test right after the first magic test, which checks for
the existence of content after the header. I do this with a second
test line like:
 >32	ulelong		>0		DOS EPS Binary File
In version 5.44 some other variants do not work, like:
 >32	long		!0		DOS EPS Binary File
 >32	lelong		!0		DOS EPS Binary File

After applying the above mentioned modifications with the patches
file-5.44-msdos-eps.diff and file-5.44-printer-eps.diff and using
Magdir/images for the TIFF parts, I get output like:

M100Imp.epi:                  data
SOCCER.WMF:                   Windows metafile
abydos.tiff:                  TIFF image data, little-endian,
			      direntries=17, height=600, bps=28946,
			      compression=deflate,
			      PhotometricInterpretation=RGB,
			      orientation=upper-left, width=800
drawX8-ps2wmf.eps:            DOS EPS Binary File
			      at byte 30
			      length 37402
			      PostScript document text
			      conforming DSC level 3.0, type EPS,
			      Level 2
			      at byte 37432
			      length 452
			      \012- Windows metafile
dreieck.ept:                  DOS EPS Binary File
			      at byte 30
			      length 6367
			      PostScript document text
			      conforming DSC level 3.0, type EPS,
			      Level 1
			      at byte 6397
			      length 12910
			      \012- TIFF image data, big-endian,
			      direntries=20, height=25, bps=16,
			      compression=none,
			      PhotometricInterpretation=BlackIsZero,
			      orientation=upper-left, width=100
example.eps:                  DOS EPS Binary File
			      at byte 43350
			      length 263893
			      PostScript document text
			      conforming DSC level 3.1, type EPS,
			      Level 2
			      at byte 30
			      length 43320
			      \012- TIFF image data, little-endian,
			      direntries=16, height=708, bps=8,
			      compression=LZW,
			      PhotometricInterpretation=RGB Palette,
			      width=498
fmt-122-signature-id-174.eps: ISO-8859 text, with CR line terminators
fmt-123-signature-id-178.eps: ISO-8859 text, with CR line terminators
fmt-124-signature-id-180.eps: ISO-8859 text, with CR line terminators

I hope my diff files can be applied in a future version of the
file utility.

With best wishes
Jörg Jenderek
- --
Jörg Jenderek





-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY8IIuwAKCRCv8rHJQhrU
1jYLAKDaw2FMZAkVLj1GkQFQOGtGzBvTLACg3stQpM6+xrPSBGDI8fy37SdITK8=
=UJvN
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-DOS-EPS.txt.gz
Type: application/x-gzip
Size: 874 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230114/ab4a7e52/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-DOS-EPS.csv.gz
Type: application/x-gzip
Size: 621 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230114/ab4a7e52/attachment-0005.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nconvert-info-DOS-EPS.txt.gz
Type: application/x-gzip
Size: 694 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230114/ab4a7e52/attachment-0006.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: identify-verbose-DOS-EPS.txt.gz
Type: application/x-gzip
Size: 25697 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230114/ab4a7e52/attachment-0007.bin>
-------------- next part --------------
--- file-5.44/magic/Magdir/msdos.old	2022-12-26 19:00:48.000000000 +0100
+++ file-5.44/magic/Magdir/msdos	2023-01-14 02:00:48.686144700 +0100
@@ -1680,19 +1680,8 @@
 #>>&06		string	x			\b:%s
 >0x187	search/0xB55	AUTOEXECBAT\ 4.0\0	\b +AUTOEXEC.BAT
 #>>&06		string	x			\b:%s
 
-# DOS EPS Binary File Header
-# From: Ed Sznyter <ews at Black.Market.NET>
-0	belong		0xC5D0D3C6	DOS EPS Binary File
-!:mime	image/x-eps
->4	long		>0		Postscript starts at byte %d
->>8	long		>0		length %d
->>>12	long		>0		Metafile starts at byte %d
->>>>16	long		>0		length %d
->>>20	long		>0		TIFF starts at byte %d
->>>>24	long		>0		length %d
-
 # Norton Guide (.NG , .HLP) files added by Joerg Jenderek from source NG2HTML.C
 # of http://www.davep.org/norton-guides/ng2h-105.tgz
 # https://en.wikipedia.org/wiki/Norton_Guides
 0	string		NG\0\001
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.44-msdos-eps.diff.sig
Type: application/octet-stream
Size: 650 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230114/ab4a7e52/attachment-0002.obj>
-------------- next part --------------
--- file-5.44/magic/Magdir/printer.old	2022-12-26 19:00:48.000000000 +0100
+++ file-5.44/magic/Magdir/printer	2023-01-14 02:28:58.268096200 +0100
@@ -32,9 +32,38 @@
 # From: Ed Sznyter <ews at Black.Market.NET>
-0       belong          0xC5D0D3C6      DOS EPS Binary File
->4      long            >0              Postscript starts at byte %d
->>8     long            >0              length %d
->>>12   long            >0              Metafile starts at byte %d
+# Update:	Joerg Jenderek
+# URL:		http://fileformats.archiveteam.org/wiki/Encapsulated_PostScript
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/eps-adobe.trid.xml
+# Note:		called "Encapsulated PostScript binary" by TrID and 
+#		verified partly by ImageMagick `identify -verbose *` as EPT (Encapsulated PostScript with TIFF preview)
+0       belong          0xC5D0D3C6
+# skip DROID fmt-122-signature-id-174.eps fmt-123-signature-id-178.eps fmt-124-signature-id-180.eps
+# by looking for content after header
+# GRR: in version 5.44 unequal and not endian variant not working!
+>32	ulelong		>0		DOS EPS Binary File
+!:mime	image/x-eps
+# TODO: check that "long" is false on big endian machines
+# Postscript often (850/857) comes after header; so values like: 30 32 or 2788 10644 43350 71828  
+>>4      long            >0              at byte %d
+# 1 space char after length value to get phrase like "length 263893 PostScript document text"
+>>>8     long            >0              length %d 
+# PostScript document text handled by ./printer
+>>>>(4.l)	indirect		x
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/e/eps-wmf.trid.xml
+# Note:		called "Encapsulated PostScript binary (with WMF preview)" by TrID
+#		verified partly by XnView `nconvert -info *.EP?` as TIFF epsp
+>>>>12   long            >0               at byte %d
+!:ext	eps
+# GRR: in file version 5.44 calling indirect of ./msdos produce phrase like "length 452\012- Windows metafile"
 >>>>16  long            >0              length %d
->>>20   long            >0              TIFF starts at byte %d
->>>>24  long            >0              length %d
+# Windows metafile data handled by ./msdos
+>>>>>(12.l)	indirect		x
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/e/eps-tiff.trid.xml
+# Note:		called "Encapsulated PostScript binary (with TIFF preview)" by TrID
+>>>>20   long            >0              at byte %d
+# For the variant with the TIFF preview image sometimes the file extension ept is used
+!:ext	eps/ept
+# GRR: in file version 5.44 calling indirect of ./images produce phrase like "length 43320\012- TIFF image data,"
+>>>>>24  long            >0              length %d
+# TIFF image data handled by ./images
+>>>>>>(20.l)	indirect		x
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.44-printer-eps.diff.sig
Type: application/octet-stream
Size: 1235 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230114/ab4a7e52/attachment-0003.obj>

