[File] [PATCH] of Magdir/msdos,printer for DOS EPS Binary File; - duplicates + *.ept

Jörg Jenderek joerg.jen.der.ek at gmx.net
Fri Jan 24 21:26:49 UTC 2020


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

Some weeks ago I had to handle some Encapsulated PostScript files.
When running file command version 5.38 with the -k option on such
examples and on some of the embedded preview images (*.wmf, *.tif),
I get output like:

A-wmf.eps:                     DOS EPS Binary File
	Postscript starts at byte 30 length 31933
	Metafile starts at byte 31963 length 382\012-
	DOS EPS Binary File
	Postscript starts at byte 30 length 31933
	Metafile starts at byte 31963 length 382
A-wmf.wmf:                     Windows metafile
Bitmap_VS_SVG.ept:             DOS EPS Binary File
	Postscript starts at byte 30 length 3258544
	TIFF starts at byte 3258574 length 526158\012-
	DOS EPS Binary File
	Postscript starts at byte 30 length 3258544
	TIFF starts at byte 3258574 length 526158
Bitmap_VS_SVG.tif:             TIFF image data,
	big-endian,
	direntries=20, height=512, bps=8, compression=none,
	PhotometricIntepretation=RGB Palette,
	orientation=upper-left, width=512
compact_flash_unmount-svg.ept: DOS EPS Binary File
	Postscript starts at byte 30 length 1868969
	TIFF starts at byte 1868999 length 463074\012-
	DOS EPS Binary File
	Postscript starts at byte 30 length 1868969
	TIFF starts at byte 1868999 length 463074
compact_flash_unmount-svg.tif: TIFF image data,
	little-endian,
	direntries=19, height=480, bps=8, compression=none,
	PhotometricIntepretation=RGB Palette, name=II*,
	orientation=upper-left, width=480
corruption-NoTIFF.bin:         data
corruption.eps:                DOS EPS Binary File
	Postscript starts at byte 32 length 271569
	TIFF starts at byte 271601 length 412925\012-
	DOS EPS Binary File
	Postscript starts at byte 32 length 271569
	TIFF starts at byte 271601 length 412925
corruption.ps:                 PostScript document text
	conforming DSC level 3.0, type EPS
example.eps:                   DOS EPS Binary File
	Postscript starts at byte 43350 length 263893
	TIFF starts at byte 30 length 43320\012-
	DOS EPS Binary File
	Postscript starts at byte 43350 length 263893
	TIFF starts at byte 30 length 43320
example.tif:                   TIFF image data,
	little-endian,
	direntries=16, height=708, bps=8, compression=LZW,
	PhotometricIntepretation=RGB Palette, width=498
sample.eps:                    DOS EPS Binary File
	Postscript starts at byte 32 length 485753
	TIFF starts at byte 485785 length 288509\012-
	DOS EPS Binary File
	Postscript starts at byte 32 length 485753
	TIFF starts at byte 485785 length 288509
sample.tif:                    TIFF image data,
	little-endian,
	direntries=13, height=294, bps=8, compression=none,
	PhotometricIntepretation=RGB Palette, width=460
usnavy.eps:                    DOS EPS Binary File
	Postscript starts at byte 30 length 62205
	TIFF starts at byte 62235 length 22512\012-
	DOS EPS Binary File
	Postscript starts at byte 30 length 62205
	TIFF starts at byte 62235 length 22512
usnavy.tif:                    TIFF image data,
	big-endian,
	direntries=12

Furthermore, with the -i option application/octet-stream or
image/x-eps is shown, and with --extension only ??? is displayed.
With the --apple option the wrong value UNKNUNKN is shown.

First, we see duplicate messages, because Magdir/msdos and
Magdir/printer contain in principle the same recognition lines,
starting with the line:
 0	belong		0xC5D0D3C6	DOS EPS Binary File

So first I delete the lines concerned from msdos with patch
file-5.38-msdos-eps.diff to remove the duplicate messages.

Information about this file format can be found on the File Formats
Archive Team web site. This is now expressed by comment lines like:
# URL: fileformats.archiveteam.org/wiki/Encapsulated_PostScript
# Reference: www.fileformat.info/format/eps/egff.htm

For comparison I ran other utilities.
The file identifier tool TrID (see
http://mark0.net/soft-trid-e.html) describes such Encapsulated
PostScript examples as "Encapsulated PostScript binary" by definition
eps-adobe.trid.xml.

DROID (Digital Record and Object Identification) is a software tool
developed by The National Archives of the UK to perform automated
batch identification of file formats. See
	https://digital-preservation.github.io/droid/
According to that tool the samples are described as "Encapsulated
PostScript File Format" by PUID fmt/122 and fmt/124.

The identify command line tool of the ImageMagick graphics software
(found at https://imagemagick.org/) recognizes such examples as EPT
format (Encapsulated PostScript with TIFF preview).

I also do not like the "DOS" phrase used by the file command. In
ancient computer times, on classic Mac OS it was possible to put a
preview image in the resource fork, but on DOS computers this concept
does not exist. So a binary format was "invented" to put plain
PostScript text together with a binary TIFF or WMF preview image in
one file. Nowadays nearly nobody uses DOS, but that binary format can
still be read and written by software like CorelDRAW and ImageMagick.

According to the reference site and other descriptions, the phrase
"DOS EPS Binary File" is not well suited. A better phrase would be
"Encapsulated PostScript Binary" or "Encapsulated PostScript with
TIFF or WMF preview".
But for the moment I keep the old message text, which is now
expressed by the line
 0       belong          0xC5D0D3C6      DOS EPS Binary
After that, instead of the possible MIME type application/postscript,
the currently used one is displayed by the line
 !:mime	image/x-eps
The Apple type is now shown by line
 !:apple	????EPSF

According to the formats page on imagemagick.org, for Encapsulated
PostScript with TIFF preview the file name extension ept is sometimes
used instead of the usual eps extension. By looking for a non-zero
TIFF preview offset stored at position 20, these facts are now shown
by lines like
 >20	long		=0		File
 !:ext	eps
 >20	long		!0		File
 !:ext	eps/ept
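
The header fields tested by these magic lines can be sketched in
Python as follows. This is only an illustration, not part of the
patch. The function names are hypothetical, and the little-endian
byte order of the offset and length fields is an assumption taken
from the usual description of this container (the magic lines
themselves use the native "long" type).

```python
import struct

EPS_BINARY_MAGIC = 0xC5D0D3C6  # value tested by the "belong" magic line


def parse_eps_binary_header(header: bytes) -> dict:
    """Decode the magic and the six 32-bit fields at offsets 4..27."""
    if len(header) < 28:
        raise ValueError("need at least 28 header bytes")
    # The magic is compared big-endian, as the "belong" test does.
    (magic,) = struct.unpack_from(">I", header, 0)
    if magic != EPS_BINARY_MAGIC:
        raise ValueError("not a DOS EPS Binary File")
    # Assumption: offsets and lengths are stored little-endian.
    names = ("ps_offset", "ps_length", "wmf_offset", "wmf_length",
             "tiff_offset", "tiff_length")
    return dict(zip(names, struct.unpack_from("<6I", header, 4)))


def suggested_extensions(fields: dict) -> list:
    """Mirror the two '>20 long' branches: eps only, or eps/ept."""
    return ["eps", "ept"] if fields["tiff_offset"] != 0 else ["eps"]


# Synthetic header using the usnavy.eps numbers quoted in this post.
header = struct.pack(">I", EPS_BINARY_MAGIC) + struct.pack(
    "<6I", 30, 62205, 0, 0, 62235, 22512)
fields = parse_eps_binary_header(header)
print(fields["tiff_offset"], suggested_extensions(fields))
# prints: 62235 ['eps', 'ept']
```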

In the current magic file, TIFF preview information is shown by the lines
 >>>20   long            >0              TIFF starts at byte %d
 >>>>24  long            >0              length %d
That procedure has some disadvantages.

When you extract the preview images, the description texts are not
consistent and must be synchronised manually. The EPS-wrapped preview
is called "TIFF", while the stand-alone image is called "TIFF image
data". This is more obvious for the WMF variant: wrapped in EPS it is
called "Metafile", stand-alone it is called "Windows metafile".
Furthermore, the current magic does not check or verify the existence
of the preview image. So it is not recognised that in the example
corruption.eps the embedded preview corruption-NoTIFF.bin is not a
TIFF image.
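
The missing check can be illustrated with a few lines of Python. This
is a rough sketch and not what the file command does internally; the
function name is hypothetical. It only compares the four bytes at the
stored preview offset with the two standard TIFF signatures.

```python
def classify_tiff_preview(data: bytes, offset: int) -> str:
    """Report whether the bytes at offset look like a TIFF header."""
    signature = data[offset:offset + 4]
    if signature == b"II*\x00":    # "II" byte order mark plus 42
        return "TIFF image data, little-endian"
    if signature == b"MM\x00*":    # "MM" byte order mark plus 42
        return "TIFF image data, big-endian"
    return "no TIFF signature"


# Synthetic blob: 30 filler bytes followed by a big-endian TIFF header.
blob = b"\x00" * 30 + b"MM\x00*" + b"\x00" * 8
print(classify_tiff_preview(blob, 30))  # TIFF image data, big-endian
print(classify_tiff_preview(blob, 0))   # no TIFF signature
```

A preview like corruption-NoTIFF.bin would fall into the last branch,
provided its first bytes match neither signature.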

To overcome these disadvantages I use the file command again to check
the embedded parts by indirect calls of Magdir/images and
Magdir/msdos. Unfortunately this method has one drawback: the pointer
expression only works for offsets lower than the maximum
FILE_BYTES_MAX defined in src/file.h, which is normally 1 MiB
(0x100000).
So the TIFF preview is now handled by lines like
 >>24	long		>0		\b, %d bytes
 >>>20	long		>0		at %d
 >>>20	long		>0x0FffFF	TIFF image
 >>>20	long		<0x100000
 >>>>(20.l)	indirect	x	\b
Another advantage is that more information is now shown. So we see
that usnavy.eps contains a big-endian TIFF image, whereas sample.eps
contains a little-endian TIFF preview.
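
The effect of the 1 MiB limit on the examples can be checked with a
small Python sketch. The offsets are copied from the listings in this
post; the function name and the labels "indirect" and "generic" are
only illustrative and do not appear in the patch.

```python
FILE_BYTES_MAX = 0x100000  # 1 MiB, the default cited from src/file.h

# TIFF preview offsets copied from the listings in this post.
tiff_offsets = {
    "Bitmap_VS_SVG.ept": 3258574,
    "compact_flash_unmount-svg.ept": 1868999,
    "corruption.eps": 271601,
    "example.eps": 30,
    "sample.eps": 485785,
    "usnavy.eps": 62235,
    "Grafik5.eps": 7821887,
}


def preview_handling(offset: int) -> str:
    """Pick the branch the two '>>>20 long' comparison lines select."""
    # ">0x0FffFF" and "<0x100000" split the range at exactly 1 MiB.
    if offset < FILE_BYTES_MAX:
        return "indirect"   # preview is inspected via the indirect call
    return "generic"        # only the fixed text "TIFF image" is shown


for name, offset in tiff_offsets.items():
    print(f"{name}: {preview_handling(offset)}")
```

Only the previews below the limit (corruption.eps, example.eps,
sample.eps and usnavy.eps) reach the indirect call; the other three
fall back to the fixed text.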

It would be nice if a feature could be implemented to use configured
limits inside comparison expressions, like
 >>>20	long		<FILE_BYTES_MAX

Then I apply the same procedure to the WMF preview and the PostScript
text part. So it can be seen that compact_flash_unmount-svg.ept
contains a DOS EPS Binary File inside a DOS EPS Binary File.
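
How such a nested container can be walked is sketched below in
Python. This is a simplified illustration under assumptions, not the
logic of the magic file: the names describe and build are
hypothetical, the fields are assumed to be little-endian, and build
only fabricates tiny test data with a 30 byte header.

```python
import struct

MAGIC = b"\xC5\xD0\xD3\xC6"


def describe(data: bytes, indent: int = 0) -> list:
    """Return description lines, descending into nested containers."""
    pad = "  " * indent
    if data[:4] != MAGIC or len(data) < 28:
        return [pad + "other data"]
    # Assumption: six little-endian 32-bit fields follow the magic.
    ps_off, ps_len, wmf_off, wmf_len, tiff_off, tiff_len = \
        struct.unpack_from("<6I", data, 4)
    lines = [pad + "DOS EPS Binary File"]
    parts = (("PostScript", ps_off, ps_len),
             ("WMF", wmf_off, wmf_len),
             ("TIFF", tiff_off, tiff_len))
    for label, off, length in parts:
        if off == 0 or length == 0:
            continue  # slot is unused
        lines.append(f"{pad}  {length} bytes at {off} ({label} slot)")
        part = data[off:off + length]
        if part[:4] == MAGIC:
            # Offsets inside the part are relative to the part itself.
            lines.extend(describe(part, indent + 2))
    return lines


def build(ps: bytes, tiff: bytes) -> bytes:
    """Hypothetical helper: header, then PostScript slot, then TIFF slot."""
    header = MAGIC + struct.pack("<6I", 30, len(ps), 0, 0,
                                 30 + len(ps), len(tiff)) + b"\xff\xff"
    return header + ps + tiff


inner = build(b"%!PS\n", b"II*\x00")
outer = build(inner, b"II*\x00")
print("\n".join(describe(outer)))
```

The recursion ends because every part starts at a non-zero offset and
is therefore shorter than its container.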

After applying the above-mentioned modifications with the patches
file-5.38-msdos-eps.diff and file-5.38-printer-eps.diff, the
duplicate messages have vanished and I get more precise output like:

A-wmf.eps:                     DOS EPS Binary File,
	31933 bytes at 30 PostScript document text
	conforming DSC level 3.0, type EPS, Level 1,
	382 bytes at 31963 Windows metafile
A-wmf.wmf:                     Windows metafile
Bitmap_VS_SVG.ept:             DOS EPS Binary File,
	3258544 bytes at 30 PostScript document text
	conforming DSC level 3.0, type EPS, Level 1,
	526158 bytes at 3258574 TIFF image
Bitmap_VS_SVG.tif:             TIFF image data,
	big-endian,
	direntries=20, height=512, bps=8, compression=none,
	PhotometricIntepretation=RGB Palette,
	orientation=upper-left, width=512
compact_flash_unmount-svg.ept: DOS EPS Binary File,
	1868969 bytes at 30 DOS EPS Binary File,
	1405865 bytes at 30 PostScript document text
	conforming DSC level 3.0, type EPS, Level 1,
	463074 bytes at 1405895 TIFF image,
	463074 bytes at 1868999 TIFF image
compact_flash_unmount-svg.tif: TIFF image data,
	little-endian,
	direntries=19, height=480, bps=8, compression=none,
	PhotometricIntepretation=RGB Palette, name=II*,
	orientation=upper-left, width=480
corruption-NoTIFF.bin:         data
corruption.eps:                DOS EPS Binary File,
	271569 bytes at 32 PostScript document text
	conforming DSC level 3.0, type EPS,
	412925 bytes at 271601
corruption.ps:                 PostScript document text
	conforming DSC level 3.0, type EPS
example.eps:                   DOS EPS Binary File,
	263893 bytes at 43350 PostScript document text
	conforming DSC level 3.1, type EPS, Level 2,
	43320 bytes at 30 TIFF image data,
	little-endian,
	direntries=16, height=708, bps=8, compression=LZW,
	PhotometricIntepretation=RGB Palette, width=498
example.tif:                   TIFF image data,
	little-endian,
	direntries=16, height=708, bps=8, compression=LZW,
	PhotometricIntepretation=RGB Palette, width=498
Grafik3.eps:                   DOS EPS Binary File,
	7821857 bytes at 30 PostScript document text
	conforming DSC level 3.0, type EPS, Level 2,
	2430240 bytes at 7821887 Windows metafile
Grafik3.wmf:                   Windows metafile
Grafik5.eps:                   DOS EPS Binary File,
	7821857 bytes at 30 PostScript document text
	conforming DSC level 3.0, type EPS, Level 3,
	483144 bytes at 7821887 TIFF image
Grafik5.tif:                   TIFF image data,
	little-endian,
	direntries=32, height=601, bps=8, compression=none,
	PhotometricIntepretation=RGB Palette,
	description=image description,
	orientation=upper-left, width=800
sample.eps:                    DOS EPS Binary File,
	485753 bytes at 32 PostScript document text
	conforming DSC level 3.1, type EPS, Level 2,
	288509 bytes at 485785 TIFF image data,
	little-endian,
	direntries=13, height=294, bps=8, compression=none,
	PhotometricIntepretation=RGB Palette, width=460
sample.tif:                    TIFF image data,
	little-endian,
	direntries=13, height=294, bps=8, compression=none,
	PhotometricIntepretation=RGB Palette, width=460
usnavy.eps:                    DOS EPS Binary File,
	62205 bytes at 30 PostScript document text
	conforming DSC level 2.0, type EPS,
	22512 bytes at 62235 TIFF image data,
	big-endian,
	direntries=12
usnavy.tif:                    TIFF image data,
	big-endian,
	direntries=12

I hope my 2 diff files can be applied in a future version of the
file utility.

With best wishes
Jörg Jenderek
- --
Jörg Jenderek








-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCXitg6wAKCRCv8rHJQhrU
1kriAJwOlwri35jUU6j1plcZbaYNRwqNJACfYWV+sKdV1CctLDykPD6WA08Zrxc=
=T9Yi
-----END PGP SIGNATURE-----
-------------- next part --------------
--- file-5.38/magic/Magdir/msdos.old	2019-10-18 16:16:18 +0000
+++ file-5.38/magic/Magdir/msdos	2020-01-06 23:42:10 +0000
@@ -1143,17 +1143,6 @@
 >0x187	search/0xB55	AUTOEXECBAT\ 4.0\0	\b +AUTOEXEC.BAT
 #>>&06		string	x			\b:%s
 
-# DOS EPS Binary File Header
-# From: Ed Sznyter <ews at Black.Market.NET>
-0	belong		0xC5D0D3C6	DOS EPS Binary File
-!:mime	image/x-eps
->4	long		>0		Postscript starts at byte %d
->>8	long		>0		length %d
->>>12	long		>0		Metafile starts at byte %d
->>>>16	long		>0		length %d
->>>20	long		>0		TIFF starts at byte %d
->>>>24	long		>0		length %d
-
 # TNEF magic From "Joomy" <joomy at se-ed.net>
 # Microsoft Outlook's Transport Neutral Encapsulation Format (TNEF)
 0	lelong		0x223e9f78	TNEF
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: file-5.38-printer-eps.diff
URL: <https://mailman.astron.com/pipermail/file/attachments/20200124/a81c95d8/attachment.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.38-msdos-eps.diff.sig
Type: application/octet-stream
Size: 95 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20200124/a81c95d8/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.38-printer-eps.diff.sig
Type: application/octet-stream
Size: 95 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20200124/a81c95d8/attachment-0001.obj>

