[File] [PATCH] of Magdir/msdos,printer for DOS EPS Binary File; - duplicates + *.eps *.ept

Christos Zoulas christos at zoulas.com
Sun Jan 22 15:03:16 UTC 2023


Committed, thanks!

christos

> On Jan 13, 2023, at 8:43 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> 
> Some days ago I wanted to install an Intel-based WiFi card.
> Under the directory "c:\Program Files\Intel\WiFi\", in the subdirectory
> ProfileImporters, I found samples with the suffix EPI (like MurocImp.epi,
> M100Imp.epi, SbrngImp.epi). For that suffix I expected Encapsulated
> PostScript files.
> 
> When running the file command version 5.44 with the -k option on such
> examples and some other test samples, I get output like:
> 
> M100Imp.epi:                  data
> SOCCER.WMF:                   Windows metafile data
> abydos.tiff:                  TIFF image data, little-endian,
> 			      direntries=17, height=600, bps=28946,
> 			      compression=deflate,
> 			      PhotometricInterpretation=RGB,
> 			      orientation=upper-left\012- , width=800
> drawX8-ps2wmf.eps:            DOS EPS Binary File
> 			      Postscript starts at byte 30
> 			      length 37402
> 			      Metafile starts at byte 37432
> 			      length 452
> 			      DOS EPS Binary File
> 			      Postscript starts at byte 30
> 			      length 37402
> 			      Metafile starts at byte 37432
> 			      length 452
> 			      OpenPGP Secret Key
> dreieck.ept:                  DOS EPS Binary File
> 			      Postscript starts at byte 30
> 			      length 6367
> 			      TIFF starts at byte 6397
> 			      length 12910
> 			      DOS EPS Binary File
> 			      Postscript starts at byte 30
> 			      length 6367
> 			      TIFF starts at byte 6397
> 			      length 12910
> 			      OpenPGP Secret Key
> example.eps:                  DOS EPS Binary File
> 			      Postscript starts at byte 43350
> 			      length 263893
> 			      TIFF starts at byte 30
> 			      length 43320
> 			      DOS EPS Binary File
> 			      Postscript starts at byte 43350
> 			      length 263893
> 			      TIFF starts at byte 30
> 			      length 43320
> 			      OpenPGP Secret Key
> fmt-122-signature-id-174.eps: DOS EPS Binary File
> 			      Postscript starts at byte 1397760293
> 			      length 1868841261
> 			      Metafile starts at byte 841835874
> 			      length 1159737390
> 			      TIFF starts at byte 759583568
> 			      length 221392433
> 			      DOS EPS Binary File
> 			      Postscript starts at byte 1397760293
> 			      length 1868841261
> 			      Metafile starts at byte 841835874
> 			      length 1159737390
> 			      TIFF starts at byte 759583568
> 			      length 221392433
> 			      OpenPGP Secret Key
> fmt-123-signature-id-178.eps: DOS EPS Binary File
> 			      Postscript starts at byte 1397760293
> 			      length 1868841261
> 			      Metafile starts at byte 841835874
> 			      length 1159737390
> 			      TIFF starts at byte 759583568
> 			      length 221261362
> 			      DOS EPS Binary File
> 			      Postscript starts at byte 1397760293
> 			      length 1868841261
> 			      Metafile starts at byte 841835874
> 			      length 1159737390
> 			      TIFF starts at byte 759583568
> 			      length 221261362
> 			      OpenPGP Secret Key
> fmt-124-signature-id-180.eps: DOS EPS Binary File
> 			      Postscript starts at byte 1397760293
> 			      length 1868841261
> 			      Metafile starts at byte 858613090
> 			      length 1159737390
> 			      TIFF starts at byte 759583568
> 			      length 221261363
> 			      DOS EPS Binary File
> 			      Postscript starts at byte 1397760293
> 			      length 1868841261
> 			      Metafile starts at byte 858613090
> 			      length 1159737390
> 			      TIFF starts at byte 759583568
> 			      length 221261363
> 			      OpenPGP Secret Key
> 
> Furthermore, with the -i option the expected image/x-eps is shown for
> DOS EPS Binary samples, but with --extension only ??? is displayed for
> such samples.
> 
> For comparison I ran other utilities. The file identifier tool TrID
> (see http://mark0.net/soft-trid-e.html) describes such DOS EPS Binary
> examples with low priority as "Adobe Encapsulated PostScript" by the
> definition eps-adobe.trid.xml. Most of the real DOS EPS samples (that
> is, excluding the DROID test samples fmt-122-signature-id-174.eps,
> fmt-123-signature-id-178.eps and fmt-124-signature-id-180.eps) are
> described with highest priority as "Encapsulated PostScript binary
> (with TIFF preview)" by eps-tiff.trid.xml. The few real DOS EPS
> samples not described by this definition (like drawX8-ps2wmf.eps) are
> described with highest priority as "Encapsulated PostScript binary
> (with WMF preview)" by eps-wmf.trid.xml (see appended
> trid-v-DOS-EPS.txt.gz).
> 
> DROID (Digital Record and Object Identification) is a software tool
> developed by The National Archives of the UK to perform automated batch
> identification of file formats. See
> 	https://digital-preservation.github.io/droid/
> According to that tool the samples are described as "Encapsulated
> PostScript File Format" with mime type application/postscript. Here the
> suffix EPS is accepted, whereas EPT is not. The subclassification with
> version "1.2" happens by PUID fmt/122, with version "2.0" by PUID
> fmt/123, and with version "3" by PUID fmt/124 (see appended
> droid-DOS-EPS.csv.gz).
> 
> I also ran nconvert, the command-line tool of the XnView graphics
> viewer, with a command line like:
> 	nconvert -info *.EP?
> Here the real samples with TIFF images are described with Format TIFF
> and name epsp. For samples with WMF, like drawX8-ps2wmf.eps, it
> failed (see appended nconvert-info-DOS-EPS.txt.gz).
> 
> I also ran identify, the command-line tool of the ImageMagick graphics
> suite, with a command line like:
> 	identify -verbose *
> Here all real DOS EPS binary samples are described as EPT (Encapsulated
> PostScript with TIFF preview), even the samples with a WMF preview
> (see appended identify-verbose-DOS-EPS.txt.gz).
> 
> First, we see that we get duplicate messages because Magdir/msdos and
> Magdir/printer contain, in principle, the same recognition lines,
> starting with the line:
> 0	belong		0xC5D0D3C6	DOS EPS Binary File
> 
> So first I deleted the corresponding lines in Magdir/msdos with the
> patch file-5.44-msdos-eps.diff to remove the duplicate messages.
> 
> In Magdir/printer the mime type line is missing. In Magdir/msdos the
> following lines look like:
> !:mime	image/x-eps
> >4	long		>0		Postscript starts at byte %d
> >>8	long		>0		length %d
> >>>12	long		>0		Metafile starts at byte %d
> >>>>16	long		>0		length %d
> >>>20	long		>0		TIFF starts at byte %d
> >>>>24	long		>0		length %d
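The magic lines above read a fixed binary header: the signature C5 D0 D3 C6 at offset 0, followed by three start/length pairs (Postscript, Metafile, TIFF) at offsets 4 to 24. The following is a minimal Python sketch of that layout. It assumes the fields are unsigned 32-bit little-endian values. The helper name parse_dos_eps_header is hypothetical and is not part of file.

```python
import struct

# Signature tested by "0 belong 0xC5D0D3C6": bytes C5 D0 D3 C6 in file order.
DOS_EPS_MAGIC = b"\xc5\xd0\xd3\xc6"


def parse_dos_eps_header(data: bytes):
    """Return the (start, length) pairs of a DOS EPS Binary File, or None.

    Hypothetical helper: assumes offsets 4..27 hold three pairs of
    unsigned 32-bit little-endian integers, in the order tested by the
    magic lines (Postscript, Metafile, TIFF).
    """
    if len(data) < 28 or data[:4] != DOS_EPS_MAGIC:
        return None
    ps_start, ps_len, wmf_start, wmf_len, tiff_start, tiff_len = struct.unpack_from(
        "<6I", data, 4
    )
    return {
        "postscript": (ps_start, ps_len),
        "metafile": (wmf_start, wmf_len),
        "tiff": (tiff_start, tiff_len),
    }


# Header values reported above for dreieck.ept (no metafile preview).
header = DOS_EPS_MAGIC + struct.pack("<6I", 30, 6367, 0, 0, 6397, 12910)
print(parse_dos_eps_header(header))
```

A zero pair, like the metafile pair here, is how such a header says "no section of this kind".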
> 
> Encapsulated PostScript can contain a TIFF preview. Such variants
> are described by TrID as "Encapsulated PostScript binary (with TIFF
> preview)" by eps-tiff.trid.xml. If the stored offset and length of
> this embedded image are not zero, this information is printed with the
> leading phrase "TIFF starts". But nonzero values do not always mean
> that a TIFF preview is present: the sample can be corrupted, and it is
> also false for the DROID test samples fmt-122-signature-id-174.eps,
> fmt-123-signature-id-178.eps and fmt-124-signature-id-180.eps. These
> are used by the DROID tool to recognize Encapsulated PostScript samples
> and contain just the header bytes. With the help of the offset I can
> jump to that location and inspect this part with an indirect call of
> the file command. So the corresponding magic lines now become:
> >>>>20   long            >0              at byte %d
> !:ext	eps/ept
> >>>>>24  long            >0              length %d
> >>>>>>(20.l)	indirect		x
> So for the DROID samples nothing is shown, whereas for real samples
> additional information about the embedded TIFF is shown by
> Magdir/images. For this variant the suffix EPT is also used instead of
> the standard EPS.
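The indirect line makes file jump to the stored TIFF offset and classify whatever it finds there. Below is a rough Python analogue, for illustration only. It assumes the TIFF pair at offsets 20/24 is little-endian, and it checks only the two standard TIFF signatures (II*\0 and MM\0*). The name tiff_preview is hypothetical. For a header-only sample, whose stored offsets point far past the end of the file, it returns None.

```python
import struct

# Standard TIFF signatures: little-endian "II*\0" and big-endian "MM\0*".
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*")


def tiff_preview(data: bytes):
    """Return the embedded TIFF bytes, or None (hypothetical helper).

    Reads the TIFF start/length pair at offsets 20/24 and reports a
    preview only if the range lies inside the file and begins with a
    TIFF signature.
    """
    if len(data) < 28:
        return None
    start, length = struct.unpack_from("<2I", data, 20)
    if start == 0 or length == 0 or start + length > len(data):
        return None  # no preview, corrupt entry, or header-only sample
    section = data[start:start + length]
    return section if section[:4] in TIFF_SIGNATURES else None
```

Checking the range against the file size before looking at the bytes corresponds to the point made above: a nonzero offset alone does not prove that a preview exists.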
> 
> If an Encapsulated PostScript file contains no TIFF preview, it
> contains a Windows Metafile (*.WMF) instead, and the TIFF values are
> zero. Such variants are described by TrID as "Encapsulated PostScript
> binary (with WMF preview)" by eps-wmf.trid.xml. If the stored offset
> and length of this embedded image are not zero, this information is
> printed with the leading phrase "Metafile starts". Again, this does not
> always mean that a metafile is present: the sample can be corrupted,
> and it is also false for the DROID test samples, which are used by the
> DROID tool to recognize Encapsulated PostScript samples and contain
> just the header bytes. With the help of the offset I can jump to that
> location and inspect this part with an indirect call of the file
> command. So the corresponding magic lines now become:
> >>>>12   long            >0              at byte %d
> !:ext	eps
> >>>>16  long            >0              length %d
> >>>>>(12.l)	indirect		x
> So for the DROID samples nothing is shown, whereas for real samples
> additional information about the embedded WMF is shown by
> Magdir/msdos. For this variant apparently only the EPS suffix is used.
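The same idea applies to the metafile pair at offsets 12/16. This is again only a sketch. It assumes the preview is one of two kinds:

- a placeable metafile (key 0x9AC6CDD7, stored little-endian), or
- a plain metafile header (type 1 or 2, header size 9 words).

Both helper names are hypothetical.

```python
import struct

PLACEABLE_WMF_KEY = 0x9AC6CDD7  # stored little-endian: bytes D7 CD C6 9A


def looks_like_wmf(section: bytes) -> bool:
    """Assumed check: placeable key, or plain header with type 1/2 and size 9."""
    if len(section) < 4:
        return False
    key = struct.unpack_from("<I", section, 0)[0]
    mt_type, header_words = struct.unpack_from("<2H", section, 0)
    return key == PLACEABLE_WMF_KEY or (mt_type in (1, 2) and header_words == 9)


def metafile_preview(data: bytes):
    """Return the embedded metafile bytes, or None (hypothetical helper)."""
    if len(data) < 28:
        return None
    start, length = struct.unpack_from("<2I", data, 12)
    if start == 0 or length == 0 or start + length > len(data):
        return None  # no preview, corrupt entry, or header-only sample
    section = data[start:start + length]
    return section if looks_like_wmf(section) else None
```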
> 
> In the test lines "long" is used as the integer type. This works on my
> machines, which are all little-endian, but I think the above test lines
> fail when running the file command on big-endian machines, because
> "long" is read in the host's native byte order. So I believe the right
> expression must use something like "lelong". Unfortunately I have no
> big-endian machine, so maybe somebody can check this?
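To see the difference concretely: in magic(5), long uses the host's native byte order, while lelong and belong fix the order. In Python's struct module the prefixes "<", ">" and "=" play the same roles. The small illustration below decodes the Postscript start field with the value 30, as it would appear in a DOS EPS header. The helper read_u32 exists only for this example.

```python
import struct


def read_u32(data: bytes, offset: int, order: str) -> int:
    """Read an unsigned 32-bit value.

    order: "<" little-endian (like lelong), ">" big-endian (like belong),
    "=" host byte order (like plain long).
    """
    return struct.unpack_from(order + "I", data, offset)[0]


field = b"\x1e\x00\x00\x00"  # Postscript starts at byte 30, little-endian
print(read_u32(field, 0, "<"))  # 30: what lelong reads on any host
print(read_u32(field, 0, ">"))  # 503316480: what long reads on a big-endian host
print(read_u32(field, 0, "="))  # depends on the machine running this
```

The signature test can stay with belong, because 0xC5D0D3C6 is defined by its byte sequence in the file. The offset and length fields are the ones that would need lelong.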
> 
> Then I did the same for the embedded PostScript part, which often comes
> directly after the header. Most often (850/857 on my systems) this
> offset is 30 or 32, but I also found a few samples with values like
> 2788, 10644, 43350 and 71828. So the PostScript part now becomes:
> >>4      long            >0              at byte %d
> >>>8     long            >0              length %d
> >>>>(4.l)	indirect		x
> Calling indirect for ./printer, I get a phrase like "length 263893
> PostScript document text" when adding one space character after the
> length value. In the TIFF parts I get a slightly "strange" phrase like
> "length 43320\012- TIFF image data,", and in the WMF parts a slightly
> "strange" phrase like "length 452\012- Windows metafile". So this may
> be a bug in the file command.
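Here is a sketch of the same check for the PostScript pair at offsets 4/8. It assumes the embedded part starts with the DSC header "%!PS-Adobe-". It does not assume that the PostScript begins at byte 30, because some samples store it much later. The name postscript_section is hypothetical.

```python
import struct


def postscript_section(data: bytes):
    """Return the embedded PostScript bytes, or None (hypothetical helper).

    Uses whatever start offset the header stores (often 30 or 32, but
    larger values occur) instead of assuming the text follows the header.
    """
    if len(data) < 12:
        return None
    start, length = struct.unpack_from("<2I", data, 4)
    if start == 0 or length == 0 or start + length > len(data):
        return None
    section = data[start:start + length]
    return section if section.startswith(b"%!PS-Adobe-") else None
```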
> 
> The DROID samples are not real Encapsulated PostScript. So I added an
> additional test right after the first magic test, which checks for the
> existence of content after the header. I do this with a second test
> line like:
> >32	ulelong		>0		DOS EPS Binary File
> In version 5.44 some other variants do not work, like:
> >32	long		!0		DOS EPS Binary File
> >32	lelong		!0		DOS EPS Binary File
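The following Python sketch shows roughly what the extra ">32 ulelong >0" line checks. A file passes only if four bytes exist at offset 32 and their little-endian value is nonzero. A file that consists of little more than the header therefore does not pass. The function name is hypothetical. Treating a too-short file as a non-match is an assumption about how the magic test behaves at the end of the file.

```python
import struct


def has_content_after_header(data: bytes) -> bool:
    """Python analogue of the magic line ">32 ulelong >0" (hypothetical helper)."""
    if len(data) < 36:
        return False  # assumption: no bytes at offsets 32..35 means no match
    return struct.unpack_from("<I", data, 32)[0] > 0
```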
> 
> After applying the above-mentioned modifications with the patches
> file-5.44-msdos-eps.diff and file-5.44-printer-eps.diff, and using
> Magdir/images for the TIFF parts, I get output like:
> 
> M100Imp.epi:                  data
> SOCCER.WMF:                   Windows metafile
> abydos.tiff:                  TIFF image data, little-endian,
> 			      direntries=17, height=600, bps=28946,
> 			      compression=deflate,
> 			      PhotometricInterpretation=RGB,
> 			      orientation=upper-left, width=800
> drawX8-ps2wmf.eps:            DOS EPS Binary File
> 			      at byte 30
> 			      length 37402
> 			      PostScript document text
> 			      conforming DSC level 3.0, type EPS,
> 			      Level 2
> 			      at byte 37432
> 			      length 452
> 			      \012- Windows metafile
> dreieck.ept:                  DOS EPS Binary File
> 			      at byte 30
> 			      length 6367
> 			      PostScript document text
> 			      conforming DSC level 3.0, type EPS,
> 			      Level 1
> 			      at byte 6397
> 			      length 12910
> 			      \012- TIFF image data, big-endian,
> 			      direntries=20, height=25, bps=16,
> 			      compression=none,
> 			      PhotometricInterpretation=BlackIsZero,
> 			      orientation=upper-left, width=100
> example.eps:                  DOS EPS Binary File
> 			      at byte 43350
> 			      length 263893
> 			      PostScript document text
> 			      conforming DSC level 3.1, type EPS,
> 			      Level 2
> 			      at byte 30
> 			      length 43320
> 			      \012- TIFF image data, little-endian,
> 			      direntries=16, height=708, bps=8,
> 			      compression=LZW,
> 			      PhotometricInterpretation=RGB Palette,
> 			      width=498
> fmt-122-signature-id-174.eps: ISO-8859 text, with CR line terminators
> fmt-123-signature-id-178.eps: ISO-8859 text, with CR line terminators
> fmt-124-signature-id-180.eps: ISO-8859 text, with CR line terminators
> 
> I hope my diff files can be applied in a future version of the
> file utility.
> 
> With best wishes
> Jörg Jenderek
> - --
> Jörg Jenderek
> 
> <trid-v-DOS-EPS.txt.gz><droid-DOS-EPS.csv.gz><nconvert-info-DOS-EPS.txt.gz><identify-verbose-DOS-EPS.txt.gz><file-5_44-msdos-eps_diff><file-5_44-msdos-eps_diff_sig><file-5_44-printer-eps_diff><file-5_44-printer-eps_diff_sig>
