[File] [PATCH] of Magdir/images for Netpbm images *.ppm *.pam *.pbm ...; XV thumbnail *.p7

Jörg Jenderek joerg.jen.der.ek at gmx.net
Mon Aug 3 19:10:13 UTC 2020


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,
some days ago I came across Netpbm graphic images with file name
extensions like ppm and pam.
When running the file command, version 5.39, on such examples and other
similar files, I get output like:

gpa-logo.ppm:                   Netpbm image data, size =
				230 x 110, rawbits, pixmap
ico64x01.pnm:                   Netpbm image data, size =
				64 x 64, rawbits, pixmap
input_p2.pgm:                   Netpbm image data, size =
				70 x 46, greymap, ASCII text
input_p4.pbm:                   Netpbm image data, size =
				70 x 46, rawbits, bitmap
input_p5.pgm:                   Netpbm image data, size =
				70 x 46, rawbits, greymap
input_p7.p7:                    XV thumbnail image data
				XV "thumbnail file" (icon) data
				Netpbm PAM image file
joerg.xpm:                      XV thumbnail image data
				XV "thumbnail file" (icon) data
				Netpbm PAM image file
logo_linux_mono.pbm:            Netpbm image data, size =
				80 x 80, bitmap, ASCII text
marb18.jpg:                     XV thumbnail image data
				XV "thumbnail file" (icon) data
				Netpbm PAM image file ,
				ISO-8859 text, with very long lines
MARBLES.PPM:                    Netpbm image data, size =
				1419 x 1001, rawbits, pixmap
P7.txt:                         Netpbm PAM image file
photo100.tif:                   XV thumbnail image data
				XV "thumbnail file" (icon) data
				Netpbm PAM image file
reference.pnm:                  Netpbm image data, size =
				256 x 256, rawbits, pixmap
sample_1920×1280.pam:           Netpbm PAM image file
T-Online.gif:                   XV thumbnail image data
				XV "thumbnail file" (icon) data
				Netpbm PAM image file ,
ufplogo256.pbm:                 Netpbm image data, size =
				219 x 173, bitmap
				, ASCII text
x-fmt-164-signature-id-583.pbm: , bitmap, ASCII text

With the --extension option only ??? is displayed, and some MIME types
are wrong.

For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). It lists the file name
extensions in use and, with the -v option, often the related URL
pointing to information about the file format.

We get duplicate messages because Magdir/images contains 2 similar
magic lines like
 0	string P7\ 332		XV thumbnail image data
 0      string P7\ 332		XV "thumbnail file" (icon) data

So I removed the second one. Furthermore I display a user-defined
MIME type and possible file name extensions, according to the
File Formats Archive Team web site, by the additional lines
 !:mime	image/x-xv-thumbnail
 !:ext	p7/gif/tif/xpm/jpg

The XV software generates a thumbnail version of an image with the same
name in the ‘.xvpics’ subdirectory. So we often get well-known
graphic extensions like gif, tif and so on.
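
The naming rule described above can be sketched in a few lines of
Python. This is only an illustration of the rule as stated here; the
function name xv_thumbnail_path and the example path are made up for
the sketch and are not part of XV or of the file utility.

```python
from pathlib import PurePosixPath


def xv_thumbnail_path(image_path: str) -> PurePosixPath:
    """Return the thumbnail location for image_path under the rule above:
    same file name, inside a '.xvpics' subdirectory next to the image."""
    image = PurePosixPath(image_path)
    return image.parent / ".xvpics" / image.name


# Hypothetical example path; the thumbnail keeps the .jpg extension
# although its content is in the XV thumbnail (P7 332) format.
print(xv_thumbnail_path("pictures/marb18.jpg"))
```

This is why a thumbnail can carry an extension such as jpg or gif that
does not match its real content.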

Furthermore the XV thumbnails are misidentified as Netpbm PAM images
because the magic for that type uses only 16 bits for recognition, by a
line like
 0	string		P7		Netpbm PAM image file

So I add additional tests. According to the File Formats Archive Team
web site a newline (0x0A) follows. That would be expressed by a line
like:
 >2	ubyte		=0x0A
But neither TrID nor the DROID tool uses this as a test condition. In the
older Netpbm formats other "white space" characters are also used as
separators. So there may be software that does not apply that strict
condition. So instead I use a relaxed magic line like
 >2	ubyte		!0xAB
This line skips the misidentified DROID fmt-405-signature-id-589.pam.
As a check I still test for this newline and, for unexpected images,
just report the fact by an informational magic line like
 >>>2	ubyte		!0x0A	\b, 0x%x at offset 2 instead new line
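
To make the decision order easier to follow, here is a rough Python
sketch of the tests described so far. It is not how the file utility
evaluates magic; the function name classify_p7 is invented, the XV
test is simply done first, and the slice bound is only an
approximation of search/256.

```python
def classify_p7(data: bytes) -> str:
    """Approximate the P7 tests described above (illustrative only)."""
    if not data.startswith(b"P7"):
        return "other"
    # XV thumbnails start with the longer signature "P7 332".
    if data.startswith(b"P7 332"):
        return "XV thumbnail"
    # Relaxed test: only the byte 0xAB at offset 2 is rejected.
    if len(data) < 3 or data[2] == 0xAB:
        return "other"
    # Keyword WIDTH must appear near the start (approximation of search/256).
    if data.find(b"WIDTH", 3, 3 + 256) == -1:
        return "other"
    result = "Netpbm PAM"
    # Informational note when offset 2 is not the expected newline.
    if data[2] != 0x0A:
        result += ", 0x%x at offset 2 instead of new line" % data[2]
    return result


print(classify_p7(b"P7 332\n#IMGINFO"))
print(classify_p7(b"P7\nWIDTH 4\nHEIGHT 2\nENDHDR\n"))
```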

Now I look at how the other software recognizes such Netpbm PAM images.
DROID looks for the WIDTH and ENDHDR keywords.
TrID also checks for 4 additional keywords: HEIGHT, DEPTH, TUPLTYPE
and MAXVAL. I only check for the keyword WIDTH. That was sufficient for
me to avoid misidentifications. For the other Netpbm formats the image
geometry is shown by a phrase like ", size = n x m". So I do this also
for Netpbm PAM images by lines like
 >>3	search/256/b	WIDTH		Netpbm PAM image file, size =
 >>>&1	string		x		%s
 >>>3	search/256/b	HEIGHT		x
 >>>>&1	string		x		%s
I also show the file name extension and the correct MIME type instead
of the wrong "image/x-portable-pixmap" by lines like:
 !:ext	pam
 !:mime	image/x-portable-arbitrarymap
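
For readers less familiar with magic syntax, the WIDTH/HEIGHT lookup
above corresponds roughly to the following Python sketch. It uses a
regular expression instead of the search and string tests of the
patch, and the name pam_size and the sample header are assumptions
made for the example.

```python
import re


def pam_size(data: bytes):
    """Return (width, height) read from a PAM header, or None.

    Only the 256 bytes after the 'P7' magic and the separator byte are
    inspected, similar in spirit to search/256 at offset 3.
    """
    header = data[3:3 + 256]
    width = re.search(rb"WIDTH\s+(\d+)", header)
    height = re.search(rb"HEIGHT\s+(\d+)", header)
    if not (width and height):
        return None
    return int(width.group(1)), int(height.group(1))


# Hand-written sample header, not taken from one of the test files.
sample = b"P7\nWIDTH 1920\nHEIGHT 1280\nDEPTH 3\nMAXVAL 255\nTUPLTYPE RGB\nENDHDR\n"
size = pam_size(sample)
if size:
    print("Netpbm PAM image file, size = %d x %d" % size)
```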

For the greymap Netpbm image the MIME type image/x-portable-greymap is
wrong; according to the documentation the correct name uses the
American spelling. This is now expressed by the line:
 !:mime	image/x-portable-graymap

The DROID x-fmt-164-signature-id-583.pbm is misidentified by magic
lines like
 0	search/1	P1
 >0	regex/4		P1[\040\t\f\r\n]
 >>0	use		netpbm
 >>0	string		x	\b, bitmap
 !:strength + 65
 !:mime	image/x-portable-bitmap

The first 2 lines now become
 0	search/1	P1
 >2	regex/2		[\040\t\f\r\n]
The test for the starting 2 bytes P1 is already done by the first magic
test line. So in my opinion the test for P1 inside the regular
expression is no longer needed, and the second line just tests for
white space after this 2 byte magic.
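
The intent of these two lines can be written down as a small Python
check. It looks only at the single byte at offset 2, as described in
the text; the regex test in the magic file may behave slightly
differently. The names used are chosen for the sketch.

```python
# The characters from the regex class [\040\t\f\r\n]
NETPBM_WHITESPACE = b" \t\f\r\n"


def looks_like_plain_pbm(data: bytes) -> bool:
    """True if data starts with 'P1' followed by one white space byte."""
    return (
        data[:2] == b"P1"                  # first test: 2 byte magic
        and len(data) > 2
        and data[2] in NETPBM_WHITESPACE   # second test: white space at offset 2
    )


print(looks_like_plain_pbm(b"P1\n# comment\n2 2\n0 1\n1 0\n"))
print(looks_like_plain_pbm(b"P1x"))
```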

The DROID x-fmt-164-signature-id-583.pbm with ten 0 digits is
skipped by an additional third test line like
 >>3	string		!000000000
The following display part, with the file name extension, now
becomes:
 >>>0	use		netpbm
 >>>0	string		x	\b, bitmap
 #!:strength + 65
 !:mime	image/x-portable-bitmap
 !:ext	pbm
Adding 65 to the strength, so that Netpbm images come before "x86 boot
sector" or "DOS/MBR boot sector" identified by ./filesystems, is
probably not needed because the files are different as far as I know.
A DOS boot sector always starts with a jump instruction at the
beginning. That is EB or E9 in hexadecimal. I also do not know of any
MBR boot sector starting with the ASCII character P followed by a digit.
So I changed the strength line to a comment line.
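
The argument about the first byte can be checked with a few lines of
Python. This only illustrates the reasoning given above (jump opcode
EB or E9 versus the letter P); it does not reproduce the actual boot
sector tests in ./filesystems, and the function names are invented.

```python
# First byte of a DOS boot sector according to the text above
BOOT_JUMP_OPCODES = {0xEB, 0xE9}


def starts_like_dos_boot_sector(data: bytes) -> bool:
    """True if the first byte is one of the jump opcodes EB or E9."""
    return len(data) > 0 and data[0] in BOOT_JUMP_OPCODES


def starts_like_netpbm(data: bytes) -> bool:
    """True if data starts with 'P' followed by a digit 1 to 7."""
    return len(data) >= 2 and data[0:1] == b"P" and data[1:2] in (
        b"1", b"2", b"3", b"4", b"5", b"6", b"7")


# 'P' is 0x50, so no Netpbm magic can begin with a jump opcode.
for digit in b"1234567":
    magic = b"P" + bytes([digit])
    print(magic, starts_like_netpbm(magic), starts_like_dos_boot_sector(magic))
```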

Furthermore many ASCII Netpbm images contain a comment line with,
for example, the creator name. So I check the second line for the
character (#=0x23) that starts a comment line and then display the
comment by additional lines like:
 >>>3	ubyte		=0x23
 >>>>4	string		x	%s
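
In Python terms the comment test amounts to roughly the following.
The sketch assumes a single separator byte after the magic, so that
the comment character sits at offset 3 as in the magic lines above;
the function name netpbm_comment is made up.

```python
def netpbm_comment(data: bytes):
    """Return the comment text if a '#' is found at offset 3, else None."""
    if len(data) > 3 and data[3] == 0x23:      # 0x23 is '#'
        line = data[4:].split(b"\n", 1)[0]     # rest of the comment line
        return line.decode("latin-1").strip()
    return None


print(netpbm_comment(b"P1\n# CREATOR: pbmtovbm version 1.99\n219 173\n"))
```

An image without a comment line, or one using a two byte separator
such as CRLF, yields None in this sketch.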

Furthermore I now show the file name extension for all Netpbm images by
additional lines starting with "!:ext".

After applying the above-mentioned modifications with the patch
file-5.39-images-netpbm.diff, I get more precise output
without duplicates, like:

abydos.pam:                     Netpbm PAM image file
				, size =
				800 x 600
crash-1.pbm:                    Netpbm image data, size =
				30000000000000000000000000000000 x 1,
				pixmap, ASCII text
fmt-405-signature-id-589.pam:   ISO-8859 text,
				with no line terminators
gpa-logo.ppm:                   Netpbm image data, size =
				230 x 110, rawbits, pixmap
ico64x01.pnm:                   Netpbm image data, size =
				64 x 64, rawbits, pixmap
input_p2.pgm:                   Netpbm image data, size =
				70 x 46, greymap, ASCII text
input_p4.pbm:                   Netpbm image data, size =
				70 x 46, rawbits, bitmap
input_p5.pgm:                   Netpbm image data, size =
				70 x 46, rawbits, greymap
input_p7.p7:                    XV thumbnail image data
joerg.xpm:                      XV thumbnail image data
logo_linux_mono.pbm:            Netpbm image data, size =
				80 x 80, bitmap
				Standard black and white Linux logo,
				ASCII text
marb18.jpg:                     XV thumbnail image data
MARBLES.PPM:                    Netpbm image data, size =
				1419 x 1001, rawbits, pixmap
P7.txt:                         ASCII text,
				with CRLF line terminators
photo100.tif:                   XV thumbnail image data
reference.pnm:                  Netpbm image data, size =
				256 x 256, rawbits, pixmap
sample_1920×1280.pam:           Netpbm PAM image file
				, size =
				1920 x 1280
T-Online.gif:                   XV thumbnail image data
ufplogo256.pbm:                 Netpbm image data, size =
				219 x 173, bitmap
				CREATOR: pbmtovbm version 1.99,
				ASCII text
x-fmt-164-signature-id-583.pbm: ASCII text

I hope my diff file can be applied in a future version of the
file utility.

With best wishes
Jörg Jenderek
- --
Jörg Jenderek

-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCXyhhCwAKCRCv8rHJQhrU
1iGJAKDXvwUnetfyR9ZES8jmK3jCYdpTsQCcCup8231rn+bnnXDUN1LRo4MliUk=
=gHwj
-----END PGP SIGNATURE-----
-------------- next part --------------
--- file-5.39/magic/Magdir/images.old	2020-05-31 10:34:40 +0000
+++ file-5.39/magic/Magdir/images	2020-08-03 19:02:34 +0000
@@ -170,4 +170,7 @@
 # PBMPLUS images
+# URL: 		https://en.wikipedia.org/wiki/Netpbm
 # The next byte following the magic is always whitespace.
-# strength is changed to try these patterns before "x86 boot sector"
+# adding 65 to strength so that Netpbm images comes before "x86 boot sector" or
+# "DOS/MBR boot sector" identified by ./filesystems is probably not needed
+# because files are different
 0	name		netpbm
@@ -178,7 +181,14 @@
 0	search/1	P1
->0	regex/4		P1[\040\t\f\r\n]
->>0	use		netpbm
->>0	string		x	\b, bitmap
-!:strength + 65
+# test for whitespace after 2 byte magic
+>2	regex/2		[\040\t\f\r\n]
+# skip DROID x-fmt-164-signature-id-583.pbm with ten 0 digits
+>>3	string		!000000000
+>>>0	use		netpbm
+>>>0	string		x	\b, bitmap
+#!:strength + 65
 !:mime	image/x-portable-bitmap
+!:ext	pbm
+# check for character # starting a comment line
+>>>3	ubyte		=0x23
+>>>>4	string		x	%s
 
@@ -188,4 +198,6 @@
 >>0	string		x	\b, greymap
-!:strength + 65
-!:mime	image/x-portable-greymap
+#!:strength + 65
+# american spelling gray
+!:mime	image/x-portable-graymap
+!:ext	pgm
 
@@ -195,4 +207,5 @@
 >>0	string		x	\b, pixmap
-!:strength + 65
+#!:strength + 65
 !:mime	image/x-portable-pixmap
+!:ext	ppm
 
@@ -202,4 +215,5 @@
 >>0	string		x	\b, rawbits, bitmap
-!:strength + 65
+#!:strength + 65
 !:mime	image/x-portable-bitmap
+!:ext	pbm
 
@@ -209,4 +223,5 @@
 >>0	string		x	\b, rawbits, greymap
-!:strength + 65
+#!:strength + 65
 !:mime	image/x-portable-greymap
+!:ext	pgm
 
@@ -216,7 +231,21 @@
 >>0	string		x	\b, rawbits, pixmap
-!:strength + 65
+#!:strength + 65
 !:mime	image/x-portable-pixmap
+!:ext	ppm/pnm
 
-0	string		P7		Netpbm PAM image file
-!:mime	image/x-portable-pixmap
+# URL: 		https://en.wikipedia.org/wiki/Netpbm#PAM_graphics_format
+# Reference:	http://fileformats.archiveteam.org/wiki/Portable_Arbitrary_Map
+# Update:	Joerg Jenderek
+0	string		P7
+# skip DROID fmt-405-signature-id-589.pam by looking for character like New Line 
+>2	ubyte		!0xAB
+#>2	ubyte		=0x0A
+>>3	search/256/b	WIDTH		Netpbm PAM image file, size =
+!:mime	image/x-portable-arbitrarymap
+!:ext	pam
+>>>&1	string		x		%s
+>>>3	search/256/b	HEIGHT		x
+>>>>&1	string		x		%s
+# at offset 2 a New Line character (0xA) should appear
+>>>2	ubyte		!0x0A		\b, 0x%x at offset 2 instead new line
 
@@ -1107,3 +1136,10 @@
 # XV thumbnail indicator (ThMO)
+# URL:		https://en.wikipedia.org/wiki/Xv_(software)
+# Reference:	http://fileformats.archiveteam.org/wiki/XV_thumbnail
+# Update:	Joerg Jenderek
 0	string		P7\ 332		XV thumbnail image data
+#0	string		P7\ 332		XV "thumbnail file" (icon) data
+!:mime	image/x-xv-thumbnail
+# thumbnail .xvpics/foo.bar for graphic foo.bar
+!:ext	p7/gif/tif/xpm/jpg
 
@@ -1213,6 +1249,2 @@
 
-# "thumbnail file" (icon)
-# descended from "xv", but in use by other applications as well (Wolfram Kleff)
-0       string          P7\ 332         XV "thumbnail file" (icon) data
-
 # taken from fkiss: (<yav at mte.biglobe.ne.jp> ?)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.39-images-netpbm.diff.sig
Type: application/octet-stream
Size: 95 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20200803/64f93edd/attachment.obj>

