[File] [PATCH] Magdir/images FITS image; more extensions+mime type

Jörg Jenderek (GMX) joerg.jen.der.ek at gmx.net
Mon Dec 25 03:29:55 UTC 2023


Hello,

A few days ago I had to handle some files with the FTS suffix. Some of
the samples are astronomical images.

When I run the file command (version 5.45) on such images and related
files, I get output like:

DDTSUVDATA.fits:                FITS image data, 32-bit
				, floating point, single precision
M57.FIT:                        FITS image data
M57.PGM:                        Netpbm image data
				, size = 192 x 165, rawbits, greymap
MOON.FTS:                       FITS image data
arange.fits:                    FITS image data, 32-bit
				, two's complement binary integer
blank.fits:                     FITS image data
example.fit:                    FITS image data, 8-bit
				, character or unsigned binary integer
group.fits:                     FITS image data, 32-bit
				, floating point, single precision
header_newlines.fits:           FITS image data, 64-bit
				, floating point, double precision
ngc1316r-d.fz:                  FITS image data, 16-bit
				, two's complement binary integer
ngc1316r-gzip.fz:               FITS image data, 16-bit
				, two's complement binary integer
ngc1316r-gzip2.fz:              FITS image data, 16-bit
				, two's complement binary integer
ngc1316r-hcomp.fz:              FITS image data, 16-bit
				, two's complement binary integer
ngc1316r-rice.fz:               FITS image data, 16-bit
				, two's complement binary integer
ngc1316r.fit:                   FITS image data, 16-bit
				, two's complement binary integer
o4sp040b0_raw-p.fz:             FITS image data, 16-bit
				, two's complement binary integer
x-fmt-383-signature-id-57.fits: FITS image data, 8-bit
				, character or unsigned binary integer
x-fmt-383-signature.fits:       FITS image data, 8-bit
				, character or unsigned binary integer

With the --extension option only the two suffixes fits/fts are shown,
and with the -i option the MIME type image/fits is shown, which is not
wrong.

For comparison I ran the file format identification utility TrID (see
https://mark0.net/soft-trid-e.html). Most samples are "recognized" and
described with low priority as "Flexible Image Transport System bitmap
(gen)" with MIME type image/fits by bitmap-fts.trid.xml, which lists
four suffixes (.FITS/FIT/FTS/FZ). The samples with the FZ suffix are
also described, with higher priority, as "Flexible Image Transport
System bitmap (compressed)" via bitmap-fz.trid.xml, which lists only one
suffix. Some samples (like M57.FIT) and the DROID samples (like
x-fmt-383-signature.fits and x-fmt-383-signature-id-57.fits) are
described as "Unknown!" (see the appended trid-v-fits.txt.gz).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). It identifies most
examples as "Flexible Image Transport System", described by PUID
x-fmt/383. DROID lists two MIME types (application/fits, image/fits) but
considers only the fits suffix valid. A few samples (like M57.FIT and
MOON.FTS) are not recognized (see the appended droid-fits.csv.gz).

Luckily, with the information given by the other tools I also found
pages about the Flexible Image Transport System on Wikipedia and on the
File Formats Archive Team web site. These pages also list links to
samples, suitable software and references. That information is expressed
by comment lines inside Magdir/images like:

# URL:	http://fileformats.archiveteam.org/wiki/Flexible_Image_Transport_System
#	https://en.wikipedia.org/wiki/FITS
# Ref.:	https://mark0.net/download/triddefs_xml.7z
#	defs/b/bitmap-fts.trid.xml
# URL:	https://heasarc.nasa.gov/fitsio/fpack/
# Ref.:	https://mark0.net/download/triddefs_xml.7z
#	defs/b/bitmap-fz.trid.xml

The description in Magdir/images currently starts with lines like:
  0	string	SIMPLE\ \ =	FITS image data
  !:mime	image/fits
  !:ext	fits/fts

To improve recognition I summarize the file format specification. At the
beginning of a file, the header is organized in 80-byte records called
card images (36 of them fill one 2880-byte FITS block). A card consists
of three parts: keyword, value indicator (assignment), and value with an
optional comment. The keyword is a 1- to 8-character, left-justified
ASCII string; if a keyword has fewer than eight characters, it is padded
with spaces. The value indicator is an equal sign followed by a space
("= ") and always occupies columns 9 and 10 of the card image. The value
is an ASCII representation of the numerical or string data associated
with the keyword. A comment is separated from the value by a slash (/)
or by a space and a slash; the latter is recommended. In the fixed
format a boolean value always occupies column 30, and integer and
floating-point values are located in columns 11 through 30 and are
right-justified with spaces, if necessary. Columns that do not contain
data are filled with spaces. Five keywords are required in every FITS
file: SIMPLE, BITPIX, NAXIS, NAXISn and END (EXTEND is also required if
extensions are present in the file).

At the moment the file command checks only the keyword and assignment
of the first card. TrID and DROID also check the keyword and assignment
of the second card, and DROID also checks the keyword, assignment and
value of the third card.
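
A rough way to picture the fixed card layout summarized above is the
following Python sketch. `split_card` and `make_card` are hypothetical
helpers (not part of file or of any FITS library); they only follow the
fixed-format column rules and ignore quoted string values that contain
a slash.

```python
CARD_SIZE = 80  # every header card image is 80 bytes


def split_card(card: bytes):
    """Split one 80-byte card image by fixed column positions.

    Columns 1-8 hold the keyword, columns 9-10 the value indicator "= ",
    columns 11-80 the value and an optional "/" comment.
    Python slices are 0-based, so column n is index n - 1.
    """
    if len(card) != CARD_SIZE:
        raise ValueError("a card image must be exactly 80 bytes")
    keyword = card[0:8].decode("ascii").rstrip(" ")
    has_value_indicator = card[8:10] == b"= "
    # naive split: assumes no "/" inside a quoted string value
    value, _, comment = card[10:].decode("ascii").partition("/")
    return keyword, has_value_indicator, value.strip(), comment.strip()


def make_card(keyword: str, value: str, comment: str = "") -> bytes:
    """Build a fixed-format card with the value right-justified to column 30."""
    text = f"{keyword:<8}= {value:>20}"
    if comment:
        text += f" / {comment}"
    return text.ljust(CARD_SIZE)[:CARD_SIZE].encode("ascii")


card = make_card("BITPIX", "8", "bits per pixel")
print(split_card(card))
```

Because a right-justified value ends in column 30, a single-character
value such as the boolean T lands at index 29.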

On the one hand, the file command cannot be a validator for FITS
samples. That is done, for example, by the tool fitsverify (see the
appended DDTSUVDATA-fitsverify.txt.gz, x-fmt-383-fitsverify.txt.gz,
M57-fitsverify.txt.gz and MOON-fitsverify.txt.gz). On the other hand, in
my opinion it should not be as strict as the other mentioned tools,
because some of the other mentioned software accepts "strange" samples
(like M57.FIT and MOON.FTS), while some crashes or freezes without giving
a reason. So I decided to let the file command accept most "strange"
samples and report their "strangeness". I only skip the DROID samples
(like x-fmt-383-signature-id-57.fits). These are used by the DROID tool
to recognize FITS samples, so they are not real files but contain just
some characteristic leading bytes: more concretely, the first two cards
and the keyword/assignment part of the third card. The difference from
real files is that most values are filled with dummy bytes (0x00 or
0xAB), whereas real files use space characters (0x20) for padding. So,
to skip DROID samples, the magic lines now start like:
  0	string	SIMPLE\ \ =
  >89	ubeshort	=0x2020	FITS image
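
The intent of these two magic lines can be mimicked in Python as below.
This is only a sketch, assuming the whole header is already in memory;
`looks_like_fits` and `card` are hypothetical names, and the check
mirrors the `ubeshort =0x2020` test rather than libmagic's actual
implementation.

```python
def looks_like_fits(header: bytes) -> bool:
    """Sketch of the two proposed magic tests (not libmagic itself)."""
    # 0 string SIMPLE\ \ =  -> first card starts with "SIMPLE  ="
    if not header.startswith(b"SIMPLE  ="):
        return False
    # >89 ubeshort =0x2020  -> big-endian 16-bit value at bytes 89-90
    # (columns 10-11 of the 2nd card) must be two spaces; the DROID
    # signature files use dummy bytes such as 0x00 or 0xAB there instead
    if len(header) < 91:
        return False
    return int.from_bytes(header[89:91], "big") == 0x2020


def card(keyword: str, value: str) -> bytes:
    # hypothetical helper: fixed-format card, value ending in column 30
    return f"{keyword:<8}= {value:>20}".ljust(80).encode("ascii")


real = card("SIMPLE", "T") + card("BITPIX", "8")
droid = b"SIMPLE  =".ljust(80, b"\xab") + b"BITPIX  =".ljust(80, b"\xab")
print(looks_like_fits(real), looks_like_fits(droid))
```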

The bit depth is shown by the subsequent lines like:
  >109	string	8	\b, 8-bit, character or unsigned binary integer
  >108	string	16	\b, 16-bit, two's complement binary integer
  >107	string	\ 32	\b, 32-bit, two's complement binary integer
  >107	string	-32	\b, 32-bit, floating point, single precision
  >107	string	-64	\b, 64-bit, floating point, double precision
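
The old offsets follow from the column rules: BITPIX is normally on the
second card (starting at offset 80), and a right-justified value ends in
column 30, that is at offset 80 + 29 = 109. A short arithmetic sketch
(`value_offset` is a hypothetical name) reproduces 109, 108 and 107, and
shows that the missing two's complement 64-bit value " 64" would also
start at 107:

```python
CARD_SIZE = 80
VALUE_END_COLUMN = 30  # fixed-format numbers end in column 30 (1-based)


def value_offset(card_index: int, value: str) -> int:
    """0-based byte offset where a right-justified value starts.

    card_index is 0 for the first card, 1 for the second, and so on.
    Column n of a card is byte card_index * 80 + n - 1, so a value of
    length L that ends in column 30 starts at card_index * 80 + 30 - L.
    """
    return card_index * CARD_SIZE + VALUE_END_COLUMN - len(value)


# BITPIX on the 2nd card; a sign or a leading space counts as part of the value
for bitpix in ("8", "16", " 32", "-32", "-64", " 64"):
    print(repr(bitpix), value_offset(1, bitpix))
```

A value with a sign or a leading space is three bytes long, which is why
all 32-bit and 64-bit variants share offset 107.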

What is wrong here? An entry for the second 64-bit variant (two's
complement binary integer, found for example in blank.fits) is missing.
In well-formed samples the bit numbers are stored at defined positions.
However, in a few samples (like MOON.FTS) the digit 8 is found some bytes
further left, and in a few others (like M57.FIT) some bytes further
right. Furthermore, in a few samples (like M57.FIT) card 2 (BITPIX) and
card 3 (NAXIS) are swapped. To describe these samples too, the bit part
now becomes like:

  >>80	search/81/b	BITPIX\040\040=
  >>>&28	string	8	\b, 8-bit, character or unsigned binary integer
  >>>>0	string	x		(too right positioned)
  >>>&11	string	8	\b, 8-bit, character or unsigned binary integer
  >>>>0	string	x		(too left positioned)
  >>>&20	string	8	\b, 8-bit, character or unsigned binary integer
  >>>&19	string	16	\b, 16-bit, two's complement binary integer
  >>>&18	string	\04032	\b, 32-bit, two's complement binary integer
  >>>&18	string	-32	\b, 32-bit, floating point, single precision
  >>>&18	string	-64	\b, 64-bit, floating point, double precision
  >>>&18	string	\04064	\b, 64-bit, two's complement binary integer
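
The search-based bit depth rules can be approximated in Python like
this. It is a sketch, not libmagic: `describe_bitpix` returns only the
first matching rule, whereas file evaluates every continuation line, and
the search window only roughly imitates `search/81`. `card` is a
hypothetical helper whose `width` argument moves the value left or right
of column 30.

```python
BITPIX_KEY = b"BITPIX  ="

# (offset relative to the end of "BITPIX  =", expected bytes, text, note),
# in the same order as the proposed magic lines
RULES = [
    (20, b"8", "8-bit, character or unsigned binary integer", ""),
    (28, b"8", "8-bit, character or unsigned binary integer", "too right positioned"),
    (11, b"8", "8-bit, character or unsigned binary integer", "too left positioned"),
    (19, b"16", "16-bit, two's complement binary integer", ""),
    (18, b" 32", "32-bit, two's complement binary integer", ""),
    (18, b"-32", "32-bit, floating point, single precision", ""),
    (18, b"-64", "64-bit, floating point, double precision", ""),
    (18, b" 64", "64-bit, two's complement binary integer", ""),
]


def describe_bitpix(header: bytes, start: int = 80, span: int = 81):
    """Return (text, note) for the first matching rule, or None."""
    # roughly ">>80 search/81/b BITPIX\040\040=": the keyword may start
    # anywhere in the next `span` bytes, so it is found on card 2 or 3
    pos = header.find(BITPIX_KEY, start, start + span + len(BITPIX_KEY))
    if pos < 0:
        return None
    end = pos + len(BITPIX_KEY)  # "&N" offsets count from the end of the match
    for rel, expected, text, note in RULES:
        if header[end + rel:end + rel + len(expected)] == expected:
            return text, note
    return None


def card(keyword: str, value: str, width: int = 20) -> bytes:
    # hypothetical helper; width=20 puts the value's last byte in column 30
    return f"{keyword:<8}= {value:>{width}}".ljust(80).encode("ascii")


simple = card("SIMPLE", "T")
print(describe_bitpix(simple + card("BITPIX", "8", width=28)))
print(describe_bitpix(simple + card("NAXIS", "2") + card("BITPIX", "-32")))
```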

Additional information for "strange" samples (like M57.FIT) is shown at
the end by new lines like:
  >>80	string	!BITPIX\040\040=	\b, at 80
  >>>80	string	x			"%-0.9s"
  >>160	string	!NAXIS\040\040\040=	\b, at 160
  >>>160	string	x			"%-0.9s"
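
The negated string tests for cards 2 and 3 amount to the following
check. `nonstandard_cards` is a hypothetical Python helper that returns
the extra notes in the order the magic prints them; it assumes the
header is already in memory.

```python
EXPECTED = ((80, b"BITPIX  ="), (160, b"NAXIS   ="))


def nonstandard_cards(header: bytes) -> list:
    """List notes like 'at 80 "NAXIS   ="' for cards 2 and 3 with unexpected keywords."""
    notes = []
    for offset, keyword in EXPECTED:
        # ">>80 string !BITPIX\040\040=" is a negated string test
        if header[offset:offset + len(keyword)] != keyword:
            # "%-0.9s" prints at most the first 9 bytes of the card
            shown = header[offset:offset + 9].decode("ascii", "replace")
            notes.append(f'at {offset} "{shown}"')
    return notes


def card(keyword: str, value: str) -> bytes:
    # hypothetical helper building a fixed-format card
    return f"{keyword:<8}= {value:>20}".ljust(80).encode("ascii")


swapped = card("SIMPLE", "T") + card("NAXIS", "2") + card("BITPIX", "8")
print(", ".join(nonstandard_cards(swapped)))
```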

Samples can be compressed (with types like NOCOMPRESS, GZIP_1, GZIP_2,
HCOMPRESS_1, PLIO_1, RICE_1). To avoid complications (some software
cannot handle compressed samples), the different file name suffix FZ is
used for them. With the help of the TrID definition and the fpack user
guide I found that this information is stored in the card with the
ZCMPTYPE keyword. So the file name extension part is now done by lines
like:
  >>240	search/0x4790/b	ZCMPTYPE=	data, compression type
  !:ext	fz
  #>>>&0	string		x		COMPRESSION=%0.13s
  >>>&0	regex		[A-Z_1-2]{4,11}	%s
  >>240	default		x		data
  !:ext	fits/fit/fts
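
The ZCMPTYPE lookup can be sketched as a bounded search followed by the
same regular expression. The assumption that only the rest of the
keyword's card is scanned is mine (`compression_type` and `card` are
hypothetical names), and a None result stands for the `default` branch
that prints plain "data" with the fits/fit/fts extensions:

```python
import re

ZCMPTYPE_KEY = b"ZCMPTYPE="
# same character class and length limits as the proposed [A-Z_1-2]{4,11}
ALGORITHM = re.compile(rb"[A-Z_1-2]{4,11}")


def compression_type(header: bytes, start: int = 240, span: int = 0x4790):
    """Return the ZCMPTYPE value such as "RICE_1", or None if absent."""
    # roughly ">>240 search/0x4790/b ZCMPTYPE="
    pos = header.find(ZCMPTYPE_KEY, start, start + span + len(ZCMPTYPE_KEY))
    if pos < 0:
        return None  # corresponds to the ">>240 default x data" branch
    # assumption: only look at the rest of the 80-byte card holding the keyword
    card_end = pos - pos % 80 + 80
    match = ALGORITHM.search(header, pos + len(ZCMPTYPE_KEY), card_end)
    return match.group().decode("ascii") if match else None


def card(keyword: str, value: str) -> bytes:
    # hypothetical helper building a fixed-format card
    return f"{keyword:<8}= {value:>20}".ljust(80).encode("ascii")


primary = card("SIMPLE", "T") + card("BITPIX", "16") + card("NAXIS", "0")
print(compression_type(primary))
print(compression_type(primary + card("ZCMPTYPE", "'RICE_1  '")))
```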

The number of dimensions (axes) is stored in the card with keyword
NAXIS, normally the third card. A single digit 2 implies a conventional
bitmap in most cases. The digit 3 implies a data cube of three
dimensions (animated bitmap or similar). Such samples can often be
displayed or converted by graphics tools (like XnView, ImageMagick,
GIMP).

I verified the information with the XnView command line tool using a
command line like:
	nconvert -in fits -fullinfo M57.FIT MOON.FTS
Some samples (like example.fit, M57.FIT, MOON.FTS, ngc1316r.fit) are
recognized here (see the appended nconvert-fits.txt.gz).
I also tried ImageMagick version 7.1.1. Here some other samples (like
arange.fits, blank.fits, example.fit, header_newlines.fits, MOON.FTS,
ngc1316r.fit) are recognized (see the appended identify-fits.txt.gz)
with a command line like:
	identify MOON.FTS
I also tried the NetPBM tools, with a command line like:
	fitstopnm M57.FIT | file -

So such samples get the MIME type image/fits. Samples with other numbers
of dimensions (like 0, 5, 6) normally cannot be displayed by graphics
tools, so they get the MIME type application/fits. This is now done by
lines like:

  >>80	search/81/b	NAXIS\040\040\040=		\b,
  #>>>>&0 string		x		NAXIS=%-0.31s
  >>>&0	search/31/b	\0400\040	0 axes
  !:mime	application/fits
  >>>&-1	search/31/b	\0401\040	1 axis
  !:mime	application/fits
  #!:mime	image/fits
  >>>&0	search/31/b	\0402\040	2 axes
  !:mime	image/fits
  >>>&0	search/31/b	\0403\040	3 axes
  !:mime	image/fits
  >>>&0	default		x
  >>>>&0	regex/31/s	=[0-9]{1,3} 	%s axis
  !:mime	application/fits
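
In Python the NAXIS rules collapse to one bounded search and one regular
expression. This is a simplification, not the magic itself: the separate
" 0 ", " 1 ", " 2 " and " 3 " searches are replaced by a single regex,
and `naxis_and_mime` and `card` are hypothetical helpers.

```python
import re

NAXIS_KEY = b"NAXIS   ="
# one regex instead of the separate " 0 ", " 1 ", " 2 ", " 3 " searches
NUMBER = re.compile(rb"[0-9]{1,3} ")


def naxis_and_mime(header: bytes, start: int = 80, span: int = 81):
    """Return (number of axes, MIME type) or None."""
    # roughly ">>80 search/81/b NAXIS\040\040\040=": card 3, or card 2 if swapped
    pos = header.find(NAXIS_KEY, start, start + span + len(NAXIS_KEY))
    if pos < 0:
        return None
    # the value and the start of an optional comment: 31 bytes after "="
    field = header[pos + len(NAXIS_KEY):pos + len(NAXIS_KEY) + 31]
    match = NUMBER.search(field)
    if match is None:
        return None
    axes = int(match.group().decode("ascii"))
    # 2 (bitmap) and 3 (data cube) axes are usually displayable images
    mime = "image/fits" if axes in (2, 3) else "application/fits"
    return axes, mime


def card(keyword: str, value: str) -> bytes:
    # hypothetical helper building a fixed-format card
    return f"{keyword:<8}= {value:>20}".ljust(80).encode("ascii")


simple = card("SIMPLE", "T")
for naxis in ("0", "2", "3", "6"):
    print(naxis_and_mime(simple + card("BITPIX", "8") + card("NAXIS", naxis)))
```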

Then, of course, you want to get the dimensions of the data, as shown
for other graphics (like M57.PGM). For real images you often get
familiar dimensions (like 1200 x 800 for example.fit), whereas for
application samples you often get "strange" dimensions (like 0 x 3 for
DDTSUVDATA.fits, 0 x 5 for group.fits, 8 x 300 for ngc1316r-gzip.fz).
This information is stored in the cards with the keywords NAXIS1 and
NAXIS2, so it is now shown by lines like:

  >>240	search/29400/bs	NAXIS1\040\040=		\b,
  >>>&9	regex	=[0-9]{1,31} 	%s
  >>>320	search/29120/bs	NAXIS2\040\040=		x
  >>>>&9	regex	=[0-9]{1,31} 	%s
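
The two nested NAXIS1/NAXIS2 searches can be sketched like this, using
the same search limits (29400 and 29120 bytes) as the proposed magic.
The helper names are hypothetical, the value is assumed to sit in the
same 80-byte card as its keyword, and the `&9` offset is read under the
assumption that the `s` flag makes it count from the start of the match:

```python
import re

# before an optional comment, up to 31 digits followed by a space
DIGITS = re.compile(rb"[0-9]{1,31} ")


def axis_length(header: bytes, keyword: bytes, start: int, span: int):
    """Value of an NAXISn card located by a bounded search, or None."""
    pos = header.find(keyword, start, start + span + len(keyword))
    if pos < 0:
        return None
    # assumption: with the "s" search flag the match offset is the keyword
    # start, so "&9" points at column 10, just after the "="
    field = header[pos + 9:pos + 80]
    match = DIGITS.search(field)
    return int(match.group().decode("ascii")) if match else None


def dimensions(header: bytes):
    """Return "W x H", just "W", or None, like the proposed NAXIS1/NAXIS2 lines."""
    width = axis_length(header, b"NAXIS1  =", 240, 29400)
    if width is None:
        return None
    # NAXIS2 is only looked for once NAXIS1 was found (nested magic level)
    height = axis_length(header, b"NAXIS2  =", 320, 29120)
    return f"{width} x {height}" if height is not None else str(width)


def card(keyword: str, value: str) -> bytes:
    # hypothetical helper building a fixed-format card
    return f"{keyword:<8}= {value:>20}".ljust(80).encode("ascii")


base = card("SIMPLE", "T") + card("BITPIX", "8") + card("NAXIS", "2")
print(dimensions(base + card("NAXIS1", "192") + card("NAXIS2", "165")))
```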

After applying the above modifications with the patch
file-5.45-images-fits.diff, all my inspected astronomical images are
still described, but now I get bit depth information for all inspected
samples. Dimension and compression information is also shown now. The
result looks like:

DDTSUVDATA.fits:                FITS image data, 32-bit
				, floating point, single precision
				, 6 axis, 0 x 3
M57.FIT:                        FITS image data, 8-bit
				, character or unsigned binary integer
				(too right positioned)
				, 2 axes, 192 x 165
				, at 80 "NAXIS   =", at 160 "BITPIX  ="
M57.PGM:                        Netpbm image data
				, size = 192 x 165, rawbits, greymap
MOON.FTS:                       FITS image data, 8-bit
				, character or unsigned binary integer
				(too left positioned)
				, 2 axes, 192 x 165
arange.fits:                    FITS image data, 32-bit
				, two's complement binary integer
				, 3 axes, 11 x 10
blank.fits:                     FITS image data, 64-bit
				, two's complement binary integer
				, 2 axes, 1 x 1
example.fit:                    FITS image data, 8-bit
				, character or unsigned binary integer
				, 3 axes, 1200 x 800
group.fits:                     FITS image data, 32-bit
				, floating point, single precision
				, 5 axis, 0 x 5
header_newlines.fits:           FITS image data, 64-bit
				, floating point, double precision
				, 2 axes, 1 x 1
ngc1316r-d.fz:                  FITS image data
				, compression type NOCOMPRESS, 16-bit
				, two's complement binary integer
				, 0 axes, 16 x 300
ngc1316r-gzip.fz:               FITS image data
				, compression type GZIP_1, 16-bit
				, two's complement binary integer
				, 0 axes, 8 x 300
ngc1316r-gzip2.fz:              FITS image data
				, compression type GZIP_2, 16-bit
				, two's complement binary integer
				, 0 axes, 8 x 300
ngc1316r-hcomp.fz:              FITS image data
				, compression type HCOMPRESS_1 , 16-bit
				, two's complement binary integer
				, 0 axes, 8 x 19
ngc1316r-rice.fz:               FITS image data
				, compression type RICE_1, 16-bit
				, two's complement binary integer
				, 0 axes, 8 x 300
ngc1316r.fit:                   FITS image data, 16-bit
				, two's complement binary integer
				, 2 axes, 440 x 300
o4sp040b0_raw-p.fz:             FITS image data
				, compression type PLIO_1, 16-bit
				, two's complement binary integer
				, 0 axes, 8 x 44
x-fmt-383-signature-id-57.fits: data
x-fmt-383-signature.fits:       ISO-8859 text, with no line terminators

I hope my diff file can be applied in a future version of the file
utility.

With best wishes,
Jörg Jenderek
-------------- next part --------------
-- 
File mailing list
File at astron.com
https://mailman.astron.com/mailman/listinfo/file

-------------- next part --------------
A non-text attachment was scrubbed...
Name: nconvert-fits.txt.gz
Type: application/x-gzip
Size: 861 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20231225/2340323c/attachment-0008.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: identify-fits.txt.gz
Type: application/x-gzip
Size: 673 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20231225/2340323c/attachment-0009.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-fits.txt.gz
Type: application/x-gzip
Size: 806 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20231225/2340323c/attachment-0010.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-fits.csv.gz
Type: application/x-gzip
Size: 801 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20231225/2340323c/attachment-0011.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DDTSUVDATA-fitsverify.txt.gz
Type: application/x-gzip
Size: 705 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20231225/2340323c/attachment-0012.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: x-fmt-383-fitsverify.txt.gz
Type: application/x-gzip
Size: 255 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20231225/2340323c/attachment-0013.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: M57-fitsverify.txt.gz
Type: application/x-gzip
Size: 252 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20231225/2340323c/attachment-0014.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MOON-fitsverify.txt.gz
Type: application/x-gzip
Size: 462 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20231225/2340323c/attachment-0015.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.45-images-fits.diff.sig
Type: application/octet-stream
Size: 2627 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20231225/2340323c/attachment-0001.obj>
-------------- next part --------------
--- file-5.45/magic/Magdir/images.old	2023-07-27 20:04:45.000000000 +0200
+++ file-5.45/magic/Magdir/images	2023-12-25 04:24:39.827902200 +0100
@@ -1329,17 +1329,95 @@
 0	string		PCD_OPA		Kodak Photo CD overview pack file
 
 # FITS format.  Jeff Uphoff <juphoff at tarsier.cv.nrao.edu>
+# Update:	Joerg Jenderek
+# URL:		http://fileformats.archiveteam.org/wiki/Flexible_Image_Transport_System
+#		https://en.wikipedia.org/wiki/FITS
+# Reference:	https://mark0.net/download/triddefs_xml.7z/defs/b/bitmap-fts.trid.xml
+# Note:		called "Flexible Image Transport System bitmap" by TrID, GIMP and DROID via PUID x-fmt/383
+#		"FITS document" with expanded acronym "Flexible Image Transport System" by shared MIME-info database from freedesktop.org
+#		verified as "Flexible Image Transport System" by XnView `nconvert -fullinfo M57.FIT MOON.FTS` ,
+#		as "FTS (Flexible Image Transport System)" by ImageMagick command `identify MOON.FTS` ,
+#		by NetPBM `fitstopnm M57.FIT | file -` ,
+#		falsified by `fitsverify M57.FIT MOON.FTS`
 # FITS is the Flexible Image Transport System, the de facto standard for
 # data and image transfer, storage, etc., for the astronomical community.
 # (FITS floating point formats are big-endian.)
-0	string	SIMPLE\ \ =	FITS image data
+# keyword is a 1- to 8-character, left-justified ASCII string; columns that do not contain data are filled with spaces
+# The assignment indicator (=) always occupies columns nine and ten in the card
+0	string	SIMPLE\ \ =
+# skip DROID x-fmt-383-signature-id-57.fits by check for left padding spaces of 2nd card value
+>89	ubeshort	=0x2020	FITS image
+# URL:		https://heasarc.gsfc.nasa.gov/fitsio/fpack/
+# Reference:	https://mark0.net/download/triddefs_xml.7z/defs/b/bitmap-fz.trid.xml
+#		https://heasarc.gsfc.nasa.gov/FTP/software/fitsio/c/docs/fpackguide.pdf
+# Note:		called "Flexible Image Transport System bitmap (compressed)" by TrID
+>>240	search/0x4790/b	ZCMPTYPE=	data, compression type
+# fz suffix for compressed fits
+!:ext	fz
+# Flexible Image Transport System compression value (followed by optional FITS comment) like: NOCOMPRESS GZIP_1 GZIP_2 HCOMPRESS_1 PLIO_1 RICE_1
+#>>>&0	string		x		COMPRESSION=%0.13s
+>>>&0	regex		[A-Z_1-2]{4,11}	%s
+# not compressed Flexible Image Transport System with other filename suffix
+>>240	default		x		data
+!:ext	fits/fit/fts
+# five keywords that are required in every FITS file: SIMPLE, BITPIX, NAXIS, NAXISn, and END. EXTEND is also a required keyword if extensions are present in the file
+# required keyword BITPIX, in the standard on the 2nd card, contains the integer number of bits used to represent each data value, but is on the 3rd card in M57.FIT
+>>80	search/81/b	BITPIX\040\040=
+#>>>&11	string	x		BIT=%-0.18s
+# this is the number of bits per pixel for image data
+>>>&20	string	8		\b, 8-bit, character or unsigned binary integer
+# few samples with more right positioned values like: M57.FIT
+# GRR: avoid warning: Magdir\images, 1380: Warning: description `, 8-bit, character or unsigned binary integer (too right positioned)' truncated
+>>>&28	string	8		\b, 8-bit, character or unsigned binary integer
+>>>>0	string	x		(too right positioned)
+# few samples not right justified positioned like: MOON.FTS
+>>>&11	string	8		\b, 8-bit, character or unsigned binary integer
+>>>>0	string	x		(too left positioned)
+# according to DROID but no examples found
+#>>>&19	string	08		\b, 8-bit, character or unsigned binary integer
+#>>>&19	string	+8		\b, 8-bit, character or unsigned binary integer
+>>>&19	string	16		\b, 16-bit, two's complement binary integer
+>>>&18	string	\04032		\b, 32-bit, two's complement binary integer
+>>>&18	string	-32		\b, 32-bit, floating point, single precision
+>>>&18	string	-64		\b, 64-bit, floating point, double precision
+# second 64-bit variant like: blank.fits
+>>>&18	string	\04064		\b, 64-bit, two's complement binary integer
+# in standard number of dimensions by keyword NAXIS on 3rd card image but in few cases on 2nd card like: M57.FIT
+>>80	search/81/b	NAXIS\040\040\040=		\b,
+# before optional comment 31 ASCII characters left padded with spaces for integer (0-999) of data axis like: 0 (extension no data) 1 (spectrum) 2 (conventional bitmap) 3 (animated bitmap example.fit test.fits) 6 (DDTSUVDATA.fits)
+#>>>>&0	string		x		NAXIS=%-0.31s
+# single digit 0 implies no data or similar
+>>>&0	search/31/b	\0400\040	0 axes
+!:mime	application/fits
+# single digit 1 implies one-dimensional entity such as a spectrum or a time series (no example found)
+>>>&-1	search/31/b	\0401\040	1 axis
+!:mime	application/fits
+#!:mime	image/fits
+# single digit 2 implies conventional bitmap
+>>>&0	search/31/b	\0402\040	2 axes
 !:mime	image/fits
-!:ext	fits/fts
->109	string	8		\b, 8-bit, character or unsigned binary integer
->108	string	16		\b, 16-bit, two's complement binary integer
->107	string	\ 32		\b, 32-bit, two's complement binary integer
->107	string	-32		\b, 32-bit, floating point, single precision
->107	string	-64		\b, 64-bit, floating point, double precision
+# single digit 3 implies data cubes of three dimensions (animated bitmap or similar)
+>>>&0	search/31/b	\0403\040	3 axes
+!:mime	image/fits
+# data cubes more dimensions like: 5 (group.fits) 6 (DDTSUVDATA.fits)
+>>>&0	default		x
+>>>>&0	regex/31/s	=[0-9]{1,3} 	%s axis
+!:mime	application/fits
+# often NAXIS1 as 4th card but sometimes at higher offset like: 29400 (IUElwp25637mxlo.fits) 20400 (NICMOSn4hk12010_mos.fits)
+>>240	search/29400/bs	NAXIS1\040\040=		\b,
+# before optional comment 31 ASCII characters left padded with spaces for first axis like: 192 512 1024 1200 2000 2064 3600 ...
+>>>&9	regex	=[0-9]{1,31} 	%s
+# often NAXIS2 as 5th card but sometimes not existent or at higher offset like: 29120 (IUElwp25637mxlo.fits) 20480 (NICMOSn4hk12010_mos.fits)
+>>>320	search/29120/bs	NAXIS2\040\040=		x
+# before optional comment 31 ASCII characters left padded with spaces for second axis like: 2 4 165 512 800 1024 3600 ...
+>>>>&9	regex	=[0-9]{1,31} 	%s
+# not standard cards
+>>80	string	!BITPIX\040\040= \b, at 80
+# in M57.FIT like: "NAXIS   ="
+>>>80	string	x		"%-0.9s"
+>>160	string	!NAXIS\040\040\040= \b, at 160
+# in M57.FIT like: "BITPIX  ="
+>>>160	string	x		"%-0.9s"
 
 # other images
 0	string	This\ is\ a\ BitMap\ file	Lisp Machine bit-array-file

