[File] [PATCH] Magdir/images FITS image; more extensions+mime type
Christos Zoulas
christos at zoulas.com
Fri Jan 5 16:18:50 UTC 2024
Committed, thanks!
christos
> On Dec 24, 2023, at 10:29 PM, Jörg Jenderek (GMX) <joerg.jen.der.ek at gmx.net> wrote:
>
> Hello,
>
> a few days ago I had to handle some files with the FTS suffix. Some of
> the samples are astronomical images.
>
> When I run file command version 5.45 on such images and related
> files, I get output like:
>
> DDTSUVDATA.fits: FITS image data , 32-bit
> , floating point, single precision
> M57.FIT: FITS image data
> M57.PGM: Netpbm image data
> , size = 192 x 165, rawbits, greymap
> MOON.FTS: FITS image data
> arange.fits: FITS image data, 32-bit
> , two's complement binary integer
> blank.fits: FITS image data
> example.fit: FITS image data, 8-bit
> , character or unsigned binary integer
> group.fits: FITS image data, 32-bit
> , floating point, single precision
> header_newlines.fits: FITS image data, 64-bit
> , floating point, double precision
> ngc1316r-d.fz: FITS image data, 16-bit
> , two's complement binary integer
> ngc1316r-gzip.fz: FITS image data, 16-bit
> , two's complement binary integer
> ngc1316r-gzip2.fz: FITS image data, 16-bit
> , two's complement binary integer
> ngc1316r-hcomp.fz: FITS image data, 16-bit
> , two's complement binary integer
> ngc1316r-rice.fz: FITS image data, 16-bit
> , two's complement binary integer
> ngc1316r.fit: FITS image data, 16-bit
> , two's complement binary integer
> o4sp040b0_raw-p.fz: FITS image data, 16-bit
> , two's complement binary integer
> x-fmt-383-signature-id-57.fits: FITS image data, 8-bit
> , character or unsigned binary integer
> x-fmt-383-signature.fits: FITS image data, 8-bit
> , character or unsigned binary integer
>
> With the --extension option only 2 suffixes (fits/fts) are shown, and
> with the -i option image/fits is shown, which is not wrong.
>
> For comparison I ran the file format identification utility TrID
> (see https://mark0.net/soft-trid-e.html). Most samples are
> "recognized" and described with low priority as "Flexible Image
> Transport System bitmap (gen)" with MIME type image/fits by
> bitmap-fts.trid.xml. There, 4 suffixes (.FITS/FIT/FTS/FZ) are listed.
> The samples with the FZ suffix are also described with higher priority
> as "Flexible Image Transport System bitmap (compressed)" via
> bitmap-fz.trid.xml, which lists only 1 suffix. Some samples (like
> M57.FIT) and the DROID samples (like x-fmt-383-signature.fits
> x-fmt-383-signature-id-57.fits) are described as "Unknown!" (see
> appended trid-v-fits.txt.gz).
>
> For comparison I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). It identifies
> most examples as "Flexible Image Transport System" described by PUID
> x-fmt/383. Here 2 MIME types (application/fits image/fits) are listed,
> and only the fits suffix is considered valid. A few samples (like
> M57.FIT MOON.FTS) are not recognized (see appended
> droid-fits.csv.gz).
>
> Luckily, with the information given by the other tools I also found
> pages about the Flexible Image Transport System on Wikipedia and on the
> File Formats Archive Team web site. Links to samples, suitable software
> and references are listed there too. That information is expressed by
> comment lines inside Magdir/images like:
>
> # URL: http://fileformats.archiveteam.org/
> # wiki/Flexible_Image_Transport_System
> # https://en.wikipedia.org/wiki/FITS
> # Ref.: https://mark0.net/download/triddefs_xml.7z
> # defs/b/bitmap-fts.trid.xml
> # URL: https://heasarc.nasa.gov/fitsio/fpack/
> # Ref.: https://mark0.net/download/triddefs_xml.7z
> # defs/b/bitmap-fz.trid.xml
>
> The description in Magdir/images currently starts with lines like:
> 0 string SIMPLE\ \ = FITS image data
> !:mime image/fits
> !:ext fits/fts
>
> To improve recognition, I summarize the file format specification.
> At the beginning of the file the data is organized in 80-byte blocks
> called card images. A card consists of 3 parts: keyword, assignment
> indicator, and value with an optional comment. The keyword is a 1- to
> 8-character, left-justified ASCII string; if it contains fewer than
> eight characters, it is padded with spaces. The assignment indicator
> is an equal sign (=) followed by a space and always occupies columns
> nine and ten of the card image. The value is an ASCII representation
> of the numerical or string data associated with the keyword. A comment
> is separated from the value by a slash (/) or by a space and a slash
> ( /); the latter is recommended. A boolean value always occupies
> column 30. Integer and floating-point values are located in columns 11
> through 30 and are right-justified, padded with spaces if necessary.
> Columns that do not contain data are filled with spaces. Five keywords
> are required in every FITS file: SIMPLE, BITPIX, NAXIS, NAXISn, and
> END. (EXTEND is also required if extensions are present in the file.)
>
> At the moment the file command only checks the keyword and assignment
> of the first card. TrID and DROID also check the keyword and
> assignment of the second card. DROID additionally checks the keyword,
> assignment and value of the third card.
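
The card layout summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the patch: the function name parse_card is hypothetical, and doubled quotes ('') inside string values are deliberately not handled.

```python
def parse_card(card: bytes):
    """Split one 80-byte FITS header card into (keyword, value, comment)."""
    if len(card) != 80:
        raise ValueError("a FITS card is exactly 80 bytes long")
    text = card.decode("ascii", errors="replace")
    keyword = text[:8].rstrip()          # columns 1-8, padded with spaces
    if text[8:10] != "= ":               # columns 9-10: assignment indicator
        return keyword, None, text[8:].strip() or None
    body = text[10:].lstrip()
    if body.startswith("'"):
        # Quoted string value; simplification: '' escapes are not handled.
        end = body.find("'", 1)
        if end == -1:
            return keyword, body[1:].rstrip(), None
        value, tail = body[1:end].rstrip(), body[end + 1:]
    else:
        value, _, tail = body.partition("/")
        value, tail = value.strip(), "/" + tail
    # Whatever follows the first slash is the optional comment.
    comment = tail.partition("/")[2].strip() or None
    return keyword, value, comment
```

Cards without an assignment indicator (such as END) come back with a value of None, which keeps the caller's logic simple.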
>
> On the one hand, the file command cannot be a validator for FITS
> samples. That job is done, for example, by the tool fitsverify (see
> appended DDTSUVDATA-fitsverify.txt.gz x-fmt-383-fitsverify.txt.gz
> M57-fitsverify.txt.gz MOON-fitsverify.txt.gz). On the other hand, in my
> opinion it should not be as strict as the other tools mentioned,
> because some of the software mentioned accepts "strange" samples (like
> M57.FIT MOON.FTS) while other software crashes or freezes without
> giving a reason. So I decided that the file command should accept most
> "strange" samples and report what is "strange" about them. I only skip
> the DROID samples (like x-fmt-383-signature-id-57.fits). These are used
> by the DROID tool to recognize FITS samples, so they are not real files
> but contain just some leading characteristic bytes, or more concretely
> the first 2 cards and the keyword/assignment part of the third card.
> The difference from real examples is that most values are filled with
> dummy bytes (0x00 or 0xAB), whereas real files use space characters
> (0x20) for padding. So to skip the DROID samples the magic lines now
> start like:
> 0 string SIMPLE\ \ =
> >89 ubeshort =0x2020 FITS image
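
As a rough illustration of what these two magic lines test, here is a minimal Python sketch. The function name passes_prefilter is hypothetical; the offsets are taken from the magic lines above (offset 89 is the second byte after the keyword field of the second card).

```python
def passes_prefilter(data: bytes) -> bool:
    """Mimic the two leading magic tests for FITS data."""
    # Line 1: the first card must start with keyword SIMPLE and "=".
    if not data.startswith(b"SIMPLE  ="):
        return False
    # Line 2: a big-endian 16-bit value of 0x2020 at offset 89 means both
    # bytes are spaces, i.e. the second card is padded with 0x20 and not
    # with dummy bytes such as 0x00 or 0xAB.
    return data[89:91] == b"\x20\x20"
```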
>
> The bit depth is then shown by lines like:
> >109 string 8 \b, 8-bit, character or unsigned binary integer
> >108 string 16 \b, 16-bit, two's complement binary integer
> >107 string \ 32 \b, 32-bit, two's complement binary integer
> >107 string -32 \b, 32-bit, floating point, single precision
> >107 string -64 \b, 64-bit, floating point, double precision
>
> What is wrong here? An entry for the second 64-bit variant (two's
> complement binary integer, found for example in blank.fits) is missing.
> In well-formed samples the bit depth is stored at a defined position.
> But in a few samples (like MOON.FTS) the digit 8 is found some bytes
> further left, and in a few samples (like M57.FIT) it is found some
> bytes further right. Furthermore, in a few samples (like M57.FIT) card
> 2 (BITPIX) and card 3 (NAXIS) are swapped. To describe these samples as
> well, the bit depth part now becomes:
>
> >>80 search/81/b BITPIX\040\040=
> >>>&28 string 8 \b, 8-bit, character or unsigned binary integer
> >>>>0 string x (too right positioned)
> >>>&11 string 8 \b, 8-bit, character or unsigned binary integer
> >>>>0 string x (too left positioned)
> >>>&20 string 8 \b, 8-bit, character or unsigned binary integer
> >>>&19 string 16 \b, 16-bit, two's complement binary integer
> >>>&18 string \04032 \b, 32-bit, two's complement binary integer
> >>>&18 string -32 \b, 32-bit, floating point, single precision
> >>>&18 string -64 \b, 64-bit, floating point, double precision
> >>>&18 string \04064 \b, 64-bit, two's complement binary integer
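
The same idea can be sketched in Python: look for the BITPIX card in the second or third card position and parse the number wherever it sits in the value field. This is only an illustration; the names BITPIX_MEANING and describe_bitpix are hypothetical, and the descriptions are copied from the magic lines above.

```python
BITPIX_MEANING = {
    8: "8-bit, character or unsigned binary integer",
    16: "16-bit, two's complement binary integer",
    32: "32-bit, two's complement binary integer",
    64: "64-bit, two's complement binary integer",
    -32: "32-bit, floating point, single precision",
    -64: "64-bit, floating point, double precision",
}


def describe_bitpix(header: bytes):
    """Return the bit depth description, or None if it cannot be found."""
    # BITPIX is normally card 2 (offset 80) but may be swapped with card 3.
    for offset in (80, 160):
        card = header[offset:offset + 80]
        if card.startswith(b"BITPIX  ="):
            # Drop any comment and ignore how far left or right the value is.
            field = card[9:].split(b"/", 1)[0].strip()
            try:
                return BITPIX_MEANING.get(int(field.decode("ascii")))
            except ValueError:
                return None
    return None
```

Parsing the integer instead of matching at fixed columns is a design choice of this sketch; the magic lines instead probe several fixed offsets so that they can also report whether the value is misplaced.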
>
> Additional information for "strange" samples (like M57.FIT) is shown at
> the end by new lines like:
> >>80 string !BITPIX\040\040= \b, at 80
> >>>80 string x "%-0.9s"
> >>160 string !NAXIS\040\040\040= \b, at 160
> >>>160 string x "%-0.9s"
>
> Samples can be compressed (with types like NOCOMPRESS GZIP_1 GZIP_2
> HCOMPRESS_1 PLIO_1 RICE_1). To avoid complications (some software
> cannot handle compressed samples) a different file name suffix, FZ, is
> used for them. With the help of the TrID definition and the fpack user
> guide I saw that this information is stored in the card with the
> ZCMPTYPE keyword. So the extension part is now done by lines like:
> >>240 search/0x4790/b ZCMPTYPE= data, compression type
> #>>>&0 string x COMPRESSION=%0.13s
> >>>&0 regex [A-Z_1-2]{4,11} %s
> >>240 default x data
> !:ext fits/fit/fts
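
A minimal Python sketch of the ZCMPTYPE lookup follows. The function name compression_type is hypothetical; the start offset 240 and the range 0x4790 are taken from the search line above, and the character class is widened slightly to [A-Z_0-9] as an assumption of this sketch.

```python
import re


def compression_type(data: bytes):
    """Return the ZCMPTYPE value (e.g. 'RICE_1'), or None if absent."""
    start = 240                       # first byte after the three leading cards
    keyword = b"ZCMPTYPE="
    idx = data.find(keyword, start, start + 0x4790 + len(keyword))
    if idx == -1:
        return None                   # no compression card within range
    # The value follows the keyword within the same 80-byte card.
    match = re.search(rb"[A-Z_0-9]{4,11}", data[idx + len(keyword):idx + 80])
    return match.group().decode("ascii") if match else None
```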
>
> The number of dimensions is stored in the card with keyword NAXIS,
> normally the third card. A single digit 2 implies a conventional bitmap
> in most cases. The digit 3 implies a data cube of three dimensions
> (animated bitmap or similar). Such samples can often be displayed or
> converted by graphics tools (like XnView ImageMagick GIMP).
>
> I verified the information with the XnView command line tool, using a
> line like:
> nconvert -in fits -fullinfo M57.FIT MOON.FTS
> Here some samples (like example.fit M57.FIT MOON.FTS ngc1316r.fit) are
> recognized (see appended nconvert-fits.txt.gz).
> I also tried ImageMagick version 7.1.1. Here some other samples
> (like arange.fits blank.fits example.fit header_newlines.fits MOON.FTS
> ngc1316r.fit) are recognized (see appended identify-fits.txt.gz) by a
> command line like:
> identify MOON.FTS
> I also tried the NetPBM tools. This can be done by a command line like:
> fitstopnm M57.FIT | file -
>
> So such samples get MIME type image/fits. Samples with other numbers of
> dimensions (like 0 5 6) normally cannot be displayed by graphics tools,
> so they get MIME type application/fits. This is now done by lines
> like:
>
> >>80 search/81/b NAXIS\040\040\040= \b,
> #>>>>&0 string x NAXIS=%-0.31s
> >>>&0 search/31/b \0400\040 0 axes
> !:mime application/fits
> >>>&-1 search/31/b \0401\040 1 axis
> !:mime application/fits
> #!:mime image/fits
> >>>&0 search/31/b \0402\040 2 axes
> !:mime image/fits
> >>>&0 search/31/b \0403\040 3 axes
> !:mime image/fits
> >>>&0 default x
> >>>>&0 regex/31/s =[0-9]{1,3} %s axis
> !:mime application/fits
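
The MIME type decision expressed by these lines boils down to a small mapping, sketched here in Python. The function name mime_for_naxis is hypothetical; the rule (2 or 3 axes give image/fits, everything else application/fits) is the one described in the mail.

```python
def mime_for_naxis(naxis: int) -> str:
    """Pick a MIME type from the number of axes stored in the NAXIS card."""
    # 2 axes: conventional bitmap; 3 axes: data cube. Both can usually be
    # displayed by graphics tools, so they are treated as images.
    if naxis in (2, 3):
        return "image/fits"
    # 0, 1, or more than 3 axes: not displayable as a plain picture.
    return "application/fits"
```

For example, a sample with 2 axes such as M57.FIT maps to image/fits, while a sample with 6 axes such as DDTSUVDATA.fits maps to application/fits.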
>
> Then of course you want to get the dimensions of data as shown for other
> graphics (like M57.PGM). For real images you often get known dimensions
> (like 1200x800 example.fit) whereas for application samples you often
> get "strange" dimensions (like 0x3 DDTSUVDATA.fits 0x5 group.fits 8x300
> ngc1316r-gzip.fz). This information is stored in cards with keywords
> NAXIS1 and NAXIS2. So this is now shown by lines like:
>
> >>240 search/29400/bs NAXIS1\040\040= \b,
> >>>&9 regex =[0-9]{1,31} %s
> >>>320 search/29120/bs NAXIS2\040\040= x
> >>>>&9 regex =[0-9]{1,31} %s
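
A hedged Python sketch of the dimension lookup: walk the cards in 80-byte steps and take the first NAXIS1 and NAXIS2 values. The names first_int and dimensions are hypothetical, and unlike the search lines above this sketch assumes that cards are aligned on 80-byte boundaries.

```python
def first_int(data: bytes, keyword: bytes, start: int = 240, limit: int = 29400):
    """Return the integer value of the first card named keyword, or None."""
    name = keyword.ljust(8) + b"="    # keyword padded to 8 columns, then "="
    for offset in range(start, min(len(data), start + limit), 80):
        card = data[offset:offset + 80]
        if card.startswith(name):
            # Strip an optional comment, then parse the value field.
            field = card[9:].split(b"/", 1)[0].strip()
            try:
                return int(field.decode("ascii"))
            except ValueError:
                return None
    return None


def dimensions(data: bytes):
    """Return (NAXIS1, NAXIS2); either element may be None."""
    return first_int(data, b"NAXIS1"), first_int(data, b"NAXIS2", start=320)
```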
>
> After applying the above modifications with the patch
> file-5.45-images-fits.diff, all my inspected astronomical images are
> still described, but now I get bit depth information for all my
> inspected samples. Dimension and compression information is also
> shown. This then looks like:
>
> DDTSUVDATA.fits: FITS image data, 32-bit
> , floating point, single precision
> , 6 axis, 0 x 3
> M57.FIT: FITS image data, 8-bit
> , character or unsigned binary integer
> (too right positioned)
> , 2 axes, 192 x 165
> , at 80 "NAXIS =", at 160 "BITPIX ="
> M57.PGM: Netpbm image data
> , size = 192 x 165, rawbits, greymap
> MOON.FTS: FITS image data, 8-bit
> , character or unsigned binary integer
> (too left positioned)
> , 2 axes, 192 x 165
> arange.fits: FITS image data, 32-bit
> , two's complement binary integer
> , 3 axes, 11 x 10
> blank.fits: FITS image data, 64-bit
> , two's complement binary integer
> , 2 axes, 1 x 1
> example.fit: FITS image data, 8-bit
> , character or unsigned binary integer
> , 3 axes, 1200 x 800
> group.fits: FITS image data, 32-bit
> , floating point, single precision
> , 5 axis, 0 x 5
> header_newlines.fits: FITS image data, 64-bit
> , floating point, double precision
> , 2 axes, 1 x 1
> ngc1316r-d.fz: FITS image data
> , compression type NOCOMPRESS, 16-bit
> , two's complement binary integer
> , 0 axes, 16 x 300
> ngc1316r-gzip.fz: FITS image data
> , compression type GZIP_1, 16-bit
> , two's complement binary integer
> , 0 axes, 8 x 300
> ngc1316r-gzip2.fz: FITS image data
> , compression type GZIP_2, 16-bit
> , two's complement binary integer
> , 0 axes, 8 x 300
> ngc1316r-hcomp.fz: FITS image data
> , compression type HCOMPRESS_1 , 16-bit
> , two's complement binary integer
> , 0 axes, 8 x 19
> ngc1316r-rice.fz: FITS image data
> , compression type RICE_1, 16-bit
> , two's complement binary integer
> , 0 axes, 8 x 300
> ngc1316r.fit: FITS image data, 16-bit
> , two's complement binary integer
> , 2 axes, 440 x 300
> o4sp040b0_raw-p.fz: FITS image data
> , compression type PLIO_1, 16-bit
> , two's complement binary integer
> , 0 axes, 8 x 44
> x-fmt-383-signature-id-57.fits: data
> x-fmt-383-signature.fits: ISO-8859 text, with no line terminators
>
> I hope my diff file can be applied in a future version of the file
> utility.
>
> With best wishes,
> Jörg Jenderek
> --
> Jörg Jenderek