[File] [PATCH] Magdir/images for Common Data Format *.cdf + version

Fri Mar 18 21:23:16 UTC 2022

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

some days ago I handled some NetCDF data samples. According to some
documentation, the CDF file name extension was also used besides NC.
Unfortunately this extension is also used by other software tools.
When running file command version 5.41 on such other CDF examples
I get a nearly correct output like:

a1_k0_mpa_20050804_v02.cdf:
	Common Data Format (Version 2.6 or 2.7) data
c1_waveform_wbd_200202080940_v01.cdf:
	Common Data Format (Version 3 or later) data
c1_waveform_wbd_200202080940_v01_subset.cdf:
	Common Data Format (Version 3 or later) data
cl_sp_edi_00000000_v01.cdf:
	Common Data Format (Version 2.6 or 2.7) data
ge_k0_cpi_19921231_v02.cdf:
	Common Data Format (Version 2.5 or earlier) data
im_k0_euv_20011231_v01.cdf:
	Common Data Format (Version 2.6 or 2.7) data
im_k0_rpi_20051218_v01.cdf:
	Common Data Format (Version 2.6 or 2.7) data
tha_l2_fgm_20070729_v01.cdf:
	Common Data Format (Version 3 or later) data
tha_l2_scm_20160831_v01.cdf:
	Common Data Format (Version 3 or later) data
ulysses.cdf:
	Common Data Format (Version 2.5 or earlier) data

With the -i option the mime type application/x-cdf is shown for all
examples; with the --extension option only the 3 byte sequence ??? is
displayed.

Luckily I found a page about Common Data Format on the File Formats
Archive Team web site. It also links to the specification of the CDF
internal format, a PDF document named cdf35ifd.pdf. That information
is now expressed by comment lines inside Magdir/images like:
# URL:	http://fileformats.archiveteam.org/wiki/Common_Data_Format
# Ref.:	https://cdaweb.gsfc.nasa.gov/pub/software/cdf/doc/cdf350/
#	cdf35ifd.pdf

The description is currently done inside Magdir/images by lines like:
0 belong 0xCDF30001 Common Data Format (Version 3 or later) data
!:mime  application/x-cdf
0 belong 0xCDF26002 Common Data Format (Version 2.6 or 2.7) data
!:mime  application/x-cdf
0 belong 0x0000FFFF Common Data Format (Version 2.5 or earlier) data
!:mime  application/x-cdf

With the help of the specification I was able to refine the magic
lines. According to that documentation the formats of version 2.6
and the earlier version 2.5 are nearly the same. The differences are
the starting magic and the stored version information. In version 3
the file structure is similar to the earlier versions. The only
differences are the fields for record sizes and offsets, which are
8 bytes instead of 4 bytes wide. So the versions share common data
structures, but at different offsets, and the information is shown by
calling sub routines at different offsets. For the third variant
above (Version 2.5 or earlier) the sub routine cdf-v2 is called, and
this now becomes:
 0	belong	0x0000FFFF
 >0		use		cdf-v2
 0	name			cdf-v2
 >0	use		cdf-common
 >20	use		cdf-version-string
 #>4	belong	=0x0000FFFF	\b, regular
 >4	belong	=0xCCCC0001	\b, compressed
 >8	ubelong	!0x130	 	\b, CDR size %#x
 #>12	belong	!1	 	\b, RecordType %d
 >16	ubelong	!0x138	 	\b, GDRoffset %#x
 >28	use			cdf-enc-flags
 >56	use			cdf-text

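The dispatch on the two magic numbers can also be sketched outside of
magic syntax. The following Python fragment is only an illustration
under the assumptions stated in its comments; the names CDF_MAGICS and
cdf_layout are made up for this example and are not part of the patch
or of any CDF library.

```python
import struct

# First magic number -> header layout, as used by the magic lines in this mail.
# The table and the function name are hypothetical, for illustration only.
CDF_MAGICS = {
    0xCDF30001: "v3",  # Version 3 or later, 8-byte sizes and offsets
    0xCDF26002: "v2",  # Version 2.6 or 2.7
    0x0000FFFF: "v2",  # Version 2.5 or earlier
}


def cdf_layout(header: bytes):
    """Return (layout, compressed) for a CDF header, or None if not CDF."""
    if len(header) < 8:
        return None
    # Both magic numbers are big-endian 32-bit values at offsets 0 and 4.
    magic1, magic2 = struct.unpack(">II", header[:8])
    layout = CDF_MAGICS.get(magic1)
    if layout is None:
        return None
    # Second magic: 0x0000FFFF means regular, 0xCCCC0001 means compressed.
    return layout, magic2 == 0xCCCC0001
```

The returned layout name only selects which set of offsets applies; it
does not say anything about the exact version, which is stored later in
the header.
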
The starting description text (3 words) together with mime type and
file name extension is produced by the first common sub routine, which
looks like:
 0 name				cdf-common
 >0	belong	x		Common Data Format
 !:mime  application/x-cdf
 !:ext	cdf

Instead of showing estimated version information based on the 4 byte
start magic I now show the exact version info in parentheses, like in
the older/current file command version. This information is stored at
offset 20 here or at offset 28 for version 3. This now looks like:
 0 name				cdf-version-string
 >0	ubelong	x	 	(Version %u
 #>4	ubelong	x	 	\b, RELEASE %u
 >4	ubelong	x	 	\b.%u
 >24	ubelong	x	 	\b.%u) data
 #>24	ubelong	x	 	\b, INCREMENT %u
So version 2.5.8a means CDF main version 2, release number 5, and
increment number 8. The sub-increment is a, but that information is
not stored inside the CDF itself.
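
The same three fields can be read with a few lines of Python. This is a
minimal sketch, assuming the offsets used by the magic lines above
(version and release at the base offset, increment 24 bytes further);
the function name cdf_version is hypothetical.

```python
import struct


def cdf_version(header: bytes, base: int) -> str:
    """Build the x.y.z version string from a CDF header.

    base is 20 for the version 2 layout and 28 for the version 3 layout,
    following the offsets passed to the version sub routine in this mail.
    """
    # Main version and release are two consecutive big-endian 32-bit fields.
    version, release = struct.unpack_from(">II", header, base)
    # The increment is stored 24 bytes after the main version field.
    (increment,) = struct.unpack_from(">I", header, base + 24)
    return f"{version}.{release}.{increment}"
```

For a version 2 file this reads the bytes at offsets 20, 24 and 44; for
a version 3 file the bytes at offsets 28, 32 and 52.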

The encoding and flags information appears at offset 28 here or at
offset 36 for version 3. This information is shown by a sub routine
like:
 0 name				cdf-enc-flags
 >0	ubelong	>1	 	\b, encoding %u
 #>4	belong	x	 	\b, FLAGS %#x
 >4	belong	&0x00000001	\b, row-majority
 >4	belong	&0x00000002	\b, single
 >4	belong	&0x00000004	\b,
 >>4	belong	&0x00000008	md5
 #>>4	belong	&0x00000010	other
 >>4	belong	&0x00000004	checksum
 #>8	belong	!0	 	\b, rfuA %#x
 #>12	belong	!0	 	\b, rfuB %#x
 ##>16	ubelong	x	 	INCREMENT %u
 #>20	belong	!0xFFffFFff	\b, rfuD %#x
 #>24	belong	!0xFFffFFff	\b, rfuE %#x
Value 1 for the encoding field means network encoding, 6 is used for
Intel Windows, and the highest value is 9 for Power PC. Afterwards
comes the flags field. It describes whether the data has a checksum
and what type of checksum is used. Afterwards come the fields rfuA,
rfuB, increment, rfuD and rfuE. Apart from the increment these are
reserved for future use; the value is zero for the first two and minus
one for the last two.
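
The bit tests on the flags field can be sketched in Python as follows.
The bit values are the ones used in the magic lines above; the function
name cdf_flags and the returned list form are assumptions made for this
illustration only.

```python
def cdf_flags(flags: int) -> list:
    """Translate the CDF flags field into the phrases used by the magic lines."""
    parts = []
    if flags & 0x00000001:
        parts.append("row-majority")
    if flags & 0x00000002:
        parts.append("single")
    if flags & 0x00000004:
        # Bit 4 says a checksum exists, bit 8 says it is an MD5 checksum.
        parts.append("md5 checksum" if flags & 0x00000008 else "checksum")
    return parts
```

As in the magic lines, the MD5 bit is only looked at when the checksum
bit is set.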

After the structure with the encoding comes the field for the
copyright text, at offset 56 here or at 64 for version 3. It consists
of at most 1945 characters for versions prior to 2.5 and at most 256
in later versions. The field consists of lines separated by a newline
character (0x0A). In the examples I inspected I typically see 6 lines.
The first 3 lines, with text like "Common Data Format (CDF)",
"(C) Copyright 1990-1995" and "National Space Science Data Center",
are shown by lines like:

 0 name				cdf-text
 #>0	ubyte		!0x0A		%c
 >1	string		x		\b, %-.255s
 >>&0	ubyte		=0x0A
 >>>&0	string		x		%s
 >>>>&0	ubyte		=0x0A
 >>>>>&0 string		x		%s
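
Splitting the copyright field into its lines can be sketched in Python
like this. It is only an illustration: the function name
cdf_copyright_lines is hypothetical, the default limit of 256 is the
field size of later versions mentioned above, and the assumption that
unused bytes of the field are NUL padding is mine and not taken from the
specification.

```python
def cdf_copyright_lines(header: bytes, base: int, limit: int = 256) -> list:
    """Return the non-empty lines of the copyright text.

    base is 56 for the version 2 layout and 64 for the version 3 layout.
    """
    field = header[base:base + limit]
    # Assumption: the text ends at the first NUL byte, the rest is padding.
    field = field.split(b"\x00", 1)[0]
    text = field.decode("ascii", errors="replace")
    # Lines are separated by newline (0x0A); a leading newline gives an
    # empty first element, which is dropped here.
    return [line for line in text.split("\n") if line.strip()]
```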

After applying the above mentioned modifications with the patch
file-5.41-images-x-cdf.diff, all samples are described as before, but
with the exact version and more interesting details like:

a1_k0_mpa_20050804_v02.cdf:
	Common Data Format (Version 2.4.7) data
	, GDRoffset 0x7d1
	, single
	, Common Data Format (CDF)
	(C) Copyright 1990-2004 NASA/GSFC
c1_waveform_wbd_200202080940_v01.cdf:
	Common Data Format (Version 3.3.0) data
	, row-majority , single, md5 checksum
	, Common Data Format (CDF)
	(C) Copyright 1990-2009 NASA/GSFC
c1_waveform_wbd_200202080940_v01_subset.cdf:
	Common Data Format (Version 3.3.1) data
	, row-majority, single, md5 checksum
	, Common Data Format (CDF)
	(C) Copyright 1990-2010 NASA/GSFC
cl_sp_edi_00000000_v01.cdf:
	Common Data Format (Version 2.7.2) data
	, row-majority, single
	, Common Data Format (CDF)
	(C) Copyright 1990-2011 NASA/GSFC
ge_k0_cpi_19921231_v02.cdf:
	Common Data Format (Version 2.4.6) data
	, CDR size 0x7c9, GDRoffset 0x7d1
	, single
	, NSSDC Common Data Format (C)
	Copyright 1990-1994 NASA/GSFC
im_k0_euv_20011231_v01.cdf:
	Common Data Format (Version 2.6.1) data
	, single
	, NSSDC Common Data Format (CDF)
	(C) Copyright 1990-1996 NASA/GSFC
im_k0_rpi_20051218_v01.cdf:
	Common Data Format (Version 2.7.0) data
	, row-majority, single
	, NSSDC Common Data Format (CDF)
	(C) Copyright 1990-1997 NASA/GSFC
tha_l2_fgm_20070729_v01.cdf:
	Common Data Format (Version 3.2.0) data
	, row-majority, single
	, Common Data Format (CDF)
	(C) Copyright 1990-2006 NASA/GSFC
tha_l2_scm_20160831_v01.cdf:
	Common Data Format (Version 3.6.1) data
	, row-majority, single
	, Common Data Format (CDF)
	(C) Copyright 1990-2015 NASA/GSFC
ulysses.cdf:
	Common Data Format (Version 2.5.22) data
	, row-majority, single
	, NSSDC Common Data Format (CDF)
	(C) Copyright 1990-1995 NASA/GSFC

I hope my diff file can be applied in a future version of the file
utility.

With best wishes
Jörg Jenderek

-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYjT4RAAKCRCv8rHJQhrU
1oUXAKCSwjYa5U0ew9l/f5uia3pRsul0YwCfVQvHAzYUxFcUfxz8me6CCV1db0w=
=P8Dl
-----END PGP SIGNATURE-----
-------------- next part --------------
--- file-5.41/magic/Magdir/images.old	2021-10-18 16:20:03.000000000 +0200
+++ file-5.41/magic/Magdir/images	2022-03-18 12:10:40.095164900 +0100
@@ -1667,12 +1667,113 @@
 # From: Michael Liu
 # https://en.wikipedia.org/wiki/Common_Data_Format
-0	belong	0xCDF30001	Common Data Format (Version 3 or later) data
-!:mime  application/x-cdf
+# URL: 		http://fileformats.archiveteam.org/wiki/Common_Data_Format
+# Reference:	https://cdaweb.gsfc.nasa.gov/pub/software/cdf/doc/cdf350/cdf35ifd.pdf
+#		Common Data Format (Version 3 or later)
+0	belong	0xCDF30001
+>0	use		cdf-common
+# show exact version string with numbers like: 3.y.z
+>28	use		cdf-version
+# magic number 2; 0x0000FFFF~regular CDF file 0xCCCC0001~compressed CDF file
+#>4	belong	=0x0000FFFF	\b, regular
+>4	belong	=0xCCCC0001	\b, compressed
+# CDF Descriptor Record (CDR) size like: 138h
+>8	bequad	!0x0138	 	\b, CDR~size %#llx
+# RecordType; -1~Unused 1~CDR ... 13~Compressed Variable Values Record
+#>16	belong	!1	 	\b, RecordType %d
+# The file offset of the Global Descriptor Record (GDR) like: 140h
+>20	bequad	!0x140	 	\b, GDRoffset %#llx
+#>28	ubelong	x	 	\b, MAJOR VERSION %u
+#>32	ubelong	x	 	\b, RELEASE %u
+# show encoding and flags 
+>36	use			cdf-enc-flags
+# show copyright text
+>64	use			cdf-text
 
-0	belong	0xCDF26002	Common Data Format (Version 2.6 or 2.7) data
-!:mime  application/x-cdf
+#		Common Data Format (Version 2.6 or 2.7)
+0	belong	0xCDF26002
+>0		use		cdf-v2
 
-0	belong	0x0000FFFF	Common Data Format (Version 2.5 or earlier) data
+#		Common Data Format (Version 2.5 or earlier)
+0	belong	0x0000FFFF
+>0		use		cdf-v2
+0	name			cdf-v2
+>0	use		cdf-common
+>20	use		cdf-version
+# magic number 2; 0x0000FFFF~regular CDF file 0xCCCC0001~compressed CDF file
+#>4	belong	=0x0000FFFF	\b, regular
+>4	belong	=0xCCCC0001	\b, compressed
+# CDF Descriptor Record (CDR) size like: 130h 138h 7c9h (ge_k0_cpi_19921231_v02.cdf)
+>8	ubelong	!0x130	 	\b, CDR size %#x
+# RecordType; -1~Unused 1~CDR ... 13~Compressed Variable Values Record
+#>12	belong	!1	 	\b, RecordType %d
+# The file offset of the Global Descriptor Record (GDR) like: 138h 7D1h (ge_k0_cpi_19921231_v02.cdf)
+>16	ubelong	!0x138	 	\b, GDRoffset %#x
+# show encoding and flags 
+>28	use			cdf-enc-flags
+# show copyright text
+>56	use			cdf-text
+################################################################################
+0 name				cdf-common
+# to get look like in older file versions use 3 phrases
+>0	belong	x		Common Data Format
 !:mime  application/x-cdf
+!:ext	cdf
+#	display exact version string info common in all Common Data Format but at different offsets 
+0 name				cdf-version
+# 2.5.8a is CDF version 2, release 5, and increment 8, sub-increment a (not stored)
+# show exact version number x.y.z where x is main version number like: 2 3
+# left parenthesis and text before major version number like in older file versions
+>0	ubelong	x	 	(Version %u
+# release; like: 1 3 4 5 6 7
+#>4	ubelong	x	 	\b, RELEASE %u
+>4	ubelong	x	 	\b.%u
+# to get look like in older file versions use 4 phrases: point increment parenthesis "data"
+# increment is like: 0 1 2 6 7 22
+>24	ubelong	x	 	\b.%u) data
+#>24	ubelong	x	 	\b, INCREMENT %u
+#	display encoding and flags value common in all Common Data Format but at different offsets
+0 name				cdf-enc-flags
+# Encoding; 1~network encoding ... 6~Intel Windows ... 9~Power PC
+>0	ubelong	>1	 	\b, encoding %u
+# flags; 1~row-majority 2~single-file 4~with checksum 8~MD5 checksum 16~not MD5 checksum 32...~unused like: 2 3 15
+#>4	belong	x	 	\b, FLAGS %#x
+>4	belong	&0x00000001	\b, row-majority
+>4	belong	&0x00000002	\b, single
+>4	belong	&0x00000004	\b,
+>>4	belong	&0x00000008	md5
+#>>4	belong	&0x00000010	other
+>>4	belong	&0x00000004	checksum
+# rfuA; reserved for future use; set to 0
+#>8	belong	!0	 	\b, rfuA %#x
+# rfuB; reserved for future use; set to 0
+#>12	belong	!0	 	\b, rfuB %#x
+# version increment like: 1 2 6 7 22
+##>16	ubelong	x	 	INCREMENT %u
+# rfuD; reserved for future use; set to -1
+#>20	belong	!0xFFffFFff	\b, rfuD %#x
+# rfuE; reserved for future use; set to -1
+#>24	belong	!0xFFffFFff	\b, rfuE %#x
+# display maximal 1945 characters of copyright text in versions prior 2.5 and 256 in later versions
+# consist of lines separated by a newline character (0x0A)
+0 name				cdf-text
+# 1 character probably always newline
+#>0	ubyte		!0x0A		%c
+# 1st copyright line like: Common Data Format (CDF)
+>1	string		x		\b, %-.255s
+# newline implies more copyright lines like: (C) Copyright 1990-1995
+>>&0	ubyte		=0x0A
+>>>&0	string		x		%s
+# # newline implies 3rd copyright line like: National Space Science Data Center
+# >>>>&0	ubyte		=0x0A
+# >>>>>&0	string		x		%s
+# # newline implies 4th copyright line like: NASA/Goddard Space Flight Center
+# >>>>>>&0	ubyte	=0x0A
+# >>>>>>>&0	string	x		%s
+# # newline implies 5th copyright line like: Greenbelt, Maryland 20771 USA
+# >>>>>>>>&0	ubyte	=0x0A
+# >>>>>>>>>&0	string	x		%s
+# # newline implies 6th copyright line like: (DECnet   -- NCF::CDFSUPPORT)
+# >>>>>>>>>>&0	ubyte	=0x0A
+# >>>>>>>>>>>&0	string	x		%s
 
 # Hierarchical Data Format, used to facilitate scientific data exchange