[File] [PATCH] of Magdir/cad for Intergraph MicroStation update; *.dgn *.cel *.cit *.rgb *.rle

Christos Zoulas christos at zoulas.com
Sat Aug 10 13:35:12 UTC 2019


On Aug 6, 12:24am, joerg.jen.der.ek at gmx.net (Jörg Jenderek) wrote:
-- Subject: [File] [PATCH] of Magdir/cad for Intergraph MicroStation update; 

| Hello,
| some weeks ago I handled the MicroStation V8 CAD variants, which are
| based on the Compound Document Format (abbreviated as CDF). I have now
| run file version 5.37 on non-CDF-based CAD files with the name
| extension dgn and on related files, namely libraries with the file
| name extension cel and raster images (*.cit *.rle *.rgb). With the
| -k -m Magdir/cad options I get output like:
| 
| civsur.cel:   Bentley/Intergraph MicroStation DGN cell library
| COMP27.RGB:   Microstation
| 	      Bentley/Intergraph MicroStation
| COMP9.rle:    Microstation
| 	      Bentley/Intergraph MicroStation
| FLOORPLA.DGN: Bentley/Intergraph MicroStation DGN vector CAD
| 	      Microstation
| 	      Bentley/Intergraph MicroStation
| LONGLAT.CIT:  Microstation CITFile
| 	      Bentley/Intergraph MicroStation CIT raster CAD
| samp15.dgn:   Bentley/Intergraph MicroStation DGN vector CAD
| 	      Microstation
| 	      Bentley/Intergraph MicroStation
| seed2d_b.dgn: Bentley/Intergraph MicroStation DGN vector CAD
| 	      Microstation
| 	      Bentley/Intergraph MicroStation
| seed3d_b.dgn: Bentley/Intergraph MicroStation DGN vector CAD
| WHEEL.DGN:    Bentley/Intergraph MicroStation DGN vector CAD
| WRENCH.DGN:   Bentley/Intergraph MicroStation DGN vector CAD
| 	      Microstation DGNFile
| 	      Bentley/Intergraph MicroStation
| 
| The message starting with the phrase "Bentley/Intergraph" appears
| twice, because both of the following lines in Magdir/cad match:
|  0 belong  0x0809fe02	Bentley/Intergraph MicroStation DGN vector CAD
|  0 beshort 0x0809	Bentley/Intergraph MicroStation
| 
| The remaining third message, starting with the phrase Microstation,
| is triggered by the same byte sequence, only expressed in octal
| notation by lines like:
|  0	string	\010\011\376	Microstation
|  >3	string	\002
|  >>30	string	x		DGNFile
| 
| Furthermore, with the --extension option only ??? is displayed, and
| with the -i option only application/octet-stream is displayed.
| 
| The raster images are identified by octal expressions like
|  >4	string	\030\000\000			CITFile
|  >4	string	\030\000\003			CITFile
| In principle the same is done by a hexadecimal expression like
|  >>0x04	beshort	0x1800		CIT raster CAD
| 
| As reference I use the page about DGN files found on the dgnlib site,
| so I add a comment line like
|  # reference:	http://dgnlib.maptools.org/dgn.html
| On the same site I found the MicroStation 95 Reference Guide as
| ref18.pdf. Neither is complete, but with that information it is
| possible to understand the current magic identifications and correct
| the lines. According to the documentation, information for debugging
| purposes can be shown by lines like
|  >0	ubyte&0x3F	x	\b, level %u
|  >0	ubyte		&0x80	\b, complex
|  >0	ubyte		&0x40	\b, reserved
|  >1	ubyte&0x7F	x	\b, type %u
|  >2	uleshort	x	\b, words 0x%4.4x to follow
| 
| The level always seems to be 8. DGN files always start with an
| element of TCB type, that is value 9. That also matches samples like
| seed3d_b.dgn or WHEEL.DGN, which have the complex and reserved bits
| set. Those samples were described with only one text by the magic line
|  0 belong 0xc809fe02 Bentley/Intergraph MicroStation DGN vector CAD
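A minimal Python sketch of the header decoding that the debug lines above perform, added purely as an illustration and not part of the patch. The function name `decode_element_header` is hypothetical; the field layout is taken from the magic lines quoted above (flags and level in byte 0, type in byte 1, little-endian WTF at offset 2).

```python
import struct


def decode_element_header(data: bytes) -> dict:
    """Decode the first 4 bytes of an element like the debug magic lines do."""
    # byte 0: complex/reserved flags plus level, byte 1: type, bytes 2-3: WTF
    flags, type_byte, wtf = struct.unpack_from("<BBH", data, 0)
    return {
        "level": flags & 0x3F,
        "complex": bool(flags & 0x80),
        "reserved": bool(flags & 0x40),
        "type": type_byte & 0x7F,
        "words_to_follow": wtf,
    }


# The two leading byte patterns mentioned in the old magic lines.
plain = decode_element_header(bytes.fromhex("0809fe02"))
flagged = decode_element_header(bytes.fromhex("c809fe02"))
print(plain)
print(flagged)
```

Both patterns decode to level 8, type 9 and a WTF of 0x02FE; they differ only in the complex and reserved bits.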
| 
| CEL libraries always start with an element of type Group Data
| Elements, that is value 5. For such libraries the words to follow
| (WTF) field of the element has the value 0017h. This was expressed by
| the magic line
|  0 belong 0x08051700 Bentley/Intergraph MicroStation DGN cell library
| This magic line assumes that all cell libraries have a WTF value of
| 17h, but in the documentation I see no hint that this is always
| true. So for libraries I removed the test relying on the WTF value.
| 
| So I replace all magic lines concerning the inspected samples with a
| first test for level 8 and type 5 or 9 by the magic line
|  0	beshort&0x3F73	0x0801
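A small Python sketch of what the mask in the line above accepts, as an illustration only. The names `MASK`, `EXPECTED` and `first_test_matches` are hypothetical; the constants come from the magic line.

```python
MASK = 0x3F73
EXPECTED = 0x0801


def first_test_matches(first_two_bytes: bytes) -> bool:
    """Mimic `0 beshort&0x3F73 0x0801` on the first two bytes of a file."""
    value = int.from_bytes(first_two_bytes[:2], "big")
    return (value & MASK) == EXPECTED


# Element types (low 7 bits of byte 1) that pass when the level is 8.
passing_types = [t for t in range(128) if first_test_matches(bytes([0x08, t]))]
print(passing_types)  # [1, 5, 9, 13]

# The complex and reserved bits in byte 0 are masked out.
print(first_test_matches(bytes.fromhex("c809")))  # True
```

By this arithmetic the mask also lets types 1 and 13 through, so this first test is only a coarse filter and the later tests on the second element do the real discrimination.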
| 
| By adding the 2 leading words to the WTF value you get the size of
| the first element in words, and multiplying that by 2 gives its size
| in bytes. Alternatively, use a pointer expression to jump to the
| second element by the line
|  >(2.s*2)	ulong		x
| For debugging purposes the type of the second element can be
| displayed by a line like
|  >>&1		ubyte&0x7F	x	\b, 2nd type %u
| According to the documentation, for DGN files this is always 8 for
| the Digitizer element and for CEL files it is always 1 for the
| library cell header.
| So test for second element type 1 in the branch for cell libraries by
|  >>&1		ubyte&0x7F	1
| Afterwards test for a 1st element with level 8 and type 5 for the
| cell library by the line
|  >>>0 beshort 0x0805 Bentley/Intergraph Microstation CAD cell library
| Then show the user-defined MIME type and file name extension by the
| lines
|  !:mime		application/x-bentley-cel
|  !:ext		cel
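The jump to the second element can be sketched in Python as follows. This is an illustration under assumptions, not part of the patch: the helper names and the `synthetic_file` buffers are hypothetical, and the comment about the pointer expression assumes that a relative offset (`&`) continues from the end of the 4-byte `ulong` read by the parent line.

```python
import struct


def first_element_size(data: bytes) -> int:
    """Size of the first element in bytes: (WTF + 2 leading words) * 2."""
    wtf = struct.unpack_from("<H", data, 2)[0]
    return (wtf + 2) * 2


def second_element_type(data: bytes) -> int:
    """Type of the second element, i.e. byte 1 of that element, low 7 bits.

    The magic lines reach the same byte as WTF*2 (pointer) + 4 (ulong) + 1.
    """
    return data[first_element_size(data) + 1] & 0x7F


def synthetic_file(first_type: int, wtf: int, second_type: int) -> bytes:
    """Build a fake two-element buffer; not real CAD data."""
    first = bytes([0x08, first_type]) + struct.pack("<H", wtf)
    first = first.ljust((wtf + 2) * 2, b"\0")
    return first + bytes([0x08, second_type, 0, 0])


cel = synthetic_file(5, 0x0017, 1)   # cell library layout described above
dgn = synthetic_file(9, 0x02FE, 8)   # drawing layout described above
print(first_element_size(cel), second_element_type(cel))
print(first_element_size(dgn), second_element_type(dgn))
```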
| 
| So the branch for DGN files tests for a second element of Digitizer
| type by the line
|  >>&1		ubyte&0x7F	8
| For DGN files the documentation explicitly mentions that the first
| element has 1536 bytes, that is 3 blocks of 512 bytes. Dividing by 2
| shows that this element is 768 words long. Subtracting the 2 leading
| words gives a WTF value of 766, or 2FEh in hexadecimal. So here a
| test for a valid WTF value can be used by lines starting with
|  >>>2 uleshort =0x02FE Bentley/Intergraph Microstation CAD drawing
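The WTF arithmetic above, written out as a tiny Python sketch for checking. The names `BLOCK_BYTES` and `expected_wtf` are hypothetical.

```python
BLOCK_BYTES = 512


def expected_wtf(blocks: int) -> int:
    """WTF value of an element occupying `blocks` blocks of 512 bytes."""
    words = blocks * BLOCK_BYTES // 2   # element size in 16-bit words
    return words - 2                    # minus the 2 leading header words


print(expected_wtf(3), hex(expected_wtf(3)))  # 766 0x2fe
```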
| 
| I changed the name to a phrase with "CAD drawing" instead of "DGN
| vector CAD" or "DGNFile", following what others call such files
| according to http://file-extension.net/seeker/file_extension_dgn .
| I also removed the phrase "DGN" because this information is now
| visible through the user-defined MIME type and file name extension
| given by the additional lines
|  !:mime		application/x-bentley-dgn
|  !:ext		dgn
| 
| With the help of the documentation some more useful information can
| be displayed. The 0x40 bit of one byte is 1 if the file is 3D and 0
| for two-dimensional samples. This is expressed by the lines
|  >>>>1214	ubyte  		&0x40		3D
|  >>>>1214	ubyte  		^0x40		2D
| This dimensional information is otherwise not obviously visible,
| unlike in samples such as seed2d_b.dgn or seed3d_b.dgn.
| 
| Furthermore, the 2-character abbreviations for the sub unit and master
| unit can be displayed by the lines
|  >>>>1120	string		x		\b, units %-.2s
|  >>>>1122	string		>\0		%-.2s
| 
| In CAD samples like FLOORPLA.DGN made by people using the metric
| system you often find something like m mm here.
| In samples like seed2d_b.dgn or samp15.dgn made by people using feet
| and inches as units you often find something like FT IN or ' ".
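A hedged Python sketch of the dimension and unit lines above, for illustration only. The function name `dgn_details` is hypothetical, the offsets 1214, 1120 and 1122 are the ones from the magic lines, and the buffer is synthetic rather than a real drawing.

```python
def dgn_details(data: bytes) -> str:
    """Report 2D/3D and the two 2-character unit names of a DGN header."""
    dimension = "3D" if data[1214] & 0x40 else "2D"
    # Each unit name is at most 2 characters and may end early at a NUL byte.
    first = data[1120:1122].split(b"\0")[0].decode("latin-1")
    second = data[1122:1124].split(b"\0")[0].decode("latin-1")
    return f"{dimension}, units {first} {second}"


# Synthetic 1536-byte first element, filled in by hand.
sample = bytearray(1536)
sample[1120:1124] = b"FTIN"
flat = dgn_details(bytes(sample))
sample[1214] = 0x40
solid = dgn_details(bytes(sample))
print(flat)
print(solid)
```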
| 
| For debugging purposes the words to the optional attribute linkage
| can be shown by the lines
|  >>>>30		ubyte		x	\b, attindx \%o
|  >>>>31		ubyte		x	\b\%o
| 
| These values differ, but apparently only a dozen combinations seem
| to occur. This was used as the last test for DGN files by 19 lines
| like
|  >>30	string	\026\105		DGNFile
|  ...
|  >>30	string	\376\103		DGNFile
| I do not understand why these tests for attindx values are used. To
| me this makes no sense, so I removed these lines. Instead I use the
| test for the documented second element type 8 mentioned above.
| The shown information can be verified by running the dgndump tool
| from the dgnlib suite on DGN files.
| 
| The third branch is for Intergraph raster images (INGR). Information
| is found on the fileformats.archiveteam.org web site, so I add the
| comment line
|  # URL:	http://fileformats.archiveteam.org/wiki/Intergraph_Raster
| A link to the specification of the Intergraph Raster File Format
| (from archive.org) is also given there.
| 
| Unfortunately the second element trick is not useful here, because
| the documentation says nothing about a second element.
| According to the documentation, 3 bytes at the end of the first
| block are reserved and always have the value null. For CEL and DGN
| files the value there is not null, because the "conversion" variable
| of the ViewInfo structure is stored there.
| So catch raster images by the new second test line
|  >508	ubelong&0xFFffFF00	=0
| According to the docs a raster image always starts with the byte
| sequence 08 09. So test for level 8 and type 9 by a third test line
| like
|  >>0	beshort		0x0809
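The two raster tests above can be sketched in Python like this, as an illustration and not as the patch itself. The function name `looks_like_ingr_raster` is hypothetical, and the buffers are synthetic.

```python
def looks_like_ingr_raster(data: bytes) -> bool:
    """Mimic `>508 ubelong&0xFFffFF00 =0` followed by `>>0 beshort 0x0809`."""
    if len(data) < 512:
        return False
    value = int.from_bytes(data[508:512], "big")
    # The mask keeps bytes 508, 509 and 510 and ignores byte 511.
    reserved_are_null = (value & 0xFFFFFF00) == 0
    return reserved_are_null and data[0:2] == b"\x08\x09"


# Synthetic first blocks: one with null reserved bytes, one without.
raster_like = bytearray(512)
raster_like[0:2] = b"\x08\x09"
raster_like[511] = 0x7F              # the last byte is not tested
cad_like = bytearray(raster_like)
cad_like[509] = 0x01                 # something stored in the tested bytes
print(looks_like_ingr_raster(bytes(raster_like)))  # True
print(looks_like_ingr_raster(bytes(cad_like)))     # False
```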
| 
| According to the documentation the first element occupies some
| blocks of 512 bytes each. So the size of the element in bytes is
| something like 0200h. Dividing by 2 gives the size in words, like
| 0100h. Subtracting 2 for the leading words gives a WTF value like
| 00FEh. So test for the length of the 1st element by the line
|  >>>2	ubyte		0xfe
| Afterwards call a new subroutine to describe INGR raster images.
|  >>>>0 	use		ingr-image
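One way to see why a single-byte test at offset 2 can be enough is sketched below in Python, assuming the header always occupies a whole number of 512-byte blocks. The function name `wtf_low_byte` is hypothetical.

```python
def wtf_low_byte(blocks: int) -> int:
    """Low byte of the WTF value, which is stored first (little-endian)."""
    wtf = blocks * 512 // 2 - 2
    return wtf & 0xFF


# Each block adds 256 words, so only the high byte of WTF changes.
low_bytes = {wtf_low_byte(n) for n in range(1, 9)}
print(low_bytes == {0xFE})  # True
```

Under that assumption the byte at offset 2 is 0xFE whether the header is one block or several.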
| 
| 0	name	ingr-image
| At offset 4 the 2-byte variable DataTypeCode is stored. It indicates
| the format, the depth of the pixel data and the compression used.
| What version 5.37 called "CITFile" and "CIT raster CAD" I now
| describe by lines like
|  >4	uleshort	x	Intergraph raster image
|  >>4	uleshort	0x0018	\b, CCITT Group 4 1-bit
|  !:mime	image/x-intergraph-cit
|  !:ext	cit
|  >>4	default		x
|  >>>4	uleshort	x	\b, Type %u
|  !:mime	image/x-intergraph
| I changed the name. I removed the "CIT" phrase because this
| information is now shown by the --extension and MIME type options.
| I looked at how others call such images on sites like
| http://file-extension.net/seeker/file_extension_cit .
| I also looked at the reference, where type 24 is described as "CCITT
| Group 4 1-bit". I removed the additional magic lines that test for
| DataTypeCode 18h and instead use the test for 3 reserved null bytes,
| because otherwise only CIT images are recognized, and for the 33
| other image types you get an unspecific description like
| MicroStation or Microstation Bentley/Intergraph, as for the samples
| COMP27.RGB and COMP9.rle. Unfortunately I only have samples for 2
| other image types. So I insert matching code segments:
|  >>4	uleshort	0x0009	\b, Run-Length Encoded 1-bit
|  !:mime	image/x-intergraph-rle
|  !:ext	rle
|  >>4	uleshort	27	\b, Adaptive RLE RGB
|  !:mime	image/x-intergraph-rgb
|  !:ext	rgb
| Afterwards show the ApplicationType, which can have ten possible
| values, by the line:
|  >6	uleshort	!0			\b, ApplicationType %u
| 0 means Generic raster image, 3 means Drawing, Scanning. So in
| version 5.37 only CIT examples with these 2 ApplicationType values
| were recognized by the lines
|  >4	string	\030\000\000			CITFile
|  >4	string	\030\000\003			CITFile
| So I removed these additional magic lines, because I now use the
| additional line which tests for 3 reserved null bytes.
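The DataTypeCode and ApplicationType handling can be sketched in Python as below. This is only an illustration: `DATA_TYPES` and `describe_type` are hypothetical names, the table holds just the three codes named in this mail, and the test header is synthetic.

```python
import struct

# Only the three DataTypeCode values discussed above; other codes exist.
DATA_TYPES = {
    9: ("Run-Length Encoded 1-bit", "image/x-intergraph-rle"),
    24: ("CCITT Group 4 1-bit", "image/x-intergraph-cit"),
    27: ("Adaptive RLE RGB", "image/x-intergraph-rgb"),
}


def describe_type(data: bytes):
    """Return (description, MIME type) from offsets 4 and 6 of the header."""
    code, application = struct.unpack_from("<HH", data, 4)
    name, mime = DATA_TYPES.get(code, (f"Type {code}", "image/x-intergraph"))
    text = f"Intergraph raster image, {name}"
    if application != 0:               # 0 is not reported
        text += f", ApplicationType {application}"
    return text, mime


# Synthetic header equal to the old octal pattern \030\000\003 at offset 4.
header = bytes.fromhex("0809fe00") + struct.pack("<HH", 24, 3)
print(describe_type(header))
```

The old `string` tests matched only when the DataTypeCode was 18h and the ApplicationType was 0 or 3; reading both fields separately covers the remaining combinations.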
| 
| Following the documentation, the image dimensions are now also shown
| by the lines
|  >184	ulelong		x			\b, %u x
|  >188	ulelong		x			%u
| The variable ScanlineOrient indicates the origin and the orientation
| of the scan lines. This is now shown by lines
|  >194	ubyte		x			\b, orientation
|  >194	ubyte		&0x01			right
|  >194	ubyte		^0x01			left
|  >194	ubyte		&0x02			down
|  >194	ubyte		^0x02			top
|  >194	ubyte		&0x04			horizontal
|  >194	ubyte		^0x04			vertical
| The shown information for the inspected images can be verified by
| running nconvert from the xnview suite with the -fullinfo option.
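A Python sketch of the dimension and orientation lines above, for illustration only. The function name `image_geometry` is hypothetical, calling the two values width and height is an assumption, and the header is synthetic.

```python
import struct


def image_geometry(data: bytes) -> str:
    """Format the values at offsets 184, 188 and 194 like the magic lines."""
    width, height = struct.unpack_from("<II", data, 184)
    orient = data[194]
    words = [
        "right" if orient & 0x01 else "left",
        "down" if orient & 0x02 else "top",
        "horizontal" if orient & 0x04 else "vertical",
    ]
    return f"{width} x {height}, orientation " + " ".join(words)


# Synthetic 512-byte header with hand-picked values.
header = bytearray(512)
struct.pack_into("<II", header, 184, 640, 480)
header[194] = 0x04
geometry = image_geometry(bytes(header))
print(geometry)  # 640 x 480, orientation left top horizontal
```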
| 
| After applying the above modifications with the patch
| file-5.37-cad-intergraph.diff the duplicate identifications vanish
| and I get more precise output like:
| 
| civsur.cel:   Bentley/Intergraph Microstation CAD cell library
| COMP27.RGB:   Intergraph raster image, Adaptive RLE RGB,
| 	      640 x 480, orientation left top horizontal
| COMP9.rle:    Intergraph raster image, Run-Length Encoded 1-bit,
| 	      640 x 480, orientation left top horizontal
| FLOORPLA.DGN: Bentley/Intergraph Microstation CAD drawing 2D,
| 	      units m mm
| LONGLAT.CIT:  Intergraph raster image, CCITT Group 4 1-bit,
| 	      1064 x 1201, orientation left top horizontal
| samp15.dgn:   Bentley/Intergraph Microstation CAD drawing 2D,
| 	      units FT IN
| seed2d_b.dgn: Bentley/Intergraph Microstation CAD drawing 2D,
| 	      units ' "
| seed3d_b.dgn: Bentley/Intergraph Microstation CAD drawing 3D,
| 	      units '  "
| WHEEL.DGN:    Bentley/Intergraph Microstation CAD drawing 3D,
| 	      units mu su
| WRENCH.DGN:   Bentley/Intergraph Microstation CAD drawing 2D,
| 	      units in th
| 
| I hope my diff file can be applied in a future version of the
| file utility.
| 

Committed, thanks!


christos

