[File] [PATCH] of Magdir/cad for Intergraph MicroStation update; *.dgn *.cel *.cit *.rgb *.rle

Jörg Jenderek joerg.jen.der.ek at gmx.net
Mon Aug 5 22:24:26 UTC 2019


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,
some weeks ago I handled MicroStation V8 CAD variants, which are based
on the Compound Document Format (abbreviated as CDF). Now I ran the file
command version 5.37 on CAD files that are not CDF based, with name
extension dgn, and on related files. These are libraries with file name
extension cel and raster images (*.cit *.rle *.rgb). With the options
-k -m Magdir/cad I get output like:

civsur.cel:   Bentley/Intergraph MicroStation DGN cell library
COMP27.RGB:   Microstation
	      Bentley/Intergraph MicroStation
COMP9.rle:    Microstation
	      Bentley/Intergraph MicroStation
FLOORPLA.DGN: Bentley/Intergraph MicroStation DGN vector CAD
	      Microstation
	      Bentley/Intergraph MicroStation
LONGLAT.CIT:  Microstation CITFile
	      Bentley/Intergraph MicroStation CIT raster CAD
samp15.dgn:   Bentley/Intergraph MicroStation DGN vector CAD
	      Microstation
	      Bentley/Intergraph MicroStation
seed2d_b.dgn: Bentley/Intergraph MicroStation DGN vector CAD
	      Microstation
	      Bentley/Intergraph MicroStation
seed3d_b.dgn: Bentley/Intergraph MicroStation DGN vector CAD
WHEEL.DGN:    Bentley/Intergraph MicroStation DGN vector CAD
WRENCH.DGN:   Bentley/Intergraph MicroStation DGN vector CAD
	      Microstation DGNFile
	      Bentley/Intergraph MicroStation

Messages starting with the phrase "Bentley/Intergraph" appear twice,
because both of the following lines in Magdir/cad match:
 0 belong  0x0809fe02	Bentley/Intergraph MicroStation DGN vector CAD
 0 beshort 0x0809	Bentley/Intergraph MicroStation

The remaining third message, starting with the phrase Microstation, is
triggered by the same byte sequence, only expressed in octal notation
by lines like:
 0	string	\010\011\376	Microstation
 >3	string	\002
 >>30	string	x		DGNFile

Furthermore, with the --extension option only ??? is displayed, and
with the -i option only application/octet-stream is displayed.

The raster images are identified by octal expressions like
 >4	string	\030\000\000			CITFile
 >4	string	\030\000\003			CITFile
In principle the same is done by a hexadecimal expression like
 >>0x04	beshort	0x1800		CIT raster CAD
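
That the octal string tests and the hexadecimal integer tests describe
the same bytes can be checked with a few lines of Python. This is only
a sketch; the variable names are mine and are not part of the patch.

```python
# Octal string patterns from the old Magdir/cad entries, written as raw bytes.
dgn_octal = bytes([0o010, 0o011, 0o376, 0o002])  # \010\011\376 followed by \002
cit_octal = bytes([0o030, 0o000])                # first two bytes of \030\000\000

# The same values as the big-endian integers used by the belong/beshort tests.
dgn_hex = (0x0809FE02).to_bytes(4, "big")
cit_hex = (0x1800).to_bytes(2, "big")

print(dgn_octal == dgn_hex)  # True
print(cit_octal == cit_hex)  # True
```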

As reference I use the page about DGN files found on the dgnlib site.
So I add a comment line like
 # reference:	http://dgnlib.maptools.org/dgn.html
On the same site I found the MicroStation 95 Reference Guide as
ref18.pdf. Neither is complete, but with that information it is
possible to understand the current magic identifications and to correct
the lines. According to the documentation, information for debugging
purposes can be shown by lines like
 >0	ubyte&0x3F	x	\b, level %u
 >0	ubyte		&0x80	\b, complex
 >0	ubyte		&0x40	\b, reserved
 >1	ubyte&0x7F	x	\b, type %u
 >2	uleshort	x	\b, words 0x%4.4x to follow
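
The same header fields can be decoded outside of file, for example to
cross-check a sample. The following is a minimal sketch that mirrors
the five debug lines above; the function name and the dictionary keys
are my own choice.

```python
import struct

def parse_element_header(data: bytes) -> dict:
    """Decode the first 4 bytes of an element like the debug lines do."""
    # Byte 0: level and flag bits, byte 1: type, bytes 2-3: little-endian WTF.
    b0, b1, wtf = struct.unpack_from("<BBH", data, 0)
    return {
        "level": b0 & 0x3F,
        "complex": bool(b0 & 0x80),
        "reserved": bool(b0 & 0x40),
        "type": b1 & 0x7F,
        "wtf": wtf,
    }

plain = parse_element_header(bytes.fromhex("0809fe02"))
flagged = parse_element_header(bytes.fromhex("c809fe02"))
print(plain)
print(flagged)
```

For the byte sequence c8 09 fe 02 both the complex and the reserved
bit come out as set, while level and type are still 8 and 9.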

Level seems to be always 8. DGN files always start with an element of
TCB type, that is value 9. This also holds for samples like
seed3d_b.dgn or WHEEL.DGN, which have the complex and reserved bits
set. These samples were described only once, by the magic line
 0 belong 0xc809fe02 Bentley/Intergraph MicroStation DGN vector CAD

CEL libraries always start with an element of type Group Data
Elements, that is value 5. For such libraries the words to follow in
element (WTF) have the value 0017h. This was expressed by the magic line
 0 belong 0x08051700 Bentley/Intergraph MicroStation DGN cell library
So this magic line assumes that all cell libraries have a WTF value of
17h, but in the documentation I see no hint that this should always be
true. So for libraries I removed the test relying on the WTF value.

So I replace all magic lines concerning the inspected samples, and
first test for level 8 and type 5 or 9 by the magic line
 0	beshort&0x3F73	0x0801
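
To see what this mask lets through, here is a small Python sketch of
the same comparison. The helper name first_test is hypothetical, it
only mimics the magic line.

```python
def first_test(data: bytes) -> bool:
    """Equivalent of the magic line: 0 beshort&0x3F73 0x0801"""
    value = int.from_bytes(data[:2], "big")
    return (value & 0x3F73) == 0x0801

# Element types accepted at level 8 (high bit of the type byte not set).
accepted = [t for t in range(128) if first_test(bytes([0x08, t]))]
print(accepted)  # [1, 5, 9, 13]
```

Because the mask ignores bits 2 and 3 of the type byte, types 1 and 13
pass this first test as well as 5 and 9. The following tests narrow
the match down.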

By adding the 2 leading words to the WTF value you get the size of the
first element in words, and then by multiplying by 2 you get the size
of the first element in bytes. Alternatively, use a pointer expression
to jump towards the second element by the line
 >(2.s*2)	ulong		x
For debugging purposes the second element type value can be displayed
by a line like
 >>&1		ubyte&0x7F	x	\b, 2nd type %u
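
The offset arithmetic behind these two lines can be followed in this
Python sketch. It assumes, as I read magic(5), that &1 is counted from
the end of the 4 byte ulong matched one level above. The buffer is
synthetic and not taken from a real file.

```python
import struct

def second_element_type(data: bytes) -> int:
    """Return the type of the second element, following >(2.s*2) and >>&1."""
    (wtf,) = struct.unpack_from("<H", data, 2)  # 2.s: little-endian short at 2
    indirect = wtf * 2                          # offset where the ulong is read
    type_offset = indirect + 4 + 1              # &1: one byte past that ulong
    # Same position as: start of second element ((wtf + 2) * 2) plus 1.
    assert type_offset == (wtf + 2) * 2 + 1
    return data[type_offset] & 0x7F

# Synthetic buffer: a 1536 byte first element, then an element starting 08 08.
buf = bytearray(1536 + 4)
buf[0:4] = bytes.fromhex("0809fe02")
buf[1536:1538] = bytes.fromhex("0808")
print(second_element_type(bytes(buf)))  # 8
```
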
According to the documentation, for DGN files this is always 8 for the
Digitizer element, and for CEL files this is always 1 for the library
cell header.
So test for second element type 1 for the branch with cell library by
 >>&1		ubyte&0x7F	1
Afterwards test for a 1st element with level 8 and type 5 for cell
library by the line
 >>>0 beshort 0x0805 Bentley/Intergraph Microstation CAD cell library
Afterwards show the user defined MIME type and file name extension by
the lines
 !:mime		application/x-bentley-cel
 !:ext		cel

So the branch for DGN files tests for a Digitizer second element by the line
 >>&1		ubyte&0x7F	8
For DGN files the documentation explicitly mentions that the first
element has 1536 bytes, that is 3 blocks of 512 bytes. Dividing by 2,
this element is 768 words long. Subtracting the 2 leading words gives
a WTF value of 766, or 2FEh in hexadecimal. So here a test for a valid
WTF can be used by a line starting with
 >>>2 uleshort =0x02FE Bentley/Intergraph Microstation CAD drawing
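
The arithmetic from the paragraph above, written out as a tiny Python
sketch (the function name is mine):

```python
def wtf_from_bytes(size_in_bytes: int) -> int:
    """Words to follow (WTF) for an element of the given size in bytes."""
    words = size_in_bytes // 2   # one word is 2 bytes
    return words - 2             # the 2 leading header words are not counted

print(hex(wtf_from_bytes(3 * 512)))  # 0x2fe
```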

I changed the name to a phrase with "CAD drawing" instead of "DGN
vector CAD" or "DGNFile", following how others name such files on the
web site http://file-extension.net/seeker/file_extension_dgn .
I also removed the phrase "DGN" because this information is now
visible through the user defined MIME type and file name extension in
the additional lines
 !:mime		application/x-bentley-dgn
 !:ext		dgn

With the help of the documentation some more useful information can be
displayed. The 0x40 bit of one byte is 1 if the file is 3D, otherwise
0 for two-dimensional samples. This is expressed by the lines
 >>>>1214	ubyte  		&0x40		3D
 >>>>1214	ubyte  		^0x40		2D
This dimensional information is not always as obvious as in samples
named seed2d_b.dgn or seed3d_b.dgn.

Furthermore the 2 character abbreviations for sub unit and master unit
can be displayed by the lines
 >>>>1120	string		x		\b, units %-.2s
 >>>>1122	string		>\0		%-.2s

In CAD samples like FLOORPLA.DGN, made by people using the metric
system, you often find something like m mm here.
In samples like seed2d_b.dgn or samp15.dgn, made by people using feet
and inches as units, you often find something like FT IN or ' " here.
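
The dimension bit and the two unit fields can be read with a short
Python sketch like the following. It only uses the offsets 1214, 1120
and 1122 from the magic lines above; the function name is hypothetical
and the two unit fields are simply shown in file order.

```python
def describe_tcb(data: bytes) -> str:
    """Summarize dimension and unit names from the first (TCB) element."""
    dim = "3D" if data[1214] & 0x40 else "2D"
    # Each unit name is at most 2 characters and may be NUL terminated.
    first = data[1120:1122].split(b"\0")[0].decode("latin-1")
    second = data[1122:1124].split(b"\0")[0].decode("latin-1")
    return f"{dim}, units {first} {second}".rstrip()

# Synthetic first element (not a real file) with unit fields "m" and "mm".
tcb = bytearray(1536)
tcb[1120:1124] = b"m\0mm"
print(describe_tcb(bytes(tcb)))  # 2D, units m mm
```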

For debugging purposes the words to optional attribute linkage can be
shown by the lines
 >>>>30		ubyte		x	\b, attindx \%o
 >>>>31		ubyte		x	\b\%o

These values differ, but apparently only about a dozen combinations
appear. This was used as the last test for DGN files by 19 lines
like
 >>30	string	\026\105		DGNFile
 ...
 >>30	string	\376\103		DGNFile
I do not understand why these tests for attindx values are used. To
me this makes no sense, so I removed these lines. Instead I use the
test for the documented second element type 8 mentioned above.
The shown information can be verified by running the dgndump tool from
the dgnlib suite on DGN files.

The third branch is for Intergraph raster images (INGR). Information is
found on the fileformats.archiveteam.org web site. So I add the comment line
 # URL:	http://fileformats.archiveteam.org/wiki/Intergraph_Raster
That page also links to the specification of the Intergraph Raster
File Format (from archive.org).

Unfortunately the second element trick is not useful here, because the
documentation says nothing about a second element.
According to the documentation, 3 bytes at the end of the first block
are reserved and always null. For CEL and DGN files the value there is
not null, because the "conversion" variable of the ViewInfo structure
is stored there. So catch raster images by the new second test line
 >508	ubelong&0xFFffFF00	=0
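
What the mask in this line does can be shown in Python. A minimal
sketch: the helper name is my own, and the buffers are synthetic.

```python
def reserved_bytes_are_zero(data: bytes) -> bool:
    """Equivalent of the magic line: >508 ubelong&0xFFffFF00 =0"""
    value = int.from_bytes(data[508:512], "big")
    # The mask keeps bytes 508..510 and drops byte 511.
    return (value & 0xFFFFFF00) == 0

block = bytearray(512)
block[511] = 3                      # byte 511 does not take part in the test
print(reserved_bytes_are_zero(bytes(block)))  # True
block[509] = 1                      # any non-null reserved byte fails the test
print(reserved_bytes_are_zero(bytes(block)))  # False
```
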
According to the docs a raster image always starts with the byte
sequence 08 09. So test for level 8 and type 9 by a third test line like
 >>0	beshort		0x0809

According to the documentation the first element occupies a whole
number of blocks of 512 bytes. So the size of the element in bytes is
something like 0200h. Dividing by 2 gives the size in words, like
0100h. Subtracting 2 for the leading words gives a WTF value like
00FEh. So test the length of the 1st element by the line
 >>>2	ubyte		0xfe
Afterwards call a new subroutine to describe INGR raster images.
 >>>>0 	use		ingr-image
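
Why a single byte test for 0xfe at offset 2 is enough for any number
of 512 byte blocks can be seen in this Python sketch (the function
name is hypothetical):

```python
def wtf_low_byte(blocks: int) -> int:
    """Low byte of WTF for a first element made of whole 512 byte blocks."""
    size_in_bytes = blocks * 512
    wtf = size_in_bytes // 2 - 2   # words, minus the 2 leading words
    return wtf & 0xFF              # WTF is little-endian, so this is byte 2

print({n: hex(wtf_low_byte(n)) for n in (1, 2, 10, 14)})
```

Every block adds 256 words, so only the high byte of WTF changes. 10
and 14 blocks give the values 9FEh and DFEh.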

0	name	ingr-image
At offset 4 the 2 byte variable DataTypeCode is stored. It indicates
the format, the depth of the pixel data and the compression used.
What version 5.37 called "CITFile" and "CIT raster CAD" I now describe
by lines like
 >4	uleshort	x	Intergraph raster image
 >>4	uleshort	0x0018	\b, CCITT Group 4 1-bit
 !:mime	image/x-intergraph-cit
 !:ext	cit
 >>4	default		x
 >>>4	uleshort	x	\b, Type %u
 !:mime	image/x-intergraph
I changed the name. I removed the "CIT" phrase because this information
is now shown by the --extension and MIME type options. I looked at how
others name such images on sites like
http://file-extension.net/seeker/file_extension_cit .
I also looked at the reference, where type 24 is described as "CCITT
Group 4 1-bit". I removed the additional magic lines with the test for
DataTypeCode 18h; instead I use the test for the 3 reserved null bytes,
because with the old lines only CIT images are recognized, and for the
33 other image types you get an unspecific description like
MicroStation or Microstation Bentley/Intergraph for samples like
COMP27.RGB and COMP9.rle. Unfortunately I only got samples for 2 other
image types. So I insert matching code segments:
 >>4	uleshort	0x0009	\b, Run-Length Encoded 1-bit
 !:mime	image/x-intergraph-rle
 !:ext	rle
 >>4	uleshort	27	\b, Adaptive RLE RGB
 !:mime	image/x-intergraph-rgb
 !:ext	rgb
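
The branching on DataTypeCode with a fallback for unknown types can be
sketched in Python as a table lookup. Only the three codes and MIME
types from the magic lines above are listed; the names KNOWN_TYPES and
describe_data_type are hypothetical.

```python
# DataTypeCode -> (description, MIME type), taken from the magic lines above.
KNOWN_TYPES = {
    9: ("Run-Length Encoded 1-bit", "image/x-intergraph-rle"),
    24: ("CCITT Group 4 1-bit", "image/x-intergraph-cit"),
    27: ("Adaptive RLE RGB", "image/x-intergraph-rgb"),
}

def describe_data_type(header: bytes) -> tuple:
    """Look up the little-endian DataTypeCode stored at offset 4."""
    code = int.from_bytes(header[4:6], "little")
    # Unknown codes fall back to a generic text, like the default branch.
    return KNOWN_TYPES.get(code, (f"Type {code}", "image/x-intergraph"))

print(describe_data_type(bytes.fromhex("0809fe021800")))
print(describe_data_type(bytes.fromhex("0809fe024500")))
```
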
Afterwards show the ApplicationType, which can have ten possible
values, by the line:
 >6	uleshort	!0			\b, ApplicationType %u
0 means generic raster image, 3 means drawing, scanning. In version
5.37 only CIT examples with these 2 ApplicationType values were
recognized, by the lines
 >4	string	\030\000\000			CITFile
 >4	string	\030\000\003			CITFile
I removed these additional magic lines, because I now use the
additional line which tests for the 3 reserved null bytes.

According to the documentation, now also show the image dimensions by the lines
 >184	ulelong		x			\b, %u x
 >188	ulelong		x			%u
The variable ScanlineOrient indicates the origin and the orientation
of the scan lines. This is now shown by the lines
 >194	ubyte		x			\b, orientation
 >194	ubyte		&0x01			right
 >194	ubyte		^0x01			left
 >194	ubyte		&0x02			down
 >194	ubyte		^0x02			top
 >194	ubyte		&0x04			horizontal
 >194	ubyte		^0x04			vertical
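
Dimensions and orientation can be decoded the same way in Python. This
is a sketch with a hypothetical function name; the header is
synthetic, and the bit meanings are copied from the magic lines above.

```python
import struct

def describe_raster_geometry(header: bytes) -> str:
    """Format width, height and ScanlineOrient like the magic lines do."""
    # PixelsPerLine at 184 and NumberOfLines at 188, both little-endian.
    width, height = struct.unpack_from("<II", header, 184)
    flags = header[194]
    words = [
        "right" if flags & 0x01 else "left",
        "down" if flags & 0x02 else "top",
        "horizontal" if flags & 0x04 else "vertical",
    ]
    return f"{width} x {height}, orientation {' '.join(words)}"

hdr = bytearray(512)
struct.pack_into("<II", hdr, 184, 640, 480)
hdr[194] = 0x04
print(describe_raster_geometry(bytes(hdr)))
```
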
The shown information for the inspected images can be verified by
running nconvert of the XnView suite with the -fullinfo option.

After applying the above mentioned modifications with the patch
file-5.37-cad-intergraph.diff the duplicate identifications vanish and
I get more precise output like:

civsur.cel:   Bentley/Intergraph Microstation CAD cell library
COMP27.RGB:   Intergraph raster image, Adaptive RLE RGB,
	      640 x 480, orientation left top horizontal
COMP9.rle:    Intergraph raster image, Run-Length Encoded 1-bit,
	      640 x 480, orientation left top horizontal
FLOORPLA.DGN: Bentley/Intergraph Microstation CAD drawing 2D,
	      units m mm
LONGLAT.CIT:  Intergraph raster image, CCITT Group 4 1-bit,
	      1064 x 1201, orientation left top horizontal
samp15.dgn:   Bentley/Intergraph Microstation CAD drawing 2D,
	      units FT IN
seed2d_b.dgn: Bentley/Intergraph Microstation CAD drawing 2D,
	      units ' "
seed3d_b.dgn: Bentley/Intergraph Microstation CAD drawing 3D,
	      units '  "
WHEEL.DGN:    Bentley/Intergraph Microstation CAD drawing 3D,
	      units mu su
WRENCH.DGN:   Bentley/Intergraph Microstation CAD drawing 2D,
	      units in th

I hope my diff file can be applied in a future version of the
file utility.

With best wishes
Jörg Jenderek
- --
Jörg Jenderek

-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCXUiskQAKCRCv8rHJQhrU
1isuAJ9qeD1o0rElk6xm+yW+ZpfjpXV7jQCdFsbvG+BSMS0uuf7UYgTN3/zuDr4=
=ZFKl
-----END PGP SIGNATURE-----
-------------- next part --------------
--- file-5.37/magic/Magdir/cad.old	2019-04-19 00:42:27 +0000
+++ file-5.37/magic/Magdir/cad	2019-08-05 21:38:08 +0000
@@ -18,29 +18,162 @@
 # 3F86C928&method=display&p_objectid=97F351F5-9C35-4E5E-89C280A93F86C928
 # https://www.bentley.com/products/default.cfm?objectid=A5C2FD43-3AC9-4C71-B682
 # 721C479F&method=display&p_objectid=A5C2FD43-3AC9-4C71-B682C7BE721C479F
-0	string	\010\011\376			Microstation
->3	string	\002
->>30	string	\026\105			DGNFile
->>30	string	\034\105			DGNFile
->>30	string	\073\107			DGNFile
->>30	string	\073\110			DGNFile
->>30	string	\106\107			DGNFile
->>30	string	\110\103			DGNFile
->>30	string	\120\104			DGNFile
->>30	string	\172\104			DGNFile
->>30	string	\172\105			DGNFile
->>30	string	\172\106			DGNFile
->>30	string	\234\106			DGNFile
->>30	string	\273\105			DGNFile
->>30	string	\306\106			DGNFile
->>30	string	\310\104			DGNFile
->>30	string	\341\104			DGNFile
->>30	string	\372\103			DGNFile
->>30	string	\372\104			DGNFile
->>30	string	\372\106			DGNFile
->>30	string	\376\103			DGNFile
->4	string	\030\000\000			CITFile
->4	string	\030\000\003			CITFile
+# 
+# URL:		https://en.wikipedia.org/wiki/MicroStation
+# reference:	http://dgnlib.maptools.org/dgn.html
+#		http://dgnlib.maptools.org/dl/ref18.pdf
+# Update:	Joerg Jenderek
+# Note: verified by command like `dgndump seed2d_b.dgn`
+# test for level 8 and type 5 or 9
+0	beshort&0x3F73	0x0801
+# level of element like 8
+#>0	ubyte&0x3F	x			\b, level %u
+#>0	ubyte		&0x80			\b, complex
+#>0	ubyte		&0x40			\b, reserved
+# type of element 9~TCB 8~Digitizer setup 5~Group Data Elements
+#>1	ubyte&0x7F	x			\b, type %u
+# words to follow in element: 17H~CEL library 2FEh~DGN 9FEh,DFEh~CIT
+#>2	uleshort	x			\b, words 0x%4.4x to follow
+# test for 3 reserved 0 bytes in CIT or "conversion" in ViewInfo structure (DGN CEL)
+#>508	ubelong		x			\b, RESERVED %8.8x
+>508	ubelong&0xFFffFF00	=0
+# test for level 8 and type 9 for INGR raster image
+>>0	beshort		0x0809
+# test for length of 1st element is multiple of blocks a 512 bytes
+>>>2	ubyte		0xfe
+>>>>0 	use		ingr-image
+# test for DGN or CEL by jump words (uleshort) forward to next element
+>(2.s*2)	ulong		x
+# 2nd element type: 8~Digitizer~DesiGNfile 1~library cell header other~CIT
+#>>&1		ubyte&0x7F	x		\b, 2nd type %u
+# DGN
+>>&1		ubyte&0x7F	8
+>>>2		uleshort	=0x02FE		Bentley/Intergraph Microstation CAD drawing
+!:mime		application/x-bentley-dgn
+!:ext		dgn
+# The 0x40 bit of this byte is 1 if the file is 3D, otherwise 0
+>>>>1214	ubyte  		&0x40		3D
+>>>>1214	ubyte  		^0x40		2D
+# 2 chars for name of subunits like ft FT in IN mu m mm '\0 '\040
+>>>>1120	string		x		\b, units %-.2s
+# 2 chars for name of master unit like IN in ML SU tn th TH HU mm "\0 "\040 \0\0
+>>>>1122	string		>\0		%-.2s
+#>>>>1120	ubelong		x		\b, units 0x%8.8x
+# element range low,high x y z like xlow=0 08010000h 01080000h
+#>>>>4		ubelong	  	!0		\b, xlow %8.8x
+#>>>>8		ubelong	  	!0		\b, ylow %8.8x
+#>>>>12		ubelong	  	!0		\b, zlow %8.8x
+#>>>>16		ubelong	  	!0		\b, xhigh %8.8x
+#>>>>20		ubelong	  	!0		\b, yhigh %8.8x
+#>>>>24		ubelong	  	!0		\b, zhigh %8.8x
+# graphic group number; all other elements in that group have same non-0 number
+#>>>>28		leshort		x		\b, grphgrp 0x%4.4x
+# words to optional attribute linkage
+#>>>>30		ubyte		x		\b, attindx \%o
+#>>>>31		ubyte		x		\b\%o
+# >>30	string	\026\105			DGNFile
+# >>30	string	\034\105			DGNFile
+# >>30	string	\073\107			DGNFile
+# >>30	string	\073\110			DGNFile
+# >>30	string	\106\107			DGNFile
+# >>30	string	\110\103			DGNFile
+# >>30	string	\120\104			DGNFile
+# >>30	string	\172\104			DGNFile
+# >>30	string	\172\105			DGNFile
+# >>30	string	\172\106			DGNFile
+# >>30	string	\234\106			DGNFile
+# >>30	string	\273\105			DGNFile
+# >>30	string	\306\106			DGNFile
+# >>30	string	\310\104			DGNFile
+# >>30	string	\341\104			DGNFile
+# >>30	string	\372\103			DGNFile
+# >>30	string	\372\104			DGNFile
+# >>30	string	\372\106			DGNFile
+# >>30	string	\376\103			DGNFile
+# elements properties indicator
+#>>>>32		uleshort	!0		\b, properties 0x%4.4x
+# class 0~Primary
+#>>>>>32		uleshort&0x000F	!0		\b, class 0x%4.4x
+# Symbology
+#>>>>>34		uleshort	x		\b, Symbology 0x%4.4x
+# test for 2nd element type 1~library cell header
+>>&1		ubyte&0x7F	1
+# test for 1st element with level 8 and type 5 for cell library
+>>>0		beshort		0x0805		Bentley/Intergraph Microstation CAD cell library
+!:mime		application/x-bentley-cel
+!:ext		cel
+#
+# URL:		http://fileformats.archiveteam.org/wiki/Intergraph_Raster
+# reference:	https://web.archive.org/web/20140903185431/
+#		http://oreilly.com/www/centers/gff/formats/ingr/index.htm
+# note:		verified by command like `nconvert -fullinfo LONGLAT.CIT`
+# display information for intergraph raster bitmap
+0	name	ingr-image
+# in 5.37 "Microstation CITFile" "Bentley/Intergraph MicroStation CIT raster CAD"
+# DataTypeCode indicates format, depth of the pixel data and used compression 
+>4	uleshort	x			Intergraph raster image
+>>4	uleshort	0x0009			\b, Run-Length Encoded 1-bit
+!:mime	image/x-intergraph-rle
+!:ext	rle
+>>4	uleshort	0x0018			\b, CCITT Group 4 1-bit
+!:mime	image/x-intergraph-cit
+!:ext	cit
+>>4	uleshort	27			\b, Adaptive RLE RGB
+!:mime	image/x-intergraph-rgb
+!:ext	rgb
+>>4	default		x
+>>>4	uleshort	x			\b, Type %u
+!:mime	image/x-intergraph
+# TODO:
+#>4	uleshort	0			\b, no data
+# ...
+#>4	uleshort	0x0045			\b, Continuous Tone CMYK (Uncompressed)
+# ApplicationType: 0~generic raster image 3~drawing, scanning
+# 8~I/IMAGE and MicroStation Imager 9~ModelView
+>6	uleshort	!0			\b, ApplicationType %u
+#>6	uleshort	x			\b, ApplicationType %u
+# XViewOrigin; Raster grid data X origin
+#>8	ulequad		!0			\b, XViewOrigin %llx
+# PixelsPerLine is the number of pixels in a scan line of bitmap
+>184	ulelong		x			\b, %u x
+# NumberOfLines is height of the raster data in scanlines
+>188	ulelong		x			%u
+# DeviceResolution; resolution of scanning device
+# positive indicates number of microns between lines; negative indicates DPI
+#>192	leshort		x			\b, DeviceResolution %d
+# ScanlineOrient indicates the origin and the orientation of the scan lines
+#>194	ubyte		x			\b, ScanlineOrient %x
+>194	ubyte		x			\b, orientation
+>194	ubyte		&0x01			right
+>194	ubyte		^0x01			left
+>194	ubyte		&0x02			down
+>194	ubyte		^0x02			top
+>194	ubyte		&0x04			horizontal
+>194	ubyte		^0x04			vertical
+# ScannableFlag; Scanline indexing method used
+#>195	ubyte		!0			\b, ScannableFlag 0x%x
+# RotationAngle; Rotation angle of raster data
+#>196	ubequad		!0			\b, RotationAngle 0x%llx
+# SkewAngle; Skew angle of raster data
+#>204	ubequad		!0			\b, SkewAngle %llx
+# DataTypeModifier; Additional raster data format info
+#>212	uleshort	!0			\b, DataTypeModifier 0x%4.4x
+# DesignFile[66]; Name of the design file
+>214	string		>\0			\b, DesignFile %-.66s
+# DatabaseFile[66]; Name of the database file
+>280	string		>\0			\b, DatabaseFile %-.66s
+# ParentGridFile[66]; Name of parent grid file
+>346	string		>\0			\b, ParentGridFile %-.66s
+# FileDescription[80]; Text description of file and contents
+>412	string		>\0			\b, FileDescription %-.80s
+# MinValue
+#>492	ubequad		!0			\b, MinValue 0x%llx
+# MaxValue
+#>500	ubequad		!0			\b, MaxValue 0x%llx
+# Reserved[3]; Unused (always 0)
+#>508	ubelong&0xFFffFF00	x		\b, RESERVED %8.8x
+# GridFileVersion; Grid File Version like 2 3
+#>511	ubyte		x			\b, GridFileVersion %x
 
 # AutoCAD
 # Merge of the different contributions and updates from https://en.wikipedia.org/wiki/Dwg
@@ -140,12 +273,6 @@
 # Phillip Griffith <phillip dot griffith at gmail dot com>
 # AutoCAD magic taken from the Open Design Alliance's OpenDWG specifications.
 #
-0	belong	0x08051700	Bentley/Intergraph MicroStation DGN cell library
-0	belong	0x0809fe02	Bentley/Intergraph MicroStation DGN vector CAD
-0	belong	0xc809fe02	Bentley/Intergraph MicroStation DGN vector CAD
-0	beshort	0x0809		Bentley/Intergraph MicroStation
->0x02	byte	0xfe
->>0x04	beshort	0x1800		CIT raster CAD
 
 # 3DS (3d Studio files)
 0	leshort		0x4d4d