[File] [PATCH] of Magdir/cad for Intergraph MicroStation update; *.dgn *.cel *.cit *.rgb *.rle
Christos Zoulas
christos at zoulas.com
Sat Aug 10 13:35:12 UTC 2019
On Aug 6, 12:24am, joerg.jen.der.ek at gmx.net (Jörg Jenderek) wrote:
-- Subject: [File] [PATCH] of Magdir/cad for Intergraph MicroStation update;
| Hello,
| Some weeks ago I handled MicroStation V8 CAD variants, which are based
| on the Compound Document Format (CDF). I ran the file command,
| version 5.37, on non-CDF CAD files with the name extension dgn and on
| related files: libraries with the extension cel and raster images
| (*.cit, *.rle, *.rgb). With the -k -m Magdir/cad options I get output
| like:
|
| civsur.cel: Bentley/Intergraph MicroStation DGN cell library
| COMP27.RGB: Microstation
| Bentley/Intergraph MicroStation
| COMP9.rle: Microstation
| Bentley/Intergraph MicroStation
| FLOORPLA.DGN: Bentley/Intergraph MicroStation DGN vector CAD
| Microstation
| Bentley/Intergraph MicroStation
| LONGLAT.CIT: Microstation CITFile
| Bentley/Intergraph MicroStation CIT raster CAD
| samp15.dgn: Bentley/Intergraph MicroStation DGN vector CAD
| Microstation
| Bentley/Intergraph MicroStation
| seed2d_b.dgn: Bentley/Intergraph MicroStation DGN vector CAD
| Microstation
| Bentley/Intergraph MicroStation
| seed3d_b.dgn: Bentley/Intergraph MicroStation DGN vector CAD
| WHEEL.DGN: Bentley/Intergraph MicroStation DGN vector CAD
| WRENCH.DGN: Bentley/Intergraph MicroStation DGN vector CAD
| Microstation DGNFile
| Bentley/Intergraph MicroStation
|
| The messages starting with the phrase "Bentley/Intergraph" appear
| twice, because both of the following lines in Magdir/cad match:
| 0 belong 0x0809fe02 Bentley/Intergraph MicroStation DGN vector CAD
| 0 beshort 0x0809 Bentley/Intergraph MicroStation
|
| The remaining third message, starting with the phrase "Microstation",
| is triggered by the same bytes, only expressed in octal notation by
| lines like:
| 0 string \010\011\376 Microstation
| >3 string \002
| >>30 string x DGNFile
|
| Furthermore, with the --extension option only ??? is displayed, and
| with the -i option only application/octet-stream is displayed.
|
| The raster images are identified by octal expressions like
| >4 string \030\000\000 CITFile
| >4 string \030\000\003 CITFile
| In principle, the same is done by the hexadecimal expression
| >>0x04 beshort 0x1800 CIT raster CAD
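As a quick check that the octal and hexadecimal forms test the same bytes, here is a small Python sketch (Python is only an illustrative choice; the byte values are copied from the magic lines above). Python bytes literals interpret \ooo octal escapes the same way magic strings do.

```python
import struct

# Octal escapes copied from the old magic lines.
OLD_DGN = b"\010\011\376"   # 0 string \010\011\376 Microstation
OLD_CIT = b"\030\000"       # first two bytes of >4 string \030\000\000


def octal_matches_hex() -> bool:
    # \010\011\376 is 08 09 FE: the first 3 bytes of belong 0x0809fe02.
    dgn_same = OLD_DGN == struct.pack(">I", 0x0809FE02)[:3]
    # \030\000 read as a big-endian short is 0x1800 (>>0x04 beshort 0x1800).
    cit_same = struct.unpack(">H", OLD_CIT)[0] == 0x1800
    return dgn_same and cit_same


print(octal_matches_hex())  # True
```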
|
| As a reference I use the page about DGN files on the dgnlib site, so I
| added a comment line like
| # reference: http://dgnlib.maptools.org/dgn.html
| On the same site I found the MicroStation 95 Reference Guide as
| ref18.pdf. Neither is fully complete, but with that information it is
| possible to understand the current magic identifications and to
| correct lines. Based on the documentation, debugging information can
| be shown by lines like
| >0 ubyte&0x3F x \b, level %u
| >0 ubyte &0x80 \b, complex
| >0 ubyte &0x40 \b, reserved
| >1 ubyte&0x7F x \b, type %u
| >2 uleshort x \b, words 0x%4.4x to follow
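The element header layout behind these debugging lines can be sketched in Python as below. This is a minimal, hypothetical helper (the names ElementHeader and parse_element_header do not come from file or dgnlib); the bit masks and the little-endian "words to follow" value are taken from the magic lines above.

```python
from typing import NamedTuple


class ElementHeader(NamedTuple):
    level: int            # byte 0, bits 0x3F
    is_complex: bool      # byte 0, bit 0x80
    is_reserved: bool     # byte 0, bit 0x40
    elem_type: int        # byte 1, bits 0x7F
    words_to_follow: int  # bytes 2-3, little-endian (uleshort)


def parse_element_header(data: bytes, offset: int = 0) -> ElementHeader:
    """Decode the 4-byte header of a DGN element starting at offset."""
    b0, b1 = data[offset], data[offset + 1]
    wtf = int.from_bytes(data[offset + 2:offset + 4], "little")
    return ElementHeader(
        level=b0 & 0x3F,
        is_complex=bool(b0 & 0x80),
        is_reserved=bool(b0 & 0x40),
        elem_type=b1 & 0x7F,
        words_to_follow=wtf,
    )


# First 4 bytes of a file matching belong 0xc809fe02 (like seed3d_b.dgn):
print(parse_element_header(bytes.fromhex("c809fe02")))
```

Decoding 08 05 17 00 the same way gives level 8, type 5 and a WTF of 0017h, which is what the old cell library line 0 belong 0x08051700 encoded.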
|
| The level seems to always be 8. DGN files always start with an element
| of TCB type, which has the value 9. This also matches samples like
| seed3d_b.dgn or WHEEL.DGN, which have the complex and reserved bits
| set. These samples were described with only one text, by the magic
| line
| 0 belong 0xc809fe02 Bentley/Intergraph MicroStation DGN vector CAD
|
| CEL libraries always start with an element of type Group Data
| Elements, which has the value 5. For such libraries the words to
| follow in the element (WTF) have the value 0017h. This was expressed
| by the magic line
| 0 belong 0x08051700 Bentley/Intergraph MicroStation DGN cell library
| So this magic line assumes that all cell libraries have a WTF value of
| 17h, but in the documentation I see no hint that this is always true.
| So for libraries I removed the test relying on the WTF value.
|
| So I replaced all magic lines concerning the inspected samples, and as
| the first test I check for level 8 and type 5 or 9 with the magic line
| 0 beshort&0x3F73 0x0801
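To see exactly what the mask 0x3F73 accepts, the test can be replayed in Python (an illustrative sketch, not code from file; the function names are hypothetical). The high byte of the mask keeps only the level bits, so the complex and reserved bits are ignored. The low byte 0x73 leaves the type bits 0x04 and 0x08 unchecked, so besides 5 and 9 the types 1 and 13 also pass, and the later tests have to narrow the match down.

```python
from typing import List


def passes_first_test(data: bytes) -> bool:
    """Replay of: 0 beshort&0x3F73 0x0801"""
    value = int.from_bytes(data[0:2], "big")
    return value & 0x3F73 == 0x0801


def accepted_types() -> List[int]:
    # Element types 0..127 that pass when the level is 8.
    return [t for t in range(128) if passes_first_test(bytes([0x08, t]))]


print(passes_first_test(bytes.fromhex("c809")))  # True: level 8, type 9
print(accepted_types())                          # [1, 5, 9, 13]
```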
|
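| Adding the 2 leading words to the WTF value gives the size of the
| first element in words, and multiplying by 2 gives its size in bytes.
| Alternatively, a pointer expression can jump to the second element
| with the line
| >(2.s*2) ulong x
| Reading this 4-byte ulong at offset WTF*2 leaves the current position
| at WTF*2 + 4, the start of the second element.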
| For debugging purposes, the type of the second element can be
| displayed by a line like
| >>&1 ubyte&0x7F x \b, 2nd type %u
| According to the documentation, for DGN files this is always 8 for the
| Digitizer element, and for CEL files it is always 1 for the library
| cell header.
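The pointer arithmetic can be mirrored in Python. A minimal sketch: in magic, `(2.s*2)` reads the little-endian short at offset 2 (the WTF) and multiplies it by 2; the 4-byte ulong read there ends at WTF*2 + 4, where the second element starts, so the relative offset `&1` lands on its type byte. The function name and the synthetic input are hypothetical.

```python
def second_element_type(data: bytes) -> int:
    """Mirror of >(2.s*2) ulong x followed by >>&1 ubyte&0x7F."""
    wtf = int.from_bytes(data[2:4], "little")  # 2.s: little-endian short
    ulong_at = wtf * 2                         # (2.s*2)
    second_start = ulong_at + 4                # end of the 4-byte ulong
    return data[second_start + 1] & 0x7F       # &1: type byte


# Synthetic data: a type 5 element with WTF 2 (4 words = 8 bytes in
# total), followed by the header of a type 1 element.
sample = bytes.fromhex("08050200") + bytes(4) + bytes.fromhex("0801")
print(second_element_type(sample))  # 1
```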
| So the branch for cell libraries tests for second element type 1 with
| >>&1 ubyte&0x7F 1
| Afterwards the first element is tested for level 8 and type 5 (cell
| library) with the line
| >>>0 beshort 0x0805 Bentley/Intergraph Microstation CAD cell library
| Then the user-defined MIME type and file name extension are shown by
| the lines
| !:mime application/x-bentley-cel
| !:ext cel
|
| The branch for DGN files tests for the second (Digitizer) element with
| the line
| >>&1 ubyte&0x7F 8
| For DGN files the documentation explicitly mentions that the first
| element has 1536 bytes, that is 3 blocks of 512 bytes. Divided by 2,
| the element is 768 words long. Subtracting the 2 leading words gives a
| WTF value of 766, or 2FEh in hexadecimal. So a test for a valid WTF can
| be used here, with lines starting with
| >>>2 uleshort =0x02FE Bentley/Intergraph Microstation CAD drawing
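The arithmetic for the expected DGN WTF value, written out as a small Python sketch (the helper name is hypothetical):

```python
BLOCK_SIZE = 512  # bytes per block


def wtf_for_bytes(element_bytes: int) -> int:
    """Words to follow for an element of the given total size in bytes."""
    words = element_bytes // 2   # size in 16-bit words
    return words - 2             # minus the 2 leading header words


dgn_first_element = 3 * BLOCK_SIZE            # 1536 bytes, as documented
print(wtf_for_bytes(dgn_first_element))       # 766
print(hex(wtf_for_bytes(dgn_first_element)))  # 0x2fe
```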
|
| I changed the name to the phrase "CAD drawing" instead of "DGN vector
| CAD" or "DGNFile", following how others call such files, for example
| on the web site http://file-extension.net/seeker/file_extension_dgn .
| I also removed the phrase "DGN", because this information is now
| visible through the user-defined MIME type and file name extension,
| added by the lines
| !:mime application/x-bentley-dgn
| !:ext dgn
|
| With the help of the documentation, some more useful information can
| be displayed. The 0x40 bit of the byte at offset 1214 is 1 if the file
| is 3D and 0 for two-dimensional samples. This is expressed by the lines
| >>>>1214 ubyte &0x40 3D
| >>>>1214 ubyte ^0x40 2D
| This dimensional information is usually not as obvious as in samples
| like seed2d_b.dgn or seed3d_b.dgn, where it is visible in the name.
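The same check as a Python sketch, assuming the whole first element (1536 bytes) has been read; `dimension` is a hypothetical helper name. In magic, `&0x40` means "bit set" and `^0x40` means "bit clear", which a single if/else covers.

```python
def dimension(data: bytes) -> str:
    """Mirror of >>>>1214 ubyte &0x40 3D / ubyte ^0x40 2D."""
    return "3D" if data[1214] & 0x40 else "2D"


header = bytearray(1536)   # synthetic, all-zero first element
print(dimension(header))   # 2D
header[1214] |= 0x40
print(dimension(header))   # 3D
```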
|
| Furthermore, the two-character abbreviations of the master unit and
| the sub unit can be displayed by the lines
| >>>>1120 string x \b, units %-.2s
| >>>>1122 string >\0 %-.2s
|
| In CAD samples like FLOORPLA.DGN, made by people using the metric
| system, you often find something like m mm here.
| In samples like seed2d_b.dgn or samp15.dgn, made by people using feet
| and inches as units, you often find something like FT IN or ' ".
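A Python sketch of how the two lines combine into the units text. It assumes that, as with `%-.2s`, at most 2 characters are printed and that a string stops at the first NUL byte; the sub unit is only printed when its first byte is not NUL, which is what `string >\0` tests. The helper names are hypothetical.

```python
def unit_name(data: bytes, offset: int) -> str:
    raw = data[offset:offset + 2]                     # %-.2s: at most 2 chars
    return raw.split(b"\0", 1)[0].decode("latin-1")   # stop at NUL


def units(data: bytes) -> str:
    text = "units " + unit_name(data, 1120)   # master unit
    if data[1122] != 0:                        # string >\0
        text += " " + unit_name(data, 1122)   # sub unit
    return text


header = bytearray(1536)
header[1120:1124] = b"m\0mm"   # metric example, like FLOORPLA.DGN
print(units(header))           # units m mm
```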
|
| For debugging purposes, the words to the optional attribute linkage
| (attindx) can be shown by the lines
| >>>>30 ubyte x \b, attindx \%o
| >>>>31 ubyte x \b\%o
|
| These values differ, but apparently only a dozen combinations seem to
| appear. They were used as the last test for DGN files by 19 lines like
| >>30 string \026\105 DGNFile
| ...
| >>30 string \376\103 DGNFile
| I do not understand why these tests for attindx values are used; to
| me they make no sense, so I removed these lines. Instead I used the
| test for the documented second element type 8 mentioned above.
| The shown information can be verified by running the dgndump tool
| from the dgnlib suite on DGN files.
|
| The third branch is for Intergraph raster images (INGR). Information
| about them is found on the fileformats.archiveteam.org web site, so I
| added the comment line
| # URL: http://fileformats.archiveteam.org/wiki/Intergraph_Raster
| That page also links to the specifications of the Intergraph Raster
| File Format (on archive.org).
|
| Unfortunately the second element trick does not help here, because
| the documentation says nothing about a second element. According to
| the documentation, 3 bytes at the end of the first block are reserved
| and always null. For CEL and DGN files their value is not null,
| because the "conversion" variable of the ViewInfo structure is stored
| there. So raster images are caught by the new second test line
| >508 ubelong&0xFFffFF00 =0
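What the mask tests, as a Python sketch (hypothetical helper name): read as a big-endian long, 0xFFffFF00 keeps the bytes at offsets 508, 509 and 510 and ignores the byte at 511.

```python
def reserved_bytes_null(data: bytes) -> bool:
    """Mirror of >508 ubelong&0xFFffFF00 =0"""
    value = int.from_bytes(data[508:512], "big")
    return value & 0xFFFFFF00 == 0


block = bytearray(512)
print(reserved_bytes_null(block))  # True
block[511] = 0xFF                  # outside the mask
print(reserved_bytes_null(block))  # True
block[509] = 0x01                  # any non-null byte inside the mask
print(reserved_bytes_null(block))  # False
```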
| According to the documentation, raster images always start with the
| byte sequence 08 09, so the third test line checks for level 8 and
| type 9:
| >>0 beshort 0x0809
|
| According to the documentation, the first element occupies one or
| more blocks of 512 bytes. So the size of the element in bytes is a
| multiple of 0200h, and dividing by 2 gives its size in words, a
| multiple of 0100h. Subtracting 2 for the leading words gives a WTF
| value like 00FEh; its low byte is FEh for any number of blocks. So the
| length of the first element is tested by the line
| >>>2 ubyte 0xfe
| Afterwards a new subroutine is called to describe INGR raster images:
| >>>>0 use ingr-image
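A short Python sketch of why testing only the low byte is enough: for a first element of n blocks the WTF is n*0100h - 2, whose low byte is always FEh. The helper name is hypothetical. This also holds for the 3-block DGN header (WTF 2FEh), which is why the reserved-bytes test at offset 508 is needed to keep DGN files out of this branch.

```python
BLOCK_SIZE = 512  # bytes per block


def wtf_for_blocks(n_blocks: int) -> int:
    words = n_blocks * BLOCK_SIZE // 2   # n * 0100h words
    return words - 2                     # minus the 2 leading words


for n in range(1, 5):
    wtf = wtf_for_blocks(n)
    print(n, hex(wtf), hex(wtf & 0xFF))  # low byte: what >>>2 ubyte reads
```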
|
| 0 name ingr-image
| At offset 4 the 2-byte variable DataTypeCode is stored. It indicates
| the format, the depth of the pixel data and the compression used.
| What version 5.37 called "CITFile" and "CIT raster CAD" I now
| describe by lines like
| >4 uleshort x Intergraph raster image
| >>4 uleshort 0x0018 \b, CCITT Group 4 1-bit
| !:mime image/x-intergraph-cit
| !:ext cit
| >>4 default x
| >>>4 uleshort x \b, Type %u
| !:mime image/x-intergraph
| I changed the name and removed the phrase "CIT", because this
| information is now shown by the --extension and MIME type options. To
| choose the name, I looked at how others call such images on sites like
| http://file-extension.net/seeker/file_extension_cit .
| I also looked at the reference, where type 24 is described as "CCITT
| Group 4 1-bit". I removed the additional magic lines testing for
| DataTypeCode 18h and used the test for the 3 reserved null bytes
| instead, because with the old lines only CIT images are recognized,
| and for the 33 other image types you get an unspecific description
| like MicroStation or Microstation Bentley/Intergraph, as for samples
| like COMP27.RGB and COMP9.rle. Unfortunately I only have samples for
| 2 other image types, so I inserted matching code segments:
| >>4 uleshort 0x0009 \b, Run-Length Encoded 1-bit
| !:mime image/x-intergraph-rle
| !:ext rle
| >>4 uleshort 27 \b, Adaptive RLE RGB
| !:mime image/x-intergraph-rgb
| !:ext rgb
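The DataTypeCode dispatch in ingr-image, including the default branch, can be pictured as a lookup table. This Python sketch is for illustration only: it covers just the three codes with samples, the names are hypothetical, and the extension for code 9 is assumed to be rle, matching the sample COMP9.rle.

```python
from typing import Optional, Tuple

# DataTypeCode -> (description, MIME type, extension), from the patch.
KNOWN_TYPES = {
    9: ("Run-Length Encoded 1-bit", "image/x-intergraph-rle", "rle"),
    24: ("CCITT Group 4 1-bit", "image/x-intergraph-cit", "cit"),
    27: ("Adaptive RLE RGB", "image/x-intergraph-rgb", "rgb"),
}


def describe_ingr(data: bytes) -> Tuple[str, str, Optional[str]]:
    code = int.from_bytes(data[4:6], "little")  # >4 uleshort
    if code in KNOWN_TYPES:
        desc, mime, ext = KNOWN_TYPES[code]
        return f"Intergraph raster image, {desc}", mime, ext
    # default branch: generic MIME type and no extension
    return f"Intergraph raster image, Type {code}", "image/x-intergraph", None
```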
| Afterwards the ApplicationType, which can have ten possible values, is
| shown by the line
| >6 uleshort !0 \b, ApplicationType %u
| 0 means Generic raster image and 3 means Drawing, Scanning. Version
| 5.37 recognized only CIT examples with these 2 ApplicationType values,
| through the lines
| >4 string \030\000\000 CITFile
| >4 string \030\000\003 CITFile
| I removed these additional magic lines, because I now use the
| additional test for the 3 reserved null bytes instead.
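Why the two old lines caught only those two cases: read at offset 4, the old byte strings are DataTypeCode 24 (18h) followed by the low byte of ApplicationType, 0 or 3. A Python sketch with hypothetical helper names:

```python
OLD_CIT_LINES = (b"\030\000\000", b"\030\000\003")  # >4 string ... CITFile


def old_cit_match(data: bytes) -> bool:
    return data[4:7] in OLD_CIT_LINES


def header_fields(data: bytes) -> tuple:
    type_code = int.from_bytes(data[4:6], "little")  # DataTypeCode
    app_type = int.from_bytes(data[6:8], "little")   # ApplicationType
    return type_code, app_type


h = bytearray(512)
h[4:8] = (24).to_bytes(2, "little") + (3).to_bytes(2, "little")
print(header_fields(h), old_cit_match(h))  # (24, 3) True
```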
|
| Following the documentation, the image dimensions are now also shown
| by the lines
| >184 ulelong x \b, %u x
| >188 ulelong x %u
| The variable ScanlineOrient indicates the origin and the orientation
| of the scan lines. This is now shown by the lines
| >194 ubyte x \b, orientation
| >194 ubyte &0x01 right
| >194 ubyte ^0x01 left
| >194 ubyte &0x02 down
| >194 ubyte ^0x02 top
| >194 ubyte &0x04 horizontal
| >194 ubyte ^0x04 vertical
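The dimension and orientation lines can be mirrored in Python as follows. This is a sketch with hypothetical names and a synthetic header; each ScanlineOrient bit selects one of two words, just as the &/^ pairs above do.

```python
def geometry(data: bytes) -> str:
    width = int.from_bytes(data[184:188], "little")   # >184 ulelong
    height = int.from_bytes(data[188:192], "little")  # >188 ulelong
    orient = data[194]                                # >194 ubyte
    words = [
        "right" if orient & 0x01 else "left",
        "down" if orient & 0x02 else "top",
        "horizontal" if orient & 0x04 else "vertical",
    ]
    return f"{width} x {height}, orientation " + " ".join(words)


h = bytearray(512)
h[184:192] = (640).to_bytes(4, "little") + (480).to_bytes(4, "little")
h[194] = 0x04
print(geometry(h))  # 640 x 480, orientation left top horizontal
```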
| The information shown for the inspected images can be verified by
| running nconvert from the XnView suite with the -fullinfo option.
|
| After applying the modifications above with the patch
| file-5.37-cad-intergraph.diff, the duplicate identifications vanish
| and I get more precise output like:
|
| civsur.cel: Bentley/Intergraph Microstation CAD cell library
| COMP27.RGB: Intergraph raster image, Adaptive RLE RGB,
| 640 x 480, orientation left top horizontal
| COMP9.rle: Intergraph raster image, Run-Length Encoded 1-bit,
| 640 x 480, orientation left top horizontal
| FLOORPLA.DGN: Bentley/Intergraph Microstation CAD drawing 2D,
| units m mm
| LONGLAT.CIT: Intergraph raster image, CCITT Group 4 1-bit,
| 1064 x 1201, orientation left top horizontal
| samp15.dgn: Bentley/Intergraph Microstation CAD drawing 2D,
| units FT IN
| seed2d_b.dgn: Bentley/Intergraph Microstation CAD drawing 2D,
| units ' "
| seed3d_b.dgn: Bentley/Intergraph Microstation CAD drawing 3D,
| units ' "
| WHEEL.DGN: Bentley/Intergraph Microstation CAD drawing 3D,
| units mu su
| WRENCH.DGN: Bentley/Intergraph Microstation CAD drawing 2D,
| units in th
|
| I hope my diff can be applied in a future version of the file
| utility.
|
Committed, thanks!
christos