[File] [PATCH] of Magdir/fonts for GEM GDOS font; update

Jörg Jenderek joerg.jen.der.ek at gmx.net
Wed Jul 10 01:47:20 UTC 2019


Hello,
some weeks ago I sent a patch for file version 5.36 to recognize
GEM GDOS fonts with the file extension fnt or gft.

Unfortunately the test lines were not specific enough, so some other
files were misidentified as "GEM GDOS font" by Magdir/fonts. For such
bad examples and some extreme font examples (*.FNT) I got output like:

CAROUSEL.DOC:           GEM GDOS font _@\011\004 45, ID 0xa5db,
	lightening mask 0x0, skewing mask 0x0
	Microsoft WinWord 2.0 Document
cl8m8ocofedso.testfile: Audio file with ID3 version 2.4.0, contains:
			GEM GDOS font  25776, ID 0xfbff
crmanual.doc:           GEM GDOS font L \002 30, ID 0xa59b,
	lightening mask 0x0, skewing mask 0x0
DOC20A.DOC:             GEM GDOS font !@\011\004 45, ID 0xa5db,
	lightening mask 0x0, skewing mask 0x0
	Microsoft WinWord 2.0 Document
H1CELT72.FNT:           GEM GDOS font Celtic #s 72, ID 0x00ca
HyperMover:             GEM GDOS font
	STAK\377\377\377\377\357\260\362 10, ID 0x0000,
	lightening mask 0x0, skewing mask 0x0
oem.hlp:                MS Windows 3.1 help,
	Mon May 01 20:47:30 1995, 6033 bytes
			GEM GDOS font C 3, ID 0x5f3f,
	lightening mask 0x0, skewing mask 0x0
PRICELIS.DOC:           GEM GDOS font Z@\011\004 45, ID 0xa5db,
	lightening mask 0x0, skewing mask 0x0
	Microsoft WinWord 2.0 Document
TECHREF.DOC:            DOS 2.0-3.2 backed up file \TECHREF.DOC;
			GEM GDOS font \220 \002 33, ID 0xa59b,
	lightening mask 0x0, skewing mask 0x0
TEMPLATE.DOC:           GEM GDOS font p@\011\004 45, ID 0xa5db,
	lightening mask 0x0, skewing mask 0x0
	Microsoft WinWord 2.0 Document
winword2.doc:           GEM GDOS font 1@\011\004 45, ID 0xa5db,
	lightening mask 0x0, skewing mask 0x0
	Microsoft WinWord 2.0 Document
WYEE24HI.FNT:           GEM GDOS font WYE 24, ID 0x0073

So the existing tests are not unique enough. This is not a real
problem, because the identifying part and the displaying part of the
magic are separated. So I add additional test lines for such examples.

Furthermore, the specification for GEM fonts does not prescribe exact
values for the header fields. The font name is shown by the line
 >4	string		x		%.32s
Often I found common font names like Century-Schoolbook-Normal,
Courier and ding bats, which are also used in other font types.
The names consist of "long" words that human readers can recognize
and mainly of Latin letters, but sometimes a name contains special
printable characters, as in "LC-S. Clay Wilson", "Celtic #s",
"Big&Tall", "Hollywood  ** DEMO VERSION **".
The shortest font name I found was the 3-byte string WYE. For many bad
examples the bytes interpreted as font name contain low control
characters. So the test line used so far for a valid font name is too
permissive:
 >>>4	ubeshort	>0x1F00
So this is made stricter by the line
 >>>4	ulelong		>0x001F1f1F
Now bad samples like oem.hlp are skipped.
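
To illustrate the difference between the two name tests, here is a
small Python sketch. It is not part of the patch. The helper
fake_header and the zero bytes after a short name are assumptions made
for the example; the real bytes in oem.hlp may differ.

```python
import struct

def old_name_test(header: bytes) -> bool:
    # >>>4 ubeshort >0x1F00: big-endian 16-bit value of the first 2 name bytes
    (value,) = struct.unpack_from(">H", header, 4)
    return value > 0x1F00

def new_name_test(header: bytes) -> bool:
    # >>>4 ulelong >0x001F1f1F: little-endian 32-bit value of the first 4 name bytes
    (value,) = struct.unpack_from("<I", header, 4)
    return value > 0x001F1F1F

def fake_header(name: bytes) -> bytes:
    # Hypothetical helper: 2 bytes ID, 2 bytes size (all zero here),
    # then the name at offset 4, zero-padded to 36 bytes.
    return (b"\x00" * 4 + name).ljust(36, b"\x00")

for name in (b"WYE", b"Celtic #s", b"C"):
    hdr = fake_header(name)
    print(name, old_name_test(hdr), new_name_test(hdr))
```

Under these assumptions a one-letter name like "C" followed by zero
bytes still passes the old 2-byte test but fails the new 4-byte test,
while a 3-byte name like WYE passes both.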

The face size in points is shown by line
 >2	uleshort	x		%u
Typical values are 12, 18, 24 and 36, which are known from other font
types. Theoretically values up to 65535 can appear, but such large
font sizes are unrealistic. So I tested against the highest value I
had found, 48, as in the KLINGON font H1KLIN48.FNT, with the line
 >>2	uleshort	<49
Unfortunately this test was too strict, because I found a font with
size 72, the Celtic font H1CELT72.FNT. So the relaxed test line
now becomes
 >>2	uleshort	<73
The audio file cl8m8ocofedso.testfile was interpreted as the GEM font
variant with 5555h mask values and the high font size 25776. So I also
add this font size test line inside the branch with 5555h mask values.
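
The face size test can be sketched in Python as follows. This is only
an illustration of the comparison, not part of the patch; the names
MAX_FACE_SIZE, face_size and face_size_ok are invented for the example.

```python
import struct

MAX_FACE_SIZE = 72  # highest size found so far (H1CELT72.FNT)

def face_size(header: bytes) -> int:
    # >2 uleshort: unsigned little-endian 16-bit value at offset 2
    (size,) = struct.unpack_from("<H", header, 2)
    return size

def face_size_ok(header: bytes) -> bool:
    # >>2 uleshort <73
    return face_size(header) < MAX_FACE_SIZE + 1

# Font ID at offset 0, face size at offset 2, as printed in the listing above.
celtic = struct.pack("<HH", 0x00CA, 72)
audio = struct.pack("<HH", 0xFBFF, 25776)
print(face_size_ok(celtic), face_size_ok(audio))
```

With these values the Celtic font with size 72 is accepted and the
size 25776 read from the audio file is rejected.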

At that point there are still samples like HyperMover with a valid
font size value and a font name that looks valid at first glance, like
STAK\377. So I looked for additional tests. The minimal GEM font header
size is 84 bytes (54h). After the header come other structures like the
horizontal offset table, the character offset table and the font data,
in no fixed order. If a structure follows the header without a gap,
the lowest possible offset for that structure is 54h. If the preceding
structures are small, the 4-byte offset is just a little above that
value, like 20Eh. So now I also test for a valid offset to the font
data, which must be at least 54h, with the line:
 >>>>76	ulelong		>83
Now bad examples like HyperMover and the remaining Microsoft WinWord
2.0 documents are skipped.
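
The offset test can be sketched in Python like this. It is only an
illustration, not part of the patch; the helper header_with_foffset
builds an otherwise empty, hypothetical 84-byte header, and the sample
offsets 0x142 and 0x158 are the foffset values of H1CELT72.FNT and
WYEE24HI.FNT.

```python
import struct

MIN_HEADER_SIZE = 0x54  # 84 bytes, minimal GEM font header size

def font_data_offset(header: bytes) -> int:
    # >76 ulelong: unsigned little-endian 32-bit value at offset 76
    (offset,) = struct.unpack_from("<I", header, 76)
    return offset

def font_data_offset_ok(header: bytes) -> bool:
    # >>>>76 ulelong >83: font data can not start inside the header
    return font_data_offset(header) > MIN_HEADER_SIZE - 1

def header_with_foffset(offset: int) -> bytes:
    # Hypothetical helper: zero-filled minimal header with only the
    # font data offset at byte 76 set.
    header = bytearray(MIN_HEADER_SIZE)
    struct.pack_into("<I", header, 76, offset)
    return bytes(header)

for offset in (0x142, 0x158, 0x54, 0x53, 0):
    print(hex(offset), font_data_offset_ok(header_with_foffset(offset)))
```

Offsets pointing into the header, like 0 or 53h, are rejected, while
54h and the values seen in real fonts are accepted.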

After applying the above-mentioned modifications with the patch
file-5.37-fonts-gem.diff the misidentified files vanish and I get
output like:

CAROUSEL.DOC:           Microsoft WinWord 2.0 Document
cl8m8ocofedso.testfile: Audio file with ID3 version 2.4.0
crmanual.doc:           data
DOC20A.DOC:             Microsoft WinWord 2.0 Document
H1CELT72.FNT:           GEM GDOS font Celtic #s 72, ID 0x00ca,
	0x142 foffset
HyperMover:             data
oem.hlp:                MS Windows 3.1 help,
	Mon May 01 20:47:30 1995, 6033 bytes
PRICELIS.DOC:           Microsoft WinWord 2.0 Document
TECHREF.DOC:            DOS 2.0-3.2 backed up file \TECHREF.DOC
TEMPLATE.DOC:           Microsoft WinWord 2.0 Document
winword2.doc:           Microsoft WinWord 2.0 Document
WYEE24HI.FNT:           GEM GDOS font WYE 24, ID 0x0073,
	0x158 foffset

I hope my diff file can be applied in a future version of the
file utility.

With best wishes
Jörg Jenderek
--
Jörg Jenderek






-------------- next part --------------
--- file-5.37/magic/Magdir/fonts.old	2019-05-05 16:44:04 +0000
+++ file-5.37/magic/Magdir/fonts	2019-07-10 01:36:09 +0000
@@ -136,3 +136,5 @@
 62	ulelong		0x55555555
->0	use		gdos-font
+# skip cl8m8ocofedso.testfile by looking for face size lower/equal 72
+>2	uleshort	<73
+>>0	use		gdos-font
 # BOX18.GFT COWBOY30.GFT ROYALK30.GFT
@@ -141,9 +143,9 @@
 >2	uleshort	>2
-# skip DOS 2.0 backup id file ./msdos by looking for face size lower/equal 48
->>2	uleshort	<49
-# skip MS Windows ICO ./msdos by looking for valid face name
->>>4	ubeshort	>0x1F00
-# skip DOS executable BACKM212.COM by looking for horizontal offset table after header
-#>>>>68	ulelong		>87		OFFSET_OK
->>>>0	use		gdos-font
+# skip DOS 2.0 backup id file ./msdos by looking for face size lower/equal 72
+>>2	uleshort	<73
+# skip MS oem.hlp, some Windows ICO ./msdos by looking for valid long name like WYE
+>>>4	ulelong		>0x001F1f1F
+# skip Microsoft WinWord 2.0 ./msdos by looking for positive offset to font data
+>>>>76	ulelong		>83
+>>>>>0	use		gdos-font
 0	name		gdos-font
@@ -153,5 +155,5 @@
 !:ext	fnt/gtf
-# font name like University Bold
+# font name like Big&Tall, Celtic #s, Courier, University Bold, WYE
 >4	string		x		%.32s
-# face size in points 3-48
+# face size in points 3-72 SLSS03CG.FNT H1CELT72.FNT
 >2	uleshort	x		%u
@@ -159,9 +161,9 @@
 >0	uleshort	x		\b, ID 0x%4.4x
-# lowest character index in face (usually 32 for disk-loaded fonts).
-#>36	uleshort	x		\b, low character index %u
-# width of the widest character
+# lowest character index in face (4 but usually 32 for disk-loaded fonts)
+#>36	uleshort	!32		\b, unusual character index %u
+# width of the widest character like 0 8 10 12 16 24 32
 #>50	uleshort	x		\b, %u char width
-# width of the widest character cell
+# width of the widest character cell like 8 11 12 14 15 16 33 67
 #>52	uleshort	x		\b, %u cell width
-# thickening size
+# thickening size in pixel like 0 1 2 3 4 5 6 7 8
 #>58	uleshort	x		\b, %u thick
@@ -171,11 +173,13 @@
 >64	uleshort	!0x5555		\b, skewing mask 0x%x
-# offset to horizontal offset table 58h~88 5eh
-#>68	ulelong		>88		\b, 0x%x horizontal table offset
-# offset character offset table
+# offset to optional horizontal offset table 0 58h~88 5eh 252h
+#>68	ulelong		x		\b, 0x%x horizontal table offset
+# offset of character offset table 54h for many *.GFT 55h 58h 5Eh 120h 1D4h 202h 220h
 #>72	ulelong		x		\b, 0x%x coffset
-# offset to font data
-#>72	ulelong		x		\b, 0x%x foffset
-# form width in bytes
+# offset to font data like 116h 118h 158 20Ah 20Eh
+>76	ulelong		x		\b, 0x%x foffset
+# form width in bytes like 58 67 156 190 227 317 345
 #>80	uleshort	x		\b, %u fwidth
-# pointer to the next font, set by GDOS after loading
+# form height in bytes like 4 8 11 17 26 56 70 90 120 146 150
+#>82	uleshort	x		\b, %u fheight
+# pointer to the next font like 0 10000h 20000h 30000h 40000h 60000h 80000h E0000h D0000h 
 #>84	ulelong		x		\b, 0x%x noffset

