[File] [PATCH] of Magdir/fonts for GEM GDOS font; update

Christos Zoulas christos at zoulas.com
Tue Jul 16 11:13:21 UTC 2019


Hi,

I've committed the changes but left them commented out. Perhaps we can
add them again with a negative strength, so that they don't interfere with other magic?
Or should there be magic entries with negative strength that are only considered if
a flag is specified on the command line?

christos

> On Jul 9, 2019, at 9:47 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> Some weeks ago I sent a patch for file version 5.36 to recognize
> GEM GDOS fonts with the file extension fnt or gft.
> 
> Unfortunately the test lines were not specific enough, so some other
> files were misidentified as "GEM GDOS font" by Magdir/fonts. For such
> bad examples and some extreme font examples (*.FNT) I got output like:
> 
> CAROUSEL.DOC:           GEM GDOS font _@\011\004 45, ID 0xa5db,
> 	lightening mask 0x0, skewing mask 0x0
> 	Microsoft WinWord 2.0 Document
> cl8m8ocofedso.testfile: Audio file with ID3 version 2.4.0, contains:
> 			GEM GDOS font  25776, ID 0xfbff
> crmanual.doc:           GEM GDOS font L \002 30, ID 0xa59b,
> 	lightening mask 0x0, skewing mask 0x0
> DOC20A.DOC:             GEM GDOS font !@\011\004 45, ID 0xa5db,
> 	lightening mask 0x0, skewing mask 0x0
> 	Microsoft WinWord 2.0 Document
> H1CELT72.FNT:           GEM GDOS font Celtic #s 72, ID 0x00ca
> HyperMover:             GEM GDOS font
> 	STAK\377\377\377\377\357\260\362 10, ID 0x0000,
> 	lightening mask 0x0, skewing mask 0x0
> oem.hlp:                MS Windows 3.1 help,
> 	Mon May 01 20:47:30 1995, 6033 bytes
> 			GEM GDOS font C 3, ID 0x5f3f,
> 	lightening mask 0x0, skewing mask 0x0
> PRICELIS.DOC:           GEM GDOS font Z@\011\004 45, ID 0xa5db,
> 	lightening mask 0x0, skewing mask 0x0
> 	Microsoft WinWord 2.0 Document
> TECHREF.DOC:            DOS 2.0-3.2 backed up file \TECHREF.DOC;
> 			GEM GDOS font \220 \002 33, ID 0xa59b,
> 	lightening mask 0x0, skewing mask 0x0
> TEMPLATE.DOC:           GEM GDOS font p@\011\004 45, ID 0xa5db,
> 	lightening mask 0x0, skewing mask 0x0
> 	Microsoft WinWord 2.0 Document
> winword2.doc:           GEM GDOS font 1@\011\004 45, ID 0xa5db,
> 	lightening mask 0x0, skewing mask 0x0
> 	Microsoft WinWord 2.0 Document
> WYEE24HI.FNT:           GEM GDOS font WYE 24, ID 0x0073
> 
> Unfortunately the existing tests are not unique enough. This is easy
> to fix, because the identifying part and the displaying part of the
> magic are separate. So I added further test lines to reject such
> examples.
> 
> Furthermore, the GEM font specification defines no fixed values that
> could serve as magic. The font name is shown by the line
>> 4	string		x		%.32s
> I often found common font names like Century-Schoolbook-Normal,
> Courier and ding bats, which are also used in other font types.
> The names consist of "long" words meant to be recognized by human
> readers and consist mainly of Latin letters, but sometimes a name
> contains special printable characters, as in "LC-S. Clay Wilson",
> "Celtic #s", "Big&Tall", "Hollywood  ** DEMO VERSION **".
> The shortest font name found was the 3-byte string WYE. For many bad
> examples the interpreted font name contains low control characters.
> So the test line previously used for a valid font name is too permissive:
>>>> 4	ubeshort	>0x1F00
> It becomes stricter with the line
>>>> 4	ulelong		>0x001F1f1F
> Now bad samples like oem.hlp are skipped.
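
To make the difference between the two name tests concrete, here is a minimal Python sketch of the two comparisons. It is only an illustration of the byte arithmetic, not magic(5) syntax and not part of the patch. The helper `make_header` and the sample contents are assumptions: a real oem.hlp may hold other bytes after the "C".

```python
import struct

def old_name_test(header: bytes) -> bool:
    # Mirrors "4 ubeshort >0x1F00": big-endian 16-bit value at offset 4.
    (value,) = struct.unpack_from(">H", header, 4)
    return value > 0x1F00

def new_name_test(header: bytes) -> bool:
    # Mirrors "4 ulelong >0x001F1f1F": little-endian 32-bit value at offset 4.
    (value,) = struct.unpack_from("<I", header, 4)
    return value > 0x001F1F1F

def make_header(name: bytes) -> bytes:
    # Hypothetical helper: 4 placeholder bytes, then the name NUL-padded to 32 bytes.
    return b"\x00" * 4 + name.ljust(32, b"\x00")

wye = make_header(b"WYE")    # shortest real font name mentioned above
single = make_header(b"C")   # assumed layout of a one-character "name"

print(old_name_test(wye), new_name_test(wye))
print(old_name_test(single), new_name_test(single))
```

In this sketch the one-character name passes the old 16-bit test but fails the 32-bit one, while the 3-byte name WYE passes both.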
> 
> The face size in points is shown by the line
>> 2	uleshort	x		%u
> Typical values are 12, 18, 24 and 36, which are known from other font
> types. In theory values up to 65535 can appear, but such large font
> sizes are unrealistic. So I tested against the highest value found,
> 48, as in the KLINGON font H1KLIN48.FNT, with the line
>>> 2	uleshort	<49
> Unfortunately this test was too strict, because I found a font with
> size 72, the Celtic font H1CELT72.FNT. So the relaxed test line
> now becomes
>>> 2	uleshort	<73
> The audio file cl8m8ocofedso.testfile was interpreted as the GEM font
> variant with 5555h mask value and the implausible font size 25776. So
> I also added this font size test inside the branch for 5555h mask values.
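
The size check can be sketched the same way in Python. This is an illustration under stated assumptions, not the patch itself: `header_with_size` is a hypothetical helper, and the two bytes before the size field are placeholders.

```python
import struct

MAX_POINT_SIZE = 72  # largest size seen in the samples above (H1CELT72.FNT)

def plausible_size(header: bytes) -> bool:
    # Mirrors "2 uleshort <73": little-endian 16-bit value at offset 2.
    (size,) = struct.unpack_from("<H", header, 2)
    return size < MAX_POINT_SIZE + 1

def header_with_size(size: int) -> bytes:
    # Hypothetical helper: 2 placeholder bytes followed by the size field.
    return b"\x00\x00" + struct.pack("<H", size)

for size in (48, 72, 25776):
    print(size, plausible_size(header_with_size(size)))
```

The sizes 48 and 72 pass, while 25776, the value read from the misidentified audio file, is rejected.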
> 
> At this point there are still samples like HyperMover with a valid
> font size value and a font name that looks valid at first glance,
> like STAK\377. So I looked for additional tests. The minimal GEM font
> header size is 84 bytes (54h). After the header come other structures
> such as the horizontal offset table, the character offset table and
> the font data, in no fixed order. If a structure follows the header
> without a gap, the lowest possible offset for it is 54h. If the
> structures are not large, the 4-byte offset is only a little above
> that value, like 20Eh.
> So now I also test for a valid low positive offset to the font data with the line:
>>>>> 76	ulelong		>83
> Now bad examples like HyperMover and the remaining Microsoft WinWord 2.0
> documents are skipped.
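
A minimal Python sketch of the offset check follows, again only as an illustration and not as the patch. `header_with_offset` is a hypothetical helper that builds an otherwise empty 84-byte header; the value 0x142 is the foffset reported for H1CELT72.FNT in the output below.

```python
import struct

HEADER_SIZE = 84  # minimal GEM font header size (54h), as stated above

def plausible_data_offset(header: bytes) -> bool:
    # Mirrors "76 ulelong >83": little-endian 32-bit value at offset 76
    # must point at or beyond the end of the minimal header.
    (offset,) = struct.unpack_from("<I", header, 76)
    return offset > HEADER_SIZE - 1

def header_with_offset(offset: int) -> bytes:
    # Hypothetical helper: zero-filled header with only the offset field set.
    header = bytearray(HEADER_SIZE)
    struct.pack_into("<I", header, 76, offset)
    return bytes(header)

for offset in (0, 0x54, 0x142):
    print(hex(offset), plausible_data_offset(header_with_offset(offset)))
```

An offset of 0 is rejected, while 54h (data directly after the header) and 142h are accepted.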
> 
> After applying the modifications described above with the patch
> file-5.37-fonts-gem.diff, the misidentified files vanish and I get
> output like:
> 
> CAROUSEL.DOC:           Microsoft WinWord 2.0 Document
> cl8m8ocofedso.testfile: Audio file with ID3 version 2.4.0
> crmanual.doc:           data
> DOC20A.DOC:             Microsoft WinWord 2.0 Document
> H1CELT72.FNT:           GEM GDOS font Celtic #s 72, ID 0x00ca,
> 	0x142 foffset
> HyperMover:             data
> oem.hlp:                MS Windows 3.1 help,
> 	Mon May 01 20:47:30 1995, 6033 bytes
> PRICELIS.DOC:           Microsoft WinWord 2.0 Document
> TECHREF.DOC:            DOS 2.0-3.2 backed up file \TECHREF.DOC
> TEMPLATE.DOC:           Microsoft WinWord 2.0 Document
> winword2.doc:           Microsoft WinWord 2.0 Document
> WYEE24HI.FNT:           GEM GDOS font WYE 24, ID 0x0073,
> 	0x158 foffset
> 
> I hope my diff file can be applied in a future version of the
> file utility.
> 
> With best wishes
> Jörg Jenderek