[File] [PATCH] of Magdir/os2, msdos for OS/2 help message *.msg+ *.hlp *.inf *.ini *.dos

Christos Zoulas christos at zoulas.com
Sun Aug 30 16:23:26 UTC 2020


Committed, thanks!

christos

> On Aug 29, 2020, at 5:13 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hello,
> some days ago I handled some OS/2 disks. When running the file command
> version 5.39 with the -k option on such OS/2 files and some similar
> files, I get output like:
> 
> echo.sys:     DOS executable (block device driver)
> ELNK.DOS:     data
> EPABKBKS.HLP: OS/2 HLP (Help)
> EPW.INI:      DOS executable (block device driver)
> 	      OS/2 INI
> IBMMPC.DOS:   data
> IBMTOK.DOS:   DOS executable (character device driver,
> 	      control strings-support)
> LSIH.MSG:     data
> NWREQOS2.MSG: data
> OS2PING.INF:  OS/2 INF (OS2PING Help File)
> PNG.MSG:      data
> REX.MSG:      data
> VPD.INI:      DOS executable (block device driver)
> 	      OS/2 INI
> XDF.MSG:      data
> XI1.MSG:      data
> 
> With the --extension option, in most cases only ??? is displayed.
> Furthermore, with the -i option only the generic
> application/octet-stream is shown for many samples.
> 
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). It lists the used
> file name extension and, with the -v option, often the related URL
> pointing to information about the file format.
> 
> Luckily the TrID tool identifies msg files as "OS/2 help Message" and
> displays a related URL. This is now expressed inside Magdir/os2 by an
> additional comment line like
> # URL:		http://fileformats.archiveteam.org/wiki/MSG_(OS/2)
> More information about that file format can be found in the header
> file of an MKMSGF clone. This is now expressed by an additional
> comment line like:
> # github.com/OS2World/UTIL-SYSTEM-MKMSGF/blob/master/mkmsgf.h
> This software is just a clone of the original IBM mkmsgf tool, so
> some fields and their meanings are not explained, especially for old
> versions and the message text pointer handling. Or I am too stupid to
> understand the sources.
> 
> According to the reference, such MSG files start with a characteristic
> 8-byte magic. That is expressed by magic lines like
> 0	string			\xffMKMSGF\0	OS/2 help message
> !:mime	application/x-os2-msg
> !:ext	msg
> Afterwards comes a 3-byte identifier string like DOS, NET, REX, SYS
> etc. That is shown by the following line
>> 8	string				x	'%.3s'
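The same signature check can be sketched outside of libmagic. This is only a minimal Python illustration of the two magic lines above; the function name and the sample bytes are hypothetical and not part of the patch.

```python
MSG_MAGIC = b"\xffMKMSGF\x00"  # the 8-byte signature from the magic line above


def msg_identifier(data: bytes):
    """Return the 3-character identifier, or None if the magic is missing."""
    if len(data) < 11 or not data.startswith(MSG_MAGIC):
        return None
    # Bytes 8..10 hold the identifier string (DOS, NET, REX, SYS, ...).
    return data[8:11].decode("ascii", errors="replace")


# Hypothetical header prefix, built by hand for illustration only.
sample = MSG_MAGIC + b"REX" + bytes(20)
print(msg_identifier(sample))
```

Anything that does not start with the 8-byte magic yields None, which mirrors how the top-level magic line gates all the continuation lines.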
> 
> To keep the output short I show values only for rare or exotic
> cases.
> The file format version is stored as a 2-byte value. Only two
> values should occur, where 0 means "old" version and 2 means "new"
> version. Most examples, especially nowadays, are new. So show this
> information only for old versions by a line like
>> 16	uleshort      			!2	\b, version %u
> 
> The byte offset16bit, at offset 15, stores whether the message index
> table uses 16-bit pointers (1) or 32-bit pointers (0). Most message
> examples are small (<64K). For such cases 16-bit pointers are used. But
> for some large examples like NWREQOS2.MSG 32-bit pointers are used. So
> show this information by a line like
>> 15	ubyte				=0	\b, 32-bit
> 
> The short indextaboffset variable stores the offset of the message
> index table. For "new" examples I only found the value 1Fh. That
> means the index table comes directly after the header. For the "old"
> variant I only found the value 0. That seems to mean "use default
> value". Here I also found the table at offset 1Fh. So show a possible
> unusual table offset by lines
>> 18	uleshort			>0
>>> 18		uleshort		!0x1f	\b, at 0x%x index
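The header fields discussed so far (offsets 15 to 23) can be decoded in one step. The following Python sketch only restates the offsets used by the magic lines; the function name, the dictionary keys and the sample header are hypothetical, and treating indextaboffset 0 as 0x1f is the assumption described in the text, not a documented rule.

```python
import struct


def parse_msg_header(data: bytes) -> dict:
    """Decode the header fields that the magic lines read at offsets 15..23."""
    if len(data) < 24:
        raise ValueError("header too short")
    # "<BHHHH": one ubyte at 15, then little-endian shorts at 16, 18, 20, 22.
    offset16bit, version, indextab, country, next_country = struct.unpack_from(
        "<BHHHH", data, 15
    )
    return {
        "pointer_bits": 32 if offset16bit == 0 else 16,
        "version": version,  # 0 = "old", 2 = "new"
        # Assumption from the text: 0 seems to mean "use default value" (0x1f).
        "index_table_offset": indextab or 0x1F,
        "countryinfo_offset": country,
        "next_countryinfo_offset": next_country,
    }


# Hypothetical header modelled on the NWREQOS2.MSG output (32-bit, 0x1477).
# Bytes 11..14 are not decoded here and are left as zero padding.
sample = struct.pack("<8s3s4xBHHHH", b"\xffMKMSGF\x00", b"REQ", 0, 2, 0x1F, 0x1477, 0)
print(parse_msg_header(sample))
```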
> So test in one branch for 32-bit pointers, then display the offset of
> the message block and the first message text by lines like:
>>> 15		ubyte			=0
>>>> (18.s)		ulelong		x	\b, at 0x%x
>>>>> (&-4.l)		ubyte		x	%c-type
>>>>>> &0		string		x	%s
> According to os2-1.0-ptk-tools-1988.pdf the string starts with 1 ASCII
> character that describes the type of the message, where E means
> Error, H means Help, I is used for Information and P for Prompt. ?
> seems to mean unused or empty. After that character comes the real
> message text.
> Then I do a similar procedure for the 16-bit variant and for "old"
> examples with zero indextaboffset.
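The indirect lookup done by the magic lines above (read the first index entry, follow it, read the type character) can be sketched in Python as follows. The function name, the type table and the hand-built buffer are hypothetical; only the letters E, H, I, P and ? come from the text, and the message text itself is left out because its length is not described here.

```python
import struct

# Type letters as described in the text; other letters may exist.
MESSAGE_TYPES = {
    "E": "Error",
    "H": "Help",
    "I": "Information",
    "P": "Prompt",
    "?": "unused or empty",
}


def first_message(data: bytes, index_offset: int, wide: bool):
    """Follow the first index entry and return (pointer, letter, meaning)."""
    fmt = "<L" if wide else "<H"  # 32-bit or 16-bit little-endian pointer
    (pointer,) = struct.unpack_from(fmt, data, index_offset)
    letter = chr(data[pointer])  # first byte of the message is its type
    return pointer, letter, MESSAGE_TYPES.get(letter, "unknown")


# Hypothetical buffer: index table at 0x1f, first message at 0x30.
buf = bytearray(0x40)
struct.pack_into("<H", buf, 0x1F, 0x30)
buf[0x30:0x39] = b"HUrsache:"
print(first_message(bytes(buf), 0x1F, wide=False))
```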
> 
> The last fields in the header before the padding zero bytes are
> countryinfo and next country info. For version 0 these fields are
> zero. So show only non-zero values by lines
>> 20	uleshort		!0	\b, at 0x%x countryinfo
>>> 22		uleshort	>0	\b, at 0x%x next
> 
> Because the country block contains some interesting information I
> jump to this offset and inspect the block by the subroutine
> os2-msg-info. This looks like:
>>> (20.s) use				os2-msg-info
> 0	name		os2-msg-info
> 
> The possible non-zero language id of the message file is shown in that
> subroutine by lines like
>> 3	uleshort	>0		\b, language %u
>>> 5	uleshort	x		\b_%u
> So for example in LSIH.MSG the value 7_1 means German_Germany, and the
> value 12_3 means Canadian French.
> 
> After the language part comes the code page part. First comes the
> number of used code pages (at most 16), followed by the DOS code page
> numbers used. This is expressed by lines like
>> 7	uleshort	x		\b, %u code page
>> 7	uleshort	>1		\bs
>> 7	uleshort	<17
>>> 9	uleshort	>0		%u
>>>> 7	uleshort	>1
>>>>> 11	uleshort	x		%u
> Many examples like NWREQOS2.MSG contain only 1 code page, and
> that is 437 in most cases. But a few examples like XDF.MSG contain 2
> code pages, and often these two are 437 and 850.
> After the code page part the file name, like dbaseos2.msg, xdfh.msg,
> dde4c01e.msg, os2ldr.mgr and so on, is stored. So show this
> information by a line like
>> 41	string		x	 	\b, %s
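The layout of the country block that the os2-msg-info subroutine walks through can be sketched in Python like this. The function name, the dictionary keys and the sample block are hypothetical; the offsets 3, 5, 7, 9 and 41 come from the magic lines, and the NUL terminator of the file name is an assumption.

```python
import struct


def parse_country_block(data: bytes, start: int) -> dict:
    """Decode the country block that begins at byte offset `start`."""
    # Language id at +3, second part at +5, code page count at +7.
    language, sub_id, count = struct.unpack_from("<HHH", data, start + 3)
    if count > 16:
        raise ValueError("more than 16 code pages")
    # Code page numbers start at +9; note 9 + 16 * 2 = 41, the name offset.
    code_pages = list(struct.unpack_from(f"<{count}H", data, start + 9))
    # Assumption: the file name at +41 is NUL-terminated.
    raw_name = data[start + 41:].split(b"\x00", 1)[0]
    return {
        "language": (language, sub_id),
        "code_pages": code_pages,
        "filename": raw_name.decode("ascii", errors="replace"),
    }


# Hypothetical block modelled on the LSIH.MSG output shown further below.
block = bytearray(41)
struct.pack_into("<HHH", block, 3, 7, 1, 1)  # language 7_1, one code page
struct.pack_into("<H", block, 9, 850)
sample = bytes(0x101) + bytes(block) + b"LSIH.MSG\x00"
info = parse_country_block(sample, 0x101)
print(info)
```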
> 
> To show a user-defined mime type and file name extension for HLP
> files I also add more lines inside Magdir/os2 after the magic line
> 0   string  HSP\x10\x9b\x00     OS/2 HLP
> So now for samples like EPABKBKS.HLP I show that information by 2
> lines
> !:mime	application/x-os2-hlp
> !:ext	hlp
> I do the same for OS/2 INF and OS/2 INI. The last one is
> identified by the magic line
> 0  string   \xff\xff\xff\xff\x14\0\0\0  OS/2 INI
> 
> This looks similar to DOS device drivers, which are identified by a
> magic line inside Magdir/msdos like
> 0	ulequad&0x07a0ffffffff		0xffffffff
> So OS/2 INI files like EPW.INI and VPD.INI are misidentified as DOS
> device drivers by Magdir/msdos. So I add an additional test to skip
> OS/2 INI files. This now becomes
> 0	ulequad&0x07a0ffffffff		0xffffffff
>> 4  	ubelong   			!0x14000000
>>> 0	use				msdos-driver
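Why the OS/2 INI signature passes the old driver test, and how the extra test excludes it, can be checked with a few lines of Python. This is a sketch of the two magic tests only; the function names and the driver sample are hypothetical.

```python
import struct

DRIVER_MASK = 0x07A0FFFFFFFF  # mask from the Magdir/msdos line
INI_TAIL = 0x14000000         # big-endian long at offset 4 of an OS/2 INI


def matches_driver_mask(data: bytes) -> bool:
    """The original test: masked little-endian quad equals 0xffffffff."""
    if len(data) < 8:
        return False
    (quad,) = struct.unpack_from("<Q", data, 0)
    return (quad & DRIVER_MASK) == 0xFFFFFFFF


def looks_like_dos_driver(data: bytes) -> bool:
    """The patched test: same mask, but skip the OS/2 INI signature."""
    if not matches_driver_mask(data):
        return False
    (tail,) = struct.unpack_from(">L", data, 4)
    return tail != INI_TAIL


ini = b"\xff\xff\xff\xff\x14\x00\x00\x00"     # OS/2 INI magic from the text
driver = b"\xff\xff\xff\xff\x00\x80\x16\x00"  # hypothetical driver header
print(matches_driver_mask(ini), looks_like_dos_driver(ini))
print(looks_like_dos_driver(driver))
```

The byte 0x14 at offset 4 has none of the bits selected by 0x07a0 set, so the masked quad still equals 0xffffffff and only the additional comparison tells the two formats apart.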
> 
> The URL pointing to information about DOS device drivers probably
> does not exist any more. So I looked for similar sites on the net.
> This is now expressed by additional lines like:
> # URL: http://fileformats.archiveteam.org/wiki/DOS_device_driver
> # Reference: http://www.delorie.com/djgpp/doc/rbinter/it/46/16.html
> 
> At the beginning a 4-byte pointer to the next driver is stored.
> For most DOS device drivers (about 94% = 98/104 of my inspected
> samples) this value is 0xffffffff. These are matched by the above
> construction. Unfortunately this is not a strict condition. Some
> examples like Uwe Sieber's echo.sys, found in the archive
> cfg_echo.zip, are recognized by explicitly looking for characteristic
> byte sequences at the beginning and then calling the displaying
> subroutine by lines like
> 0	ulequad				0x001600000000ffff
>> 0	use				msdos-driver
> So such an unusual pointer value is now shown at the end of that
> subroutine by an additional line like
>> 0	ulelong		!0xffffffff	with pointer 0x%x
> 
> This was useful for me when comparing the identification
> success/failure of the file command with other tools like TrID.
> 
> According to the updated reference, DOS is also used as a file name
> extension. I found samples on an OS/2 disc like the IBM Token-Ring
> adapter driver IBMTOK.DOS. So the file name extension line now becomes
> !:ext	sys/dev/bin/dos
> 
> I found no explanation why and when the DOS file name extension is
> used instead of SYS. Maybe to explicitly distinguish such drivers from
> drivers or executables for the OS/2 system like IBMTOK.OS2.
> 
> Furthermore I found DOS driver examples inside the archive
> DLSNETDR.ZIP on an OS/2 CD-ROM which are not detected, because bits
> that are declared as reserved in old documentation are used. But these
> 2 examples use the expected starting pointer value. So I add more
> lines for these 2 exceptions like:
> 0	ulequad				0x027ac0c0ffffffff
>> 0	use				msdos-driver
> 0	ulequad				0x00228880ffffffff
>> 0	use				msdos-driver
> Maybe it is possible to merge some DOS driver branches.
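Which bits make these two samples fail the generic mask test can be computed directly. The Python sketch below uses only the mask and the two quad values quoted above; the function name is hypothetical.

```python
DRIVER_MASK = 0x07A0FFFFFFFF  # mask from the generic Magdir/msdos test


def offending_bits(quad: int) -> int:
    """Return the bits in the upper half that the mask expects to be zero."""
    # The low 32 bits hold the next-driver pointer; shift them away.
    return (quad & DRIVER_MASK) >> 32


for quad in (0x027AC0C0FFFFFFFF, 0x00228880FFFFFFFF):
    print(hex(quad), hex(offending_bits(quad)))
```

Both samples leave 0x80 after masking, so the same single bit is responsible in each case, which may help when merging the driver branches.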
> 
> After applying the above mentioned modifications with the patches
> file-5.39-os2.diff and file-5.39-msdos-os2.diff, the
> misidentifications vanish and I get more precise output like:
> 
> echo.sys:     DOS executable (block device driver)
> 	      with pointer 0xffff
> ELNK.DOS:     DOS executable (character device driver,
> 	      IOCTL-,control strings-support)
> EPABKBKS.HLP: OS/2 HLP (Help)
> EPW.INI:      OS/2 INI
> IBMMPC.DOS:   DOS executable (character device driver,
> 	      close media-support)
> IBMTOK.DOS:   DOS executable (character device driver,
> 	      control strings-support)
> LSIH.MSG:     OS/2 help message 'LSI', 113 messages,
> 	      number 559
> 	      at 0x230 H-type Ursache:
> 	      Die Version des z. Zt. installierten,
> 	      at 0x101 countryinfo, language 7_1,
> 	      1 code page 850, LSIH.MSG
> NWREQOS2.MSG: OS/2 help message 'REQ', 1302 messages,
> 	      1st number 98, 32-bit,
> 	      at 0x15a5 I-type VeRsIoN=2.11,
> 	      at 0x1477 countryinfo,
> 	      1 code page 437, nwreqos2.msg
> OS2PING.INF:  OS/2 INF (OS2PING Help File)
> PNG.MSG:      OS/2 help message 'IIC', 140 messages,
> 	      1st number 4001, version 0,
> 	      at 0x137 I-type PING -
> 	      ICMP Echoanforderung/-antwort %8.%9,
> 	      at 0x164 I-type Copyright (c) 1995 Network TeleSystems,
> 	      Inc. Alle Rechte vorbehalten.,
> 	      at 0x1ac E-type Paketgr”áe zu groá. Max. Daten = %9 Byte
> REX.MSG:      OS/2 help message 'REX', 127 messages,
> 	      version 0,
> 	      at 0x11d W-type ,
> 	      at 0x120 W-type %1File Table full%2,
> 	      at 0x136 W-type
> VPD.INI:      OS/2 INI
> XDF.MSG:      OS/2 help message 'XDF', 20 messages,
> 	      1st number 3502, number 373
> 	      at 0x194 P-type Quellendiskette in
> 	      Laufwerk %1 einlegen,
> 	      at 0x47 countryinfo, at 0x817 next,
> 	      2 code pages 850 437, xdf.msg
> XI1.MSG:      OS/2 help message 'XI1', 180 messages,
> 	      version 0,
> 	      at 0x187 ?-type ,
> 	      at 0x18a E-type Fehler beim Aufruf \201ber
> 	      die Befehlszeile.
> 	      Es wurden nicht alle Parameter oder nicht
> 	      unterst\201tzte Parameter/Werte angegeben.,
> 	      at 0x208 E-type Datei CONFIG.SYS kann nicht wie
> 	      \201ber Parameter "/TU:" vorgeschrieben
> 	      gefunden werden.
> 
> I hope my 2 diff files can be applied in a future version of the
> file utility.
> 
> With best wishes
> Jörg Jenderek
> - --
> Jörg Jenderek
> 
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
> 
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCX0rE2QAKCRCv8rHJQhrU
> 1iThAKC/G+HW+bcCH7wa3GROVGBU9j1GLACfTYPco7gmR0YqhcOAT0HRwQ823Yo=
> =uzjF
> -----END PGP SIGNATURE-----
