[File] [PATCH] of Magdir/os2, msdos for OS/2 help message *.msg+ *.hlp *.inf *.ini *.dos

Jörg Jenderek joerg.jen.der.ek at gmx.net
Mon Aug 10 20:34:33 UTC 2020

Some days ago I handled some OS/2 disks. When running the file command
version 5.39 with the -k option on such OS/2 files and some similar
files, I get output like:

echo.sys:     DOS executable (block device driver)
ELNK.DOS:     data
EPW.INI:      DOS executable (block device driver)
	      OS/2 INI
IBMMPC.DOS:   data
IBMTOK.DOS:   DOS executable (character device driver,
	      control strings-support)
LSIH.MSG:     data
PNG.MSG:      data
REX.MSG:      data
VPD.INI:      DOS executable (block device driver)
	      OS/2 INI
XDF.MSG:      data
XI1.MSG:      data

With the --extension option, in most cases only ??? is displayed.
Furthermore, with the -i option only the generic
application/octet-stream is shown for many samples.

For comparison I ran the file format identification utility TrID
(see https://mark0.net/soft-trid-e.html). It lists the file name
extensions in use and, with the -v option, often the related URL
pointing to file format information.

Luckily the TrID tool identifies MSG files as "OS/2 help Message" and
displays a related URL. This is now expressed inside Magdir/os2 by an
additional comment line like
 # URL:		http://fileformats.archiveteam.org/wiki/MSG_(OS/2)
More information about that file format can be found in the header
file of an MKMSGF clone. This is now expressed by an additional
comment line
 # github.com/OS2World/UTIL-SYSTEM-MKMSGF/blob/master/mkmsgf.h
This software is just a clone of the original IBM mkmsgf tool, so
some fields and their meaning are not explained, especially for old
versions and the message text pointer handling. Or I am too stupid to
understand the sources.

According to the reference, such MSG files start with a characteristic
8-byte magic. That is expressed by magic lines like
 0	string			\xffMKMSGF\0	OS/2 help message
 !:mime	application/x-os2-msg
 !:ext	msg
Next comes a 3-byte identifier string like DOS, NET, REX, SYS etc.
That is shown by the following line
 >8	string				x	'%.3s'
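
As a rough illustration outside of file itself, the same two checks can
be sketched in Python. The function name msg_identifier and the sample
bytes are made up for this sketch; only the 8-byte magic and the
3-byte identifier at offset 8 come from the magic lines above.

```python
MAGIC = b"\xffMKMSGF\x00"  # 8-byte signature, as in the magic line


def msg_identifier(data: bytes):
    """Return the 3-character identifier, or None if the magic is absent."""
    if len(data) < 11 or not data.startswith(MAGIC):
        return None
    # Bytes 8..10 hold the identifier such as DOS, NET, REX or SYS.
    return data[8:11].decode("ascii", errors="replace")


# Synthetic header for demonstration, not taken from a real file.
sample = MAGIC + b"REX" + bytes(20)
print(msg_identifier(sample))
```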

To keep the output short I show some values only in rare or exotic
cases. The file format version is stored as a 2-byte value. Only two
values should occur, where 0 means "old" version and 2 means "new"
version. Most examples, especially recent ones, are new. So show this
information only for versions other than 2 by a line like
 >16	uleshort      			!2	\b, version %u

The byte offset16bit records whether the message index table uses
16-bit pointers (1) or 32-bit pointers (0). Most message examples are
small (<64K), and for these 16-bit pointers are used. But for some
large examples like NWREQOS2.MSG 32-bit pointers are used. So show
this information by a line like
 >15	ubyte				=0	\b, 32-bit

The 2-byte indextaboffset variable stores the offset of the message
index table. For "new" examples I only found the value 1Fh, which
means the index table comes directly after the header. For the "old"
variant I only found the value 0. That seems to mean "use default
value", because here I also found the table at offset 1Fh. So show a
possible unusual table offset by lines
 >18	uleshort			>0
 >>18		uleshort		!0x1f	\b, at 0x%x index
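
The fixed header fields discussed so far can be read in one step. This
is only a sketch under assumptions: the function and key names are
invented here, the two counters at offsets 11 and 13 are taken from the
attached diff, and treating indextaboffset 0 as 0x1F reflects what was
observed in the samples, not a documented rule.

```python
import struct

DEFAULT_INDEX_OFFSET = 0x1F  # where the table was found when the field is 0


def parse_msg_header(data: bytes) -> dict:
    """Decode the little-endian fields at offsets 11, 13, 15, 16 and 18."""
    # "<" disables padding, so the fields are read back to back from 11.
    count, first, offset16bit, version, indextab = struct.unpack_from(
        "<HHBHH", data, 11
    )
    return {
        "messages": count,
        "first_number": first,
        "pointer_bits": 32 if offset16bit == 0 else 16,
        "version": version,
        # Observed behaviour: 0 seems to mean "use the default offset".
        "index_offset": indextab or DEFAULT_INDEX_OFFSET,
    }


# Synthetic "new" header with 32-bit pointers, values chosen for the demo.
hdr = b"\xffMKMSGF\x00REQ" + struct.pack("<HHBHH", 1302, 98, 0, 2, 0x1F)
print(parse_msg_header(hdr))
```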
Then test in one branch for 32-bit pointers, display the offset of
the first message block and display the first message text by lines
like:
 >>15		ubyte			=0
 >>>(18.s)		ulelong		x	\b, at 0x%x
 >>>>(&-4.l)		ubyte		x	%c-type
 >>>>>&0		string		x	%s
According to os2-1.0-ptk-tools-1988.pdf the string starts with 1 ASCII
character that describes the type of the message, where E means
Error, H means Help, I means Information, P means Prompt and W means
Warning. ? seems to mean unused or empty. After that character comes
the real message text.
Then I follow a similar procedure for the 16-bit variant and for "old"
examples with zero indextaboffset.
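
The 32-bit branch described above amounts to two pointer hops followed
by a type byte. A minimal Python sketch, assuming the first index entry
is the file offset of the first message; the function name, the limit
parameter and stopping at a NUL byte are choices of this sketch that
roughly mimic how a magic "string" test prints, not part of the format
description.

```python
import struct


def first_message_32(data: bytes, limit: int = 64):
    """Follow indextaboffset -> first 32-bit index entry -> message."""
    if data[15] != 0:
        raise ValueError("index table does not use 32-bit pointers")
    (index_off,) = struct.unpack_from("<H", data, 18)
    index_off = index_off or 0x1F  # 0 observed to mean "table at 0x1F"
    (msg_off,) = struct.unpack_from("<I", data, index_off)
    msg_type = chr(data[msg_off])  # E, H, I, P, W or ?
    text = data[msg_off + 1 : msg_off + 1 + limit].split(b"\0", 1)[0]
    return msg_off, msg_type, text


# Synthetic file: 0x1F-byte header, one index entry, one message.
buf = bytearray(0x1F)
buf[0:11] = b"\xffMKMSGF\x00REQ"
struct.pack_into("<HHBHH", buf, 11, 1, 1, 0, 2, 0x1F)
buf += struct.pack("<I", 0x23) + b"IVeRsIoN=2.11\0"
print(first_message_32(bytes(buf)))
```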

The last fields in the header before the padding zero bytes are
countryinfo and next country info. For version 0 these fields are
zero. So show only non-zero values by lines
 >20	uleshort		!0	\b, at 0x%x countryinfo
 >>22		uleshort	>0	\b, at 0x%x next

Because the country block contains some interesting information, I
jump to this offset and inspect the block with the subroutine
os2-msg-info. This looks like:
 >>(20.s) use				os2-msg-info
 0	name		os2-msg-info

A non-zero language ID of the message file is shown in that
subroutine by lines like
 >3	uleshort	>0		\b, language %u
 >>5	uleshort	x		\b_%u
So for example in LSIH.MSG the value 7_1 means German_Germany, and the
value 12_3 means Canadian French.

After the language part comes the code page part. First comes the
number of code pages used (at most 16), followed by the DOS code page
numbers themselves. This is expressed by lines like
 >7	uleshort	x		\b, %u code page
 >7	uleshort	>1		\bs
 >7	uleshort	<17
 >>9	uleshort	>0		%u
 >>>7	uleshort	>1
 >>>>11	uleshort	x		%u
Many examples like NWREQOS2.MSG contain only 1 code page, and that is
437 in most cases. But a few examples like XDF.MSG contain 2 code
pages, and often these two are 437 and 850.
After the code page part the file name, like dbaseos2.msg, xdfh.msg,
dde4c01e.msg, os2ldr.mgr and so on, is stored. So show this
information by a line like
 >41	string		x	 	\b, %s
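
The country info block can be decoded the same way. The sketch below
assumes the layout implied by the offsets in the magic lines (language
at 3 and 5, code page count at 7, a 16-entry code page array from 9 to
40, file name at 41); the 260-byte file name size comes from a comment
in the attached diff. The function name and the small language table
are hypothetical and list only the two pairs named above. Unlike the
magic lines, which print at most two code pages, it returns all of
them.

```python
import struct

# Only the two (family, version) pairs mentioned in the text above.
LANGUAGES = {(7, 1): "German", (12, 3): "Canadian French"}


def parse_country_info(data: bytes, base: int) -> dict:
    """Decode the block that starts at the countryinfo offset `base`."""
    family, version, count = struct.unpack_from("<HHH", data, base + 3)
    shown = min(count, 16)  # more than 16 should not happen
    pages = list(struct.unpack_from(f"<{shown}H", data, base + 9))
    name = data[base + 41 : base + 41 + 260].split(b"\0", 1)[0]
    return {
        "language": (family, version),
        "language_name": LANGUAGES.get((family, version)),
        "code_pages": pages,
        "filename": name.decode("ascii", errors="replace"),
    }


# Synthetic block placed at offset 0x47, values chosen for the demo.
block = struct.pack("<BHHHH", 1, 0, 7, 1, 2)
block += struct.pack("<16H", 850, 437, *([0] * 14)) + b"xdf.msg\0"
print(parse_country_info(bytes(0x47) + block, 0x47))
```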

To show a user-defined MIME type and file name extension for HLP
files, I also add more lines inside Magdir/os2 after the magic line
 0   string  HSP\x10\x9b\x00     OS/2 HLP
So now for samples like EPABKBKS.HLP I show that information by 2 lines
 !:mime	application/x-os2-hlp
 !:ext	hlp
I do the same for OS/2 INF and OS/2 INI. The latter is identified by
the magic line
 0  string   \xff\xff\xff\xff\x14\0\0\0  OS/2 INI

This looks similar to DOS device drivers, which are identified inside
Magdir/msdos by a magic line like
 0	ulequad&0x07a0ffffffff		0xffffffff
So OS/2 INI files like EPW.INI and VPD.INI are misidentified as DOS
device drivers by Magdir/msdos. So I add an additional test to skip
OS/2 INI files. This now becomes
 0	ulequad&0x07a0ffffffff		0xffffffff
 >4  	ubelong   			!0x14000000
 >>0	use				msdos-driver
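
Why the INI signature slips through the mask, and how the extra test
stops it, can be checked with a few lines of Python. This is a sketch
of the three magic lines above only; the function name and the example
driver bytes are invented for the illustration.

```python
import struct

MASK = 0x07A0FFFFFFFF
INI_MAGIC = b"\xff\xff\xff\xff\x14\x00\x00\x00"  # OS/2 INI signature


def looks_like_dos_driver(head: bytes) -> bool:
    """Masked little-endian quad test, then the big-endian INI exclusion."""
    if len(head) < 8:
        return False
    (quad,) = struct.unpack_from("<Q", head, 0)
    if (quad & MASK) != 0xFFFFFFFF:
        return False
    # New test: bytes 4..7 read big-endian are 0x14000000 for OS/2 INI.
    (tail,) = struct.unpack_from(">I", head, 4)
    return tail != 0x14000000


# The INI header passes the mask (0x0014 & 0x07a0 == 0) but is now skipped.
(ini_quad,) = struct.unpack("<Q", INI_MAGIC)
print(hex(ini_quad & MASK), looks_like_dos_driver(INI_MAGIC))

# Made-up driver header: pointer 0xffffffff, attribute word 0x8000.
driver = b"\xff\xff\xff\xff\x00\x80\x16\x00"
print(looks_like_dos_driver(driver))
```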

The URL pointing to information about DOS device drivers probably
does not exist any more, so I looked for similar sites on the net.
This is now expressed by additional lines like:
 # URL: http://fileformats.archiveteam.org/wiki/DOS_device_driver
 # Reference: http://www.delorie.com/djgpp/doc/rbinter/it/46/16.html

At the beginning a 4-byte pointer to the next driver is stored.
For most DOS device drivers (about 94% = 98/104 of my inspected
samples) this value is 0xffffffff. These are matched by the above
construction. Unfortunately this is not a strict condition. Some
examples, like Uwe Sieber's echo.sys found in archive cfg_echo.zip, are
recognized by explicitly looking for a characteristic byte sequence
at the beginning and then calling the displaying subroutine by lines
like
 0	ulequad				0x001600000000ffff
 >0	use				msdos-driver
So now show such an unusual pointer value at the end of that
subroutine by an additional line like
 >0	ulelong		!0xffffffff	with pointer 0x%x
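
For illustration, the pointer test can be reproduced in Python on the
8 bytes that the echo.sys magic line describes. The function name is
hypothetical; the bytes are derived from the quad value in the magic
line, not read from the real file.

```python
import struct


def pointer_note(head: bytes) -> str:
    """Mimic the added magic line: report an unusual next-driver pointer."""
    (ptr,) = struct.unpack_from("<I", head, 0)
    return "" if ptr == 0xFFFFFFFF else f"with pointer 0x{ptr:x}"


# 0x001600000000ffff stored little-endian gives ff ff 00 00 00 00 16 00.
echo_head = struct.pack("<Q", 0x001600000000FFFF)
print(pointer_note(echo_head))  # with pointer 0xffff
```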

This was useful for me when comparing the identification
success/failure of the file command with other tools like TrID.

According to the updated reference, DOS is also used as a file name
extension. I found samples of this on an OS/2 disc, like the IBM
Token-Ring adapter driver IBMTOK.DOS. So the file name extension line
now becomes
 !:ext	sys/dev/bin/dos

I found no explanation of why and when the DOS file name extension is
used instead of SYS. Maybe it is to explicitly distinguish such
drivers from drivers or executables for the OS/2 system like
IBMTOK.OS2.

Furthermore I found DOS driver examples inside archive DLSNETDR.ZIP
on an OS/2 CD-ROM which are not detected, because they use bits that
are declared as reserved in old documentation. But these 2 examples
use the expected starting pointer value. So I add more lines for
these 2 exceptions like:
 0	ulequad				0x027ac0c0ffffffff
 >0	use				msdos-driver
 0	ulequad				0x00228880ffffffff
 >0	use				msdos-driver
Maybe it is possible to merge some DOS driver branches.
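
A short Python check makes visible which reserved bit keeps the two
DLSNETDR.ZIP samples out of the masked test. The names below are
invented for this sketch, and EXPLICIT lists only the three quad values
quoted in this mail, not every explicit pattern in Magdir/msdos.

```python
MASK = 0x07A0FFFFFFFF
RESERVED = 0x07A0  # attribute bits the masked test requires to be zero
EXPLICIT = (
    0x001600000000FFFF,  # echo.sys
    0x027AC0C0FFFFFFFF,  # first DLSNETDR.ZIP example
    0x00228880FFFFFFFF,  # second DLSNETDR.ZIP example
)


def match_kind(quad: int):
    """Say which test would accept the first 8 bytes, read little-endian."""
    if (quad & MASK) == 0xFFFFFFFF:
        return "masked"
    if quad in EXPLICIT:
        return "explicit"
    return None


for quad in EXPLICIT:
    attribute = (quad >> 32) & 0xFFFF  # device attribute word at offset 4
    print(hex(quad), match_kind(quad), hex(attribute & RESERVED))
```

For both DLSNETDR.ZIP values the last column is 0x80, so a single
reserved attribute bit is what defeats the mask; that may be a starting
point for merging branches.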

After applying the above-mentioned modifications with the patches
file-5.39-os2.diff and file-5.39-msdos-os2.diff, the
misidentifications vanish and I get more precise output like:

echo.sys:     DOS executable (block device driver)
	      with pointer 0xffff
ELNK.DOS:     DOS executable (character device driver,
	      IOCTL-,control strings-support)
EPW.INI:      OS/2 INI
IBMMPC.DOS:   DOS executable (character device driver,
	      close media-support)
IBMTOK.DOS:   DOS executable (character device driver,
	      control strings-support)
LSIH.MSG:     OS/2 help message 'LSI', 113 messages,
	      number 559
	      at 0x230 H-type Ursache:
	      Die Version des z. Zt. installierten,
	      at 0x101 countryinfo, language 7_1,
	      1 code page 850, LSIH.MSG
NWREQOS2.MSG: OS/2 help message 'REQ', 1302 messages,
	      1st number 98, 32-bit,
	      at 0x15a5 I-type VeRsIoN=2.11,
	      at 0x1477 countryinfo,
	      1 code page 437, nwreqos2.msg
PNG.MSG:      OS/2 help message 'IIC', 140 messages,
	      1st number 4001, version 0,
	      at 0x137 I-type PING -
	      ICMP Echoanforderung/-antwort %8.%9,
	      at 0x164 I-type Copyright (c) 1995 Network TeleSystems,
	      Inc. Alle Rechte vorbehalten.,
	      at 0x1ac E-type Paketgr”áe zu groá. Max. Daten = %9 Byte
REX.MSG:      OS/2 help message 'REX', 127 messages,
	      version 0,
	      at 0x11d W-type ,
	      at 0x120 W-type %1File Table full%2,
	      at 0x136 W-type
VPD.INI:      OS/2 INI
XDF.MSG:      OS/2 help message 'XDF', 20 messages,
	      1st number 3502, number 373
	      at 0x194 P-type Quellendiskette in
	      Laufwerk %1 einlegen,
	      at 0x47 countryinfo, at 0x817 next,
	      2 code pages 850 437, xdf.msg
XI1.MSG:      OS/2 help message 'XI1', 180 messages,
	      version 0,
	      at 0x187 ?-type ,
	      at 0x18a E-type Fehler beim Aufruf \201ber
	      die Befehlszeile.
	      Es wurden nicht alle Parameter oder nicht
	      unterst\201tzte Parameter/Werte angegeben.,
	      at 0x208 E-type Datei CONFIG.SYS kann nicht wie
	      \201ber Parameter "/TU:" vorgeschrieben
	      gefunden werden.

I hope my 2 diff files can be applied in a future version of the
file utility.

With best wishes
Jörg Jenderek

-------------- next part --------------
--- file-5.39/magic/Magdir/msdos.old	2020-05-31 10:34:40 +0000
+++ file-5.39/magic/Magdir/msdos	2020-08-10 14:43:42 +0000
@@ -387,6 +387,10 @@
-# DOS device driver updated by Joerg Jenderek at May 2011,Mar 2017
+# DOS device driver updated by Joerg Jenderek at May 2011,Mar 2017,Aug 2020
+# URL:		http://fileformats.archiveteam.org/wiki/DOS_device_driver
+# Reference:	http://www.delorie.com/djgpp/doc/rbinter/it/46/16.html
 # https://amaus.net/static/S100/IBM/software/DOS/DOS%20techref/CHAPTER.009
 0	ulequad&0x07a0ffffffff		0xffffffff
->0	use				msdos-driver
+# skip OS/2 INI ./os2
+>4  ubelong   !0x14000000
+>>0	use				msdos-driver
 0       name    			msdos-driver		DOS executable (
@@ -395,3 +399,4 @@
 # also found FreeDOS print driver SPOOL.DEV and disc compression driver STACLOAD.BIN
-!:ext	sys/dev/bin
+# and IBM Token-Ring adapter IBMTOK.DOS. Why and when DOS instead SYS is used?
+!:ext	sys/dev/bin/dos
 >40	search/7			UPX!			\bUPX compressed
@@ -458,2 +463,3 @@
 >0	ubyte				x			\b)
+>0	ulelong				!0xffffffff		with pointer 0x%x
 # DOS driver cmd640x.sys has 0x12 instead of 0xffffffff for pointer field to next device header
@@ -466,2 +472,3 @@
 >0	use				msdos-driver
+# https://www.uwe-sieber.de/files/cfg_echo.zip
 0	ulequad				0x001600000000ffff
@@ -473,2 +480,8 @@
 >0	use				msdos-driver
+0	ulequad				0x027ac0c0ffffffff
+>0	use				msdos-driver
+0	ulequad				0x00228880ffffffff
+>0	use				msdos-driver
-------------- next part --------------
--- file-5.39/magic/Magdir/os2.old	2020-05-31 10:34:40 +0000
+++ file-5.39/magic/Magdir/os2	2020-08-09 16:43:53 +0000
@@ -27,2 +27,4 @@
 # >>>>> OS/2 INF/HLP <<<<<  (source: Daniel Dissett ddissett at netcom.com)
+# URL:		http://fileformats.archiveteam.org/wiki/INF/HLP_(OS/2)
+# Reference:	http://www.edm2.com/0308/inf.html
 # Carl Hauser (chauser.parc at xerox.com) and
@@ -43,7 +45,123 @@
 0   string  HSP\x01\x9b\x00 OS/2 INF
+!:mime	application/x-os2-inf
+!:ext	inf
 >107 string >0                      (%s)
 0   string  HSP\x10\x9b\x00     OS/2 HLP
+!:mime	application/x-os2-hlp
+!:ext	hlp
 >107 string >0                      (%s)
+# From:		Joerg Jenderek
+# URL:		http://fileformats.archiveteam.org/wiki/MSG_(OS/2)
+# Reference:	https://github.com/OS2World/UTIL-SYSTEM-MKMSGF/blob/master/mkmsgf.h
+# Note:		created by MKMSGF.EXE. Text source can be recreated by E_MSGF
+#		example like OS001H.MSG
+0	string			\xffMKMSGF\0	OS/2 help message
+!:mime	application/x-os2-msg
+!:ext	msg
+# identifier[3] like: DOS NET REX SYS ...
+>8	string				x	'%.3s'
+# msgnumber: number of messages
+>11	uleshort			x	\b, %u messages
+# firstmsgnumber; number of the first message like: some times 0 often 1 169 1000 3502
+>13	uleshort      			>1	\b, 1st number %u
+# offset16bit; 1~Index table has 16-bit offsets (files<64k) 0~Index table has 32-bit offsets
+>15	ubyte				=0	\b, 32-bit
+#>15	ubyte				=1	\b, 16-bit
+# version; file version: 2~new 0~old
+>16	uleshort      			!2	\b, version %u
+# indextaboffset; offset of index table: 1F~after header 0~no index table for version 0?
+>18	uleshort			>0
+>>18		uleshort		!0x1f	\b, at 0x%x index
+#	32-bit offset
+>>15		ubyte			=0
+# offset with message table
+>>>(18.s)		ulelong		x	\b, at 0x%x
+# 1st message
+# http://www.os2museum.com/files/docs/os210ptk/os2-1.0-ptk-tools-1988.pdf
+# message type: E~Error H~Help I~Information P~Prompt W~Warning ?
+>>>>(&-4.l)		ubyte		x	%c-type
+>>>>>&0			string		x	%s
+#	16-bit offset
+>>15		ubyte			=1
+# msgnum; message number
+>>>(18.s)		uleshort	x	\b, number %u
+# msgindex; offset of message from begin of file
+>>>(18.s+2)		uleshort	x	at 0x%x
+# message type E H I P W ?
+>>>>(&-2.s)		ubyte		x	%c-type
+# skip newline carriage return
+>>>>>&0			ubeshort	=0x0D0a
+>>>>>>&0		string		x	%s
+>>>>>&0			ubeshort	!0x0D0a
+>>>>>>&-2		string		x	%s
+#		for version 0 index table apparently at offset 1F
+>16	uleshort      			0
+>>15		ubyte			1
+# 1st message 16-bit
+>>>0x1F			uleshort	x	\b, at 0x%x
+# message type: E~Error H~Help I~Information P~Prompt W~Warning ?
+>>>>(0x1F.s)		ubyte		x	%c-type
+>>>>>&0			string		x	%s
+# 2nd message 16-bit
+>>>0x21			uleshort	x	\b, at 0x%x
+>>>>(0x21.s)		ubyte		x	%c-type
+>>>>>&0			string		x	%s
+# 3rd message 16-bit
+>>>0x23			uleshort	x	\b, at 0x%x
+>>>>(0x23.s)		ubyte		x	%c-type
+>>>>>&0			string		x	%s
+#		version 0 32-bit
+>>15		ubyte			0
+# 1st message 32-bit
+>>>0x1f			ulelong		x	\b, at 0x%x
+>>>>(0x1F.l)		ubyte		x	%c-type
+>>>>>&0			string		x	%s
+# 2nd message 32-bit
+>>>0x23			ulelong		x	\b, at 0x%x
+>>>>(0x23.l)		ubyte		x	%c-type
+>>>>>&0			string		x	%s
+# 3rd message 32-bit
+>>>0x27			ulelong		x	\b, AT 0x%x
+>>>>(0x27.l)		ubyte		x	 %c-type
+>>>>>&0			string		x	%s
+# countryinfo; offset of country info block: 0 for version 0
+>20	uleshort			!0	\b, at 0x%x countryinfo
+# nextcoutryinfo
+>>22		uleshort		>0	\b, at 0x%x next
+# reserved[5]; Must be 0
+>>25	ulelong		!0		\b, RESERVED 0x%x 
+>>(20.s) use				os2-msg-info
+#	display country info block of MKMSGF message file
+0	name		os2-msg-info
+# bytesperchar; bytes per char: 1~SBCS 2~DBCS
+>0	ubyte		>1		\b, %u bytes/char
+# reserved; Not known
+>1	uleshort	!0		\b, reserved 0x%x
+# langfamilyID; language family ID like: 0~? 1~Arabic ... 7~German ... 9~English  ... 34~Slovene
+>3	uleshort	>0		\b, language %u
+# langversionID; like: 7_1~German 7_2~Swiss German 12_1~French 12_3~Canadian French
+>>5	uleshort	x		\b_%u
+# langfamilyID too high. This should not happen
+>3	uleshort	>34		(invalid language)
+# codepagesnumber; number of codepages like: 1 2 ... 16
+>7	uleshort	x		\b, %u code page
+# plural s
+>7	uleshort	>1		\bs
+# too many number of codepages. This should not happen
+>7	uleshort	>16		(Too many)
+# codepages[16]; codepages list like 437 850 ...
+>7	uleshort	<17
+# 1st code page
+>>9	uleshort	>0		%u
+# possible 2nd code page number
+>>>7	uleshort	>1
+>>>>11	uleshort	x		%u
+# filename[260]; name of file like: dbaseos2.msg dde4c01e.msg os2ldr.mgr xdfh.msg ...
+>41	string		x	 	\b, %s
 # OS/2 INI (this is a guess)
 0  string   \xff\xff\xff\xff\x14\0\0\0  OS/2 INI
+!:mime	application/x-os2-ini
+!:ext	ini
