[File] [PATCH] Magdir/windows MS Windows help corrections
Jörg Jenderek (GMX)
joerg.jen.der.ek at gmx.net
Mon Jun 24 19:32:54 UTC 2024
Hello,
some days ago I sent a patch for gfxboot compiled HTML help files. These
use a file name suffix that is also used for MS Windows help files.
When running file command version 5.45 on such Windows help files and
their sub class variants I get output like:
CORELDRW.HLP: MS Windows help Bookmark, 1551334 bytes
GCC.ANN: MS Windows help annotation, 16634 bytes
IBMAVW.HLP: MS Windows 3.x help, Sun Jan 8 20:03:27 1995, 289620 bytes
ICCviewer.GID: MS Windows help Bookmark, 10821 bytes
MSGRAPH.HLP: MS Windows help Bookmark, 333011 bytes
NAVW32.HLP: MS Windows 3.1 help, Wed Nov 10 19:48:26 1999, 161405 bytes
NOTEPLAY.MVB: MS Windows 3.0 help, Tue Feb 16 20:30:46 1993, 159760 bytes
PPAINTER.MVB: MS Windows y.z 0x1b help, Sun May 16 10:25:46 2066, 19531 bytes
STMMHLP.MVB: data
UNIDRV.HLP: MS, 18022 bytes
WinHlp32: MS Windows help Bookmark, 1228 bytes
WinHlp32.BMK: MS Windows help Bookmark, 1175 bytes
arivideo.mvb: MS Windows help Bookmark, 765129 bytes
clarkhow.mvb: data
corelap.GID: MS Windows help Bookmark, 338226 bytes
discapp.hlp: MS Windows 3.0 help, Mon Aug 19 22:18:58 1996, 73336 bytes
fmt-474-signature-id-748.hlp: data
s_in.mvb: MS Windows help Bookmark, 872466 bytes
viewerht.mvb: data
Because of these misidentifications, the --extension option also
displays wrong suffixes. This looks like:
CORELDRW.HLP: bmk
GCC.ANN: ann
IBMAVW.HLP: hlp
ICCviewer.GID: bmk
MSGRAPH.HLP: bmk
NAVW32.HLP: hlp
NOTEPLAY.MVB: hlp
PPAINTER.MVB: hlp
STMMHLP.MVB: ???
UNIDRV.HLP: ???
WinHlp32: bmk
WinHlp32.BMK: bmk
arivideo.mvb: bmk
clarkhow.mvb: ???
corelap.GID: bmk
discapp.hlp: hlp
fmt-474-signature-id-748.hlp: ???
s_in.mvb: bmk
viewerht.mvb: ???
Furthermore, with the -i option most samples are reported as
application/x-winhelp or application/winhelp.
For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). It identifies all
such examples with lowest priority as "Multimedia Viewer Book" with MVB
file name suffix and generic application/octet-stream mime type by
mvb.trid.xml. This is triggered because Windows help and related
variants (like bookmark, annotation and global help index) start with
byte sequence 3F5F0300. With higher priority many samples are described
as "Windows HELP File" with HLP name suffix and no mime type by
hlp.trid.xml. Here the byte sequence 00FFFFFFFFh at offset 7 is checked
in addition. Samples like corelap.GID are described as "Windows Help
index" with GID suffix and application/x-winhelp by gid.trid.xml.
Samples like GCC.ANN are described as "Windows Help Annotation" with ANN
suffix and mime type application/x-winhelp by ann-winhelp.trid.xml. TrID
lists the file name extensions used and, with the -v option, often the
related URL pointing to file format information (see appended
trid-v-winhelp.txt.gz).
For comparison I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/). Here the HLP samples
are recognized correctly as "Windows Help File" with HLP suffix and
without mime type by PUID fmt/474. Unlike TrID, it checks for byte
sequence FFFFFFFF at offset 8. Some MVB samples (like PPAINTER.MVB) are
correctly recognized as "Multimedia Viewer Book" with MVB suffix and
without mime type by PUID fmt/1800. This checks for SYSTEMHEADER pattern
0x6C03, minor version 0x1B00 (WMVC/MMVC media view) followed by major
version 0x0100. GID, ANN and BMK samples are either not recognized or
misidentified as HLP (see appended droid-winhelp.csv.gz).
On Linux, the shared MIME-info database calls the samples
"WinHelp help file". Recognition is done by the byte sequence 0x00035f3f
at offset 0. The mime type shown is application/winhlp, and only
hlp is listed as suffix. This information can be seen in the
freedesktop.org.xml.in source, found for example on gitlab.freedesktop.org.
The examples are recognized by the first check (as done by the other
tools) inside Magdir/windows. This looks like:
0 lelong 0x00035f3f
In the next step it checks for the system file header magic 0x293B at
DirectoryStart+9 by a line that looks like:
>(4.l+9) uleshort 0x293B MS
This fails for the DROID sample fmt-474-signature-id-748.hlp and some MVB
samples (like STMMHLP.MVB, clarkhow.mvb and viewerht.mvb). The MVB
samples are Multimedia Viewer Books, so they contain many
graphics. This implies a "big" file size, which often leads to a "high"
DirectoryStart offset of the FILEHEADER, stored at offset four. In that
case the above line is not executed because the offset is beyond the
standard limits. This can be overcome by running the file command with
the additional "-P bytes=30335189" option, for example. Then many of the
MVB examples are recognized but wrongly described as "MS Windows help
Bookmark". This happens when sub classification as ANN, GID and HLP
fails; the assumption is then that the sample is a BMK (bookmark).
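The header test and the effect of the byte limit can be illustrated with
a small Python sketch. It is not how the file command is implemented; the
function name check_directory_magic and the limit parameter (a stand-in
for the value given with -P bytes) are assumptions made for illustration
only, and the sample bytes are synthetic:

```python
import struct

WINHELP_MAGIC = 0x00035F3F   # little-endian long at offset 0
DIRECTORY_MAGIC = 0x293B     # little-endian short at DirectoryStart+9


def check_directory_magic(data: bytes, limit: int) -> str:
    """Mimic the two magic lines on the first `limit` bytes of a sample.

    `limit` is a hypothetical stand-in for the '-P bytes' value.
    """
    view = data[:limit]
    if len(view) < 8:
        return "data"
    magic, directory_start = struct.unpack_from("<II", view, 0)
    if magic != WINHELP_MAGIC:
        return "data"
    pos = directory_start + 9
    if pos + 2 > len(view):
        # The indirect offset (4.l+9) lies beyond the inspected bytes.
        return "offset not reachable"
    (value,) = struct.unpack_from("<H", view, pos)
    return "MS" if value == DIRECTORY_MAGIC else "magic mismatch"


# Synthetic sample: DirectoryStart = 0x100, magic stored at 0x109.
sample = bytearray(0x120)
struct.pack_into("<II", sample, 0, WINHELP_MAGIC, 0x100)
struct.pack_into("<H", sample, 0x109, DIRECTORY_MAGIC)
print(check_directory_magic(bytes(sample), limit=0x120))
print(check_directory_magic(bytes(sample), limit=0x80))
```

With the larger limit the directory magic is found; with the smaller one
the same sample falls through because DirectoryStart+9 is out of reach.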
So damaged samples like STMMHLP.MVB and samples with "high" offsets are
now handled by an additional branch. It looks like:
>(4.l+9) uleshort !0x293B MS Windows Multimedia Viewer Book
#!:mime application/octet-stream
!:ext mvb
>>12 lelong x (damaged or use higher '-P bytes' option)
Unfortunately the last line above is not executed. Maybe this is a bug in
the file command!
If the test for Windows help annotation fails, then the check for GID is
done by a line that looks like:
>>>(4.l+0x65) string =|Pete Windows help Global Index
Unfortunately, in a few samples like corelap.GID this Pete phrase occurs
at a slightly higher offset. So the above line now becomes:
>>>(4.l+0x65) search/26 |Pete Windows help Global Index
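The difference between the fixed-offset string test and the ranged search
can be sketched in Python as follows. The helper has_pete is hypothetical,
and it assumes that a search range of N lets the match begin at any of the
first N positions; the exact range semantics of the file command may
differ in detail:

```python
PETE = b"|Pete"


def has_pete(data: bytes, directory_start: int, search_range: int = 1) -> bool:
    """Look for '|Pete' starting at DirectoryStart+0x65.

    search_range=1 behaves like the old fixed-offset string test,
    search_range=26 like the new search/26 line (assumed semantics).
    """
    start = directory_start + 0x65
    # Allow the match to begin at any of the first `search_range` positions.
    end = start + search_range - 1 + len(PETE)
    return data.find(PETE, start, end) >= 0


# Synthetic samples with DirectoryStart = 0.
exact = bytes(0x65) + PETE          # phrase exactly at offset 0x65
shifted = bytes(0x65 + 3) + PETE    # phrase three bytes later
print(has_pete(exact, 0), has_pete(shifted, 0), has_pete(shifted, 0, 26))
```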
The sub classification as HLP is done by looking for the SYSTEMHEADER
pattern 0x6C03; the displaying part is done by the sub routine
help-ver-date. If the check for major version one fails, this step is
repeated, up to seven times. This starts like:
>>>>16 search/0x49AF/s \x6c\x03
>>>>>&0 use help-ver-date
>>>>>&4 leshort !1
Because of the "high" file sizes and offsets of MVB samples like
viewerht.mvb, the above search range must be raised. So the first
iteration now becomes:
>>>>16 search/0x1bbc370/s \x6c\x03
>>>>>&0 use help-ver-date
>>>>>&4 leshort !1
Then of course the search range in the next iteration steps must be
raised as well. The second iteration step currently looks like:
>>>>>>&0 search/0x69AF/s \x6c\x03
>>>>>>>&0 use help-ver-date
>>>>>>>&4 leshort !1
Furthermore, in samples like viewerht.mvb the byte sequence 0x6C03 occurs
at very short intervals. So for such samples the second iteration step
now becomes:
>>>>>>&-2 search/0x1c4b6f0/s \x6c\x03
>>>>>>>&0 use help-ver-date
>>>>>>>&4 leshort !1
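The repeated search can be pictured with the following Python sketch. It
is a simplification under stated assumptions: the per-iteration search
ranges are ignored, the function find_system_header and its step_back
parameter are hypothetical, and the way the search resumes (directly
behind the 2-byte major field, or two bytes earlier for the &-2 variant)
reflects one reading of the relative offsets, not verified behaviour of
the file command:

```python
import struct

SYSTEM_MAGIC = b"\x6c\x03"


def find_system_header(data, start=16, max_iterations=13, step_back=0):
    """Return the offset of the first 0x036C magic with major version 1."""
    pos = start
    for _ in range(max_iterations):
        hit = data.find(SYSTEM_MAGIC, pos)
        if hit < 0 or hit + 6 > len(data):
            return None
        minor, major = struct.unpack_from("<HH", data, hit + 2)
        if major == 1:
            return hit
        # Resume behind the major field; step_back=2 mimics '&-2'.
        pos = hit + 6 - step_back
    return None


# Synthetic data: a false hit (major 2) followed later by a real header.
false_hit = SYSTEM_MAGIC + b"\x00\x00" + b"\x02\x00"
real = SYSTEM_MAGIC + struct.pack("<HH", 21, 1)
far = bytes(16) + false_hit + bytes(10) + real
# Real header starts only four bytes after a false hit.
close = bytes(16) + SYSTEM_MAGIC + b"\x00\x00" + real
print(find_system_header(far), find_system_header(close),
      find_system_header(close, step_back=2))
```

In the second synthetic sample the real header is skipped unless the
search resumes two bytes earlier, which is the kind of closely spaced
occurrence the &-2 offset is meant to catch.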
GCC.HLP is detected after 7 iterations. Because of the high values in
MVB samples I need 13 iteration steps. If no HLP is found by then, I look
at the FirstFreeBlock value at offset 8. According to the other tools
this value is FFFFFFFFh for HLP samples, whereas for many MVB samples
(like arivideo.mvb and clarkhow.mvb) it is lower. So iteration number 13
looks like:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>&0 search/0x371d4/s \x6c\x03
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>&0 use help-ver-date
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>&4 leshort !1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>8 lelong !0xFFffFFff Windows Multimedia Viewer Book
!:mime application/x-winhelp
!:ext mvb
The displaying part (showing version and date) for HLP is done by the sub
routine help-ver-date. After the checks for the SYSTEMHEADER magic 0x036C
and for major version one, the version is shown depending on the minor
version, followed by GenDate. This looks like:
0 name help-ver-date
>0 leshort 0x036C
>>4 leshort 1 Windows
!:mime application/winhelp
!:ext hlp
>>>2 leshort 0x0F 3.x
>>>2 leshort 0x15 3.0
>>>2 leshort 0x21 3.1
>>>2 leshort 0x27 x.y
>>>2 leshort 0x33 95
>>>2 default x y.z
>>>>2 leshort x %#x
>>>2 leshort x help
>>>6 ldate x \b, %s
Some tools show application/winhelp or application/winhlp as mime type.
But no such type is officially registered at IANA, so I now use
application/x-winhelp. Furthermore, the minor version numbers mentioned
and used must be read as decimal, not hexadecimal, so the sub
classification was wrong. A sample like NAVW32.HLP is wrongly described
as "MS Windows 3.1 help" instead of "Windows 95 help".
So the minor version part now becomes:
>>>2 leshort 15 3.0
>>>2 leshort 21 3.1
>>>2 leshort 27
>>>2 leshort 33 95
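The effect of reading the minor version as hexadecimal instead of decimal
can be reproduced with two lookup tables in Python. This is only a sketch;
the names OLD_TABLE, NEW_TABLE and describe are made up for illustration,
and the fallback string imitates the "y.z" default line:

```python
# Old magic lines compared against hexadecimal constants.
OLD_TABLE = {0x0F: "3.x", 0x15: "3.0", 0x21: "3.1", 0x27: "x.y", 0x33: "95"}
# New magic lines use the decimal minor version values.
NEW_TABLE = {15: "3.0", 21: "3.1", 27: "", 33: "95"}


def describe(minor: int, table: dict) -> str:
    """Return the version text, or the 'y.z' fallback with the raw value."""
    return table.get(minor, "y.z %#x" % minor)


for minor in (15, 21, 27, 33):
    print(minor, repr(describe(minor, OLD_TABLE)), repr(describe(minor, NEW_TABLE)))
```

Minor version 33 equals 0x21, so the old table reports "3.1" where "95"
is meant, and minor version 27 (0x1b) falls through to the "y.z 0x1b"
text seen for PPAINTER.MVB.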
The value 27 means a WMVC/MMVC media view file, which implies MVB,
whereas any other value implies HLP. So a further sub classification
level is added afterwards by lines that look like:
>>>2 leshort !27
>>>>2 leshort x help
!:ext hlp
>>>2 leshort =27 Multimedia Viewer Book
!:ext mvb
Unfortunately one sample, NOTEPLAY.MVB, is still described as HLP.
Luckily, with the information given by the other tools I found a page
about Multimedia Viewer Book on the File Formats Archive Team web site.
That information is expressed inside Magdir/windows by comment lines like:
# URL: http://fileformats.archiveteam.org/wiki/Multimedia_Viewer_Book
# Ref.: http://mark0.net/download/triddefs_xml.7z/defs/m/mvb.trid.xml
Some specifications and download links are listed there.
After the date read by the sub routine, a 2-byte flag value is stored
which determines the compression used. Afterwards the HelpFileTitle is
stored in different structures depending on the minor version. Often the
title correlates with the file name.
So this information is now shown by lines like:
#>>>10 uleshort x \b, flags %#x
>>>2 leshort <17
>>>>12 string x \b, title "%s"
>>>2 leshort >16
#>>>>12 uleshort x \b, RecordType %u
# DataSize size of data
#>>>>14 uleshort x \b, DataSize %u
>>>>12 uleshort 1
>>>>>14 pstring/h >\0 \b, title "%s"
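The two title layouts can be sketched in Python as below. Offsets are
relative to the start of the SYSTEMHEADER, as in the magic lines. The
function help_title, the helper make_header and the cp1252 decoding are
assumptions for illustration; the sample headers are synthetic:

```python
import struct


def help_title(header: bytes):
    """Extract the help file title from a SYSTEMHEADER, or return None."""
    if len(header) < 12:
        return None
    magic, minor, major = struct.unpack_from("<HHH", header, 0)
    if magic != 0x036C or major != 1:
        return None
    if minor < 17:
        # Older layout: NUL-terminated title directly at offset 12.
        raw = header[12:]
    else:
        # Newer layout: RecordType at 12, DataSize at 14, data at 16.
        if len(header) < 16:
            return None
        record_type, data_size = struct.unpack_from("<HH", header, 12)
        if record_type != 1:          # 1 means help file title
            return None
        raw = header[16:16 + data_size]
    # cp1252 is an assumed encoding, not taken from a specification.
    return raw.split(b"\0", 1)[0].decode("cp1252", errors="replace")


def make_header(minor: int, payload: bytes) -> bytes:
    """Build magic, minor, major, GenDate (0) and flags (0), then payload."""
    return struct.pack("<HHHIH", 0x036C, minor, 1, 0, 0) + payload


old_style = make_header(15, b"Graph 3.0a\0")
title = b"Windows\0"
new_style = make_header(33, struct.pack("<HH", 1, len(title)) + title)
print(help_title(old_style), help_title(new_style))
```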
After applying the above modifications by the patch
file-5.45-windows-hlp.diff, more previously missed samples (like
STMMHLP.MVB and UNIDRV.HLP) are now recognized. Furthermore, more details
like the title (which often correlates with the file name) are shown.
With the additional -P bytes=30335189 option this now looks like:
CORELDRW.HLP: MS Windows 3.1 help, Fri Jun 26 01:09:07 1992, title "CorelDRAW! - Help", 1551334 bytes
GCC.ANN: MS Windows help annotation, 16634 bytes
IBMAVW.HLP: MS Windows 3.0 help, Sun Jan 8 20:03:27 1995, title "IBM AntiVirus", 289620 bytes
ICCviewer.GID: MS Windows help Global Index, 10821 bytes
MSGRAPH.HLP: MS Windows 3.0 help, Mon Jan 13 22:24:12 1992, title "Graph 3.0a", 333011 bytes
NAVW32.HLP: MS Windows 95 help, Wed Nov 10 19:48:26 1999, title "Norton AntiVirus for Windows 95/98", 161405 bytes
NOTEPLAY.MVB: MS Windows 3.1 help, Tue Feb 16 20:30:46 1993, title "NotePlay SE for Windows On-Line Manual", 159760 bytes
PPAINTER.MVB: MS Windows Multimedia Viewer Book, Sun May 16 10:25:46 2066, title "Picture Painter Help", 19531 bytes
STMMHLP.MVB: MS Windows Multimedia Viewer Book, 1818497 bytes
UNIDRV.HLP: MS Windows 95 help, Tue Jul 24 17:31:10 2001, title "Windows", 18022 bytes
WinHlp32: MS Windows help Bookmark, 1228 bytes
WinHlp32.BMK: MS Windows help Bookmark, 1175 bytes
arivideo.mvb: MS Windows Multimedia Viewer Book, 765129 bytes
clarkhow.mvb: MS Windows Multimedia Viewer Book, 19093522 bytes
corelap.GID: MS Windows help Global Index, 338226 bytes
discapp.hlp: MS Windows 3.1 help, Mon Aug 19 22:18:58 1996, title "Mwave Discriminator Help", 73336 bytes
fmt-474-signature-id-748.hlp: MS Windows Multimedia Viewer Book
s_in.mvb: MS Windows Multimedia Viewer Book, Sun Oct 12 06:32:58 2064, title "Ski instructions", 872466 bytes
viewerht.mvb: MS Windows Multimedia Viewer Book, Wed Mar 5 02:45:34 2064, 30335189 bytes
I hope my diff file can be applied in a future version of the file utility.
With best wishes
Jörg Jenderek
--
Jörg Jenderek
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-winhelp.txt.gz
Type: application/x-gzip
Size: 965 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20240624/d8f67582/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-winhelp.csv.gz
Type: application/x-gzip
Size: 875 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20240624/d8f67582/attachment-0003.bin>
-------------- next part --------------
--- file-5.45/magic/Magdir/windows.old 2023-07-27 20:04:45.000000000 +0200
+++ file-5.45/magic/Magdir/windows 2024-06-24 21:14:59.791026100 +0200
@@ -482,10 +482,16 @@
# Warning: Current entry does not yet have a description for adding a MIME type
-!:mime application/winhelp
-!:ext hlp
+# not officially registered at IANA
+#!:mime application/winhelp
+#!:mime application/winhlp
+!:mime application/x-winhelp
# version Minor of help file format is hint for windows version
->>>2 leshort 0x0F 3.x
->>>2 leshort 0x15 3.0
->>>2 leshort 0x21 3.1
->>>2 leshort 0x27 x.y
->>>2 leshort 0x33 95
+# HC30 Windows 3.0 help file
+>>>2 leshort 15 3.0
+# HC31 Windows 3.1 help file
+>>>2 leshort 21 3.1
+# WMVC/MMVC media view file
+>>>2 leshort 27
+# MVC or HCW 4.00 Windows 95
+>>>2 leshort 33 95
+# next line should not happen
>>>2 default x y.z
@@ -493,7 +499,30 @@
# to complete message string like "MS Windows 3.x help file"
->>>2 leshort x help
+>>>2 leshort !27
+# HLP or few MVB like NOTEPLAY.MVB
+>>>>2 leshort x help
+!:ext hlp
+# URL: http://fileformats.archiveteam.org/wiki/Multimedia_Viewer_Book
+# Reference: http://mark0.net/download/triddefs_xml.7z/defs/m/mvb.trid.xml
+# Note: called "Multimedia Viewer Book" by TrID and by DROID via PUID fmt/1800
+>>>2 leshort =27 Multimedia Viewer Book
+!:ext mvb
# GenDate often older than file creation date
>>>6 ldate x \b, %s
-#
+# flags determine the compression
+#>>>10 uleshort x \b, flags %#x
+>>>2 leshort <17
+# HelpFileTitle
+>>>>12 string x \b, title "%s"
+>>>2 leshort >16
+# SYSTEMREC[].RecordType type of data in record; 1~help file title 2~COPYRIGHT 3~TOPICOFFSET Contents 4~Macro 5~*.ICO 6~HPJ-structure
+#>>>>12 uleshort x \b, RecordType %u
+# DataSize size of data
+#>>>>14 uleshort x \b, DataSize %u
+>>>>12 uleshort 1
+>>>>>14 pstring/h >\0 \b, title "%s"
# Magic for HeLP files
+# URL: http://fileformats.archiveteam.org/wiki/HLP_(WinHelp)
+# Reference: http://mark0.net/download/triddefs_xml.7z/defs/h/hlp.trid.xml
+# Note: called "Windows HELP File" by TrID, "Windows Help File" by DROID via PUID fmt/474 and
+# "WinHelp help file" by shared MIME-info database from freedesktop.org
0 lelong 0x00035f3f
@@ -502,2 +531,4 @@
>(4.l+9) uleshort 0x293B MS
+# URL: http://fileformats.archiveteam.org/wiki/WinHelp_annotation
+# Reference: http://mark0.net/download/triddefs_xml.7z/defs/a/ann.trid.xml
# look for @VERSION bmf.. like IBMAVW.ANN
@@ -507,4 +538,5 @@
>>0xD4 string !\x62\x6D\x66\x01\x00
-# "GID Help index" by TrID
->>>(4.l+0x65) string =|Pete Windows help Global Index
+# "GID Help index" by TrID by gid.trid.xml
+# sometimes at little higher offset like in corelap.GID
+>>>(4.l+0x65) search/26 |Pete Windows help Global Index
!:mime application/x-winhelp
@@ -512,43 +544,65 @@
# HeLP Bookmark or
-# "Windows HELP File" by TrID
->>>(4.l+0x65) string !|Pete
+# Multimedia_Viewer_Book or
+# "Windows HELP File" by TrID by hlp.trid.xml
+>>>(4.l+0x65) default x
# maybe there exist a cleaner way to detect HeLP fragments
-# brute search for Magic 0x036C with matching Major maximal 7 iterations
-# discapp.hlp
->>>>16 search/0x49AF/s \x6c\x03
+# brute search for Magic 0x036C with matching Major maximal 13 iterations
+# https://sembiance.com/fileFormatSamples/document/multimediaViewerBook/viewerht.mvb
+>>>>16 search/0x1bbc370/s \x6c\x03
>>>>>&0 use help-ver-date
>>>>>&4 leshort !1
-# putty.hlp
->>>>>>&0 search/0x69AF/s \x6c\x03
+# viewerht.mvb
+>>>>>>&-2 search/0x1c4b6f0/s \x6c\x03
>>>>>>>&0 use help-ver-date
>>>>>>>&4 leshort !1
->>>>>>>>&0 search/0x49AF/s \x6c\x03
+# https://sembiance.com/fileFormatSamples/document/multimediaViewerBook/clarkhow.mvb
+>>>>>>>>&0 search/0x34ab80/s \x6c\x03
>>>>>>>>>&0 use help-ver-date
>>>>>>>>>&4 leshort !1
->>>>>>>>>>&0 search/0x49AF/s \x6c\x03
+>>>>>>>>>>&0 search/0x473ab0/s \x6c\x03
>>>>>>>>>>>&0 use help-ver-date
>>>>>>>>>>>&4 leshort !1
->>>>>>>>>>>>&0 search/0x49AF/s \x6c\x03
+>>>>>>>>>>>>&0 search/0x739680/s \x6c\x03
>>>>>>>>>>>>>&0 use help-ver-date
>>>>>>>>>>>>>&4 leshort !1
->>>>>>>>>>>>>>&0 search/0x49AF/s \x6c\x03
+>>>>>>>>>>>>>>&0 search/0x76c030/s \x6c\x03
>>>>>>>>>>>>>>>&0 use help-ver-date
>>>>>>>>>>>>>>>&4 leshort !1
->>>>>>>>>>>>>>>>&0 search/0x49AF/s \x6c\x03
+>>>>>>>>>>>>>>>>&0 search/0x805c80/s \x6c\x03
# GCC.HLP is detected after 7 iterations
>>>>>>>>>>>>>>>>>&0 use help-ver-date
-# this only happens if bigger hlp file is detected after used search iterations
->>>>>>>>>>>>>>>>>&4 leshort !1 Windows y.z help
-!:mime application/winhelp
-!:ext hlp
+>>>>>>>>>>>>>>>>>&4 leshort !1
+>>>>>>>>>>>>>>>>>>&0 search/0x805c80/s \x6c\x03
+>>>>>>>>>>>>>>>>>>>&0 use help-ver-date
+>>>>>>>>>>>>>>>>>>>&4 leshort !1
+>>>>>>>>>>>>>>>>>>>>&0 search/0xb63480/s \x6c\x03
+>>>>>>>>>>>>>>>>>>>>>&0 use help-ver-date
+>>>>>>>>>>>>>>>>>>>>>&4 leshort !1
+>>>>>>>>>>>>>>>>>>>>>>&0 search/0xb7fe80/s \x6c\x03
+>>>>>>>>>>>>>>>>>>>>>>>&0 use help-ver-date
+>>>>>>>>>>>>>>>>>>>>>>>&4 leshort !1
+>>>>>>>>>>>>>>>>>>>>>>>>&0 search/0xb8ade0/s \x6c\x03
+>>>>>>>>>>>>>>>>>>>>>>>>>&0 use help-ver-date
+>>>>>>>>>>>>>>>>>>>>>>>>>&4 leshort !1
+>>>>>>>>>>>>>>>>>>>>>>>>>>&0 search/0x371d4/s \x6c\x03
+>>>>>>>>>>>>>>>>>>>>>>>>>>>&0 use help-ver-date
+>>>>>>>>>>>>>>>>>>>>>>>>>>>&4 leshort !1
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>&0 search/0x371d4/s \x6c\x03
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>&0 use help-ver-date
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>&4 leshort !1
+# https://sembiance.com/fileFormatSamples/document/multimediaViewerBook/arivideo.mvb
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>8 lelong !0xFFffFFff Windows Multimedia Viewer Book
+!:mime application/x-winhelp
+!:ext mvb
# repeat search again or following default line does not work
>>>>16 search/0x49AF/s \x6c\x03
-# remaining files should be HeLP Bookmark WinHlp32.BMK (XP 32-bit) or WinHlp32 (Windows 8.1 64-bit)
+# remaining files should be HeLP Bookmark WinHlp32.BMK (XP 32-bit) or WinHlp32 (Windows 7 8.1 64-bit)
+# typically found inside directory %LOCALAPPDATA%\Help
>>>>16 default x Windows help Bookmark
!:mime application/x-winhelp
-!:ext bmk
-## FirstFreeBlock normally FFFFFFFFh 10h for *ANN
-##>>8 lelong x \b, FirstFreeBlock %#8.8x
-# EntireFileSize
->>12 lelong x \b, %d bytes
+!:ext /bmk
+# DirectoryStart offset of FILEHEADER of internal directory
+#>4 lelong x \b, DirectoryStart %8.8x
+## FirstFreeBlock normally for *HLP FFFFFFFFh if no free list or 10h for *ANN
+#>>8 lelong x \b, FirstFreeBlock %#8.8x
## ReservedSpace normally 042Fh AFh for *.ANN
@@ -583,2 +637,12 @@
#>>(4.l+47) ubequad x \b, PageStart %#16.16llx
+# GRR: offset is not reachable in few samples like STMMHLP.MVB because probably damaged file
+# or DROID fmt-474-signature-id-748.hlp
+# or for example run file command with higher --parameter bytes=30335189
+>(4.l+9) uleshort !0x293B MS Windows Multimedia Viewer Book
+#!:mime application/octet-stream
+!:ext mvb
+# GRR: next line is not executed!
+>>12 lelong x (damaged or use higher '-P bytes' option)
+# EntireFileSize; biggest 1551334 for CORELDRW.HLP 30335189 for viewerht.mvb; smallest 28672 for open.mvb
+>12 lelong x \b, %d bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.45-windows-hlp.diff.sig
Type: application/octet-stream
Size: 2671 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20240624/d8f67582/attachment-0001.obj>