[File] [PATCH] Magdir/windows MS Windows help corrections
Christos Zoulas
christos at zoulas.com
Mon Jun 24 22:46:16 UTC 2024
Committed, thanks!
christos
> On Jun 24, 2024, at 3:32 PM, Jörg Jenderek (GMX) <joerg.jen.der.ek at gmx.net> wrote:
>
> Hello,
>
> Some days ago I sent a patch for gfxboot compiled HTML help files. Their
> file name suffix is also used for MS Windows help files.
>
> When I run file command version 5.45 on such Windows help files and
> their subclass variants, I get output like:
>
> CORELDRW.HLP: MS Windows help Bookmark, 1551334 bytes
> GCC.ANN: MS Windows help annotation, 16634 bytes
> IBMAVW.HLP: MS Windows 3.x help, Sun Jan 8 20:03:27 1995, 289620 bytes
> ICCviewer.GID: MS Windows help Bookmark, 10821 bytes
> MSGRAPH.HLP: MS Windows help Bookmark, 333011 bytes
> NAVW32.HLP: MS Windows 3.1 help, Wed Nov 10 19:48:26 1999, 161405 bytes
> NOTEPLAY.MVB: MS Windows 3.0 help, Tue Feb 16 20:30:46 1993, 159760 bytes
> PPAINTER.MVB: MS Windows y.z 0x1b help, Sun May 16 10:25:46 2066, 19531 bytes
> STMMHLP.MVB: data
> UNIDRV.HLP: MS, 18022 bytes
> WinHlp32: MS Windows help Bookmark, 1228 bytes
> WinHlp32.BMK: MS Windows help Bookmark, 1175 bytes
> arivideo.mvb: MS Windows help Bookmark, 765129 bytes
> clarkhow.mvb: data
> corelap.GID: MS Windows help Bookmark, 338226 bytes
> discapp.hlp: MS Windows 3.0 help, Mon Aug 19 22:18:58 1996, 73336 bytes
> fmt-474-signature-id-748.hlp: data
> s_in.mvb: MS Windows help Bookmark, 872466 bytes
> viewerht.mvb: data
>
> Because of these misidentifications, the --extension option also displays
> wrong suffixes. This looks like:
> CORELDRW.HLP: bmk
> GCC.ANN: ann
> IBMAVW.HLP: hlp
> ICCviewer.GID: bmk
> MSGRAPH.HLP: bmk
> NAVW32.HLP: hlp
> NOTEPLAY.MVB: hlp
> PPAINTER.MVB: hlp
> STMMHLP.MVB: ???
> UNIDRV.HLP: ???
> WinHlp32: bmk
> WinHlp32.BMK: bmk
> arivideo.mvb: bmk
> clarkhow.mvb: ???
> corelap.GID: bmk
> discapp.hlp: hlp
> fmt-474-signature-id-748.hlp: ???
> s_in.mvb: bmk
> viewerht.mvb: ???
>
> Furthermore, with the -i option most samples are shown as
> application/x-winhelp or application/winhelp.
>
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). It identifies all
> such examples with lowest priority as "Multimedia Viewer Book" with MVB
> file name suffix and generic application/octet-stream mime type by
> mvb.trid.xml. This is triggered because Windows help and related
> variants (like bookmark, annotation and global help index) start with
> byte sequence 3F5F0300. With higher priority many samples are described
> as "Windows HELP File" with HLP name suffix and no mime type by
> hlp.trid.xml. Here the byte sequence 00FFFFFFFFh is additionally
> checked at offset 7. Samples like corelap.GID are described as "Windows Help
> index" with GID suffix and application/x-winhelp by gid.trid.xml.
> Samples like GCC.ANN are described as "Windows Help Annotation" with ANN
> suffix and mime type application/x-winhelp by ann-winhelp.trid.xml. This
> software lists the file name extension used and, with the -v option, often
> the related URL pointing to the file format information used (see appended
> trid-v-winhelp.txt.gz).
>
> For comparison I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). Here the HLP samples
> are recognized correctly as "Windows Help File" with HLP suffix and without
> mime type by PUID fmt/474. In contrast to TrID it checks for byte
> sequence FFFFFFFF at offset 8. Some MVB samples (like PPAINTER.MVB) are
> correctly recognized as "Multimedia Viewer Book" with MVB suffix and
> without mime type by PUID fmt/1800. This checks for SYSTEMHEADER pattern
> 0x6C03, minor version 0x1B00 (WMVC/MMVC media view) followed by major
> version 0x0100. GID, ANN and BMK samples are not recognized or are
> misinterpreted as HLP (see appended droid-winhelp.csv.gz).
>
> On Linux, according to the shared MIME-info database, the samples are called
> "WinHelp help file". The recognition happens by the little-endian 32-bit
> value 0x00035f3f at offset 0. Here application/winhlp is shown as mime
> type, and only hlp is listed as suffix. That information can be seen in the
> freedesktop.org.xml.in source found for example on gitlab.freedesktop.org.
>
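The signature tests quoted above for TrID and DROID can be sketched in a few lines of Python. This is only an illustration of the byte patterns mentioned in this mail, not the actual logic of either tool. The function name `signature_checks` and the result keys are made up for this example, and it is an assumption that DROID also requires the leading magic.

```python
import struct

# Bytes 3F 5F 03 00 at offset 0, read as a little-endian 32-bit value.
WINHELP_MAGIC = 0x00035F3F


def signature_checks(data: bytes) -> dict:
    """Report which of the quoted signature tests a buffer would pass."""
    has_magic = (
        len(data) >= 4
        and struct.unpack_from("<I", data, 0)[0] == WINHELP_MAGIC
    )
    return {
        # Generic test shared by all variants (HLP, MVB, GID, ANN, BMK).
        "magic_3F5F0300": has_magic,
        # TrID hlp.trid.xml as described: 00 FF FF FF FF at offset 7.
        "trid_hlp": has_magic and data[7:12] == b"\x00\xff\xff\xff\xff",
        # DROID fmt/474 as described: FF FF FF FF at offset 8
        # (assumption: the leading magic is required as well).
        "droid_fmt_474": has_magic and data[8:12] == b"\xff\xff\xff\xff",
    }
```

Since offset 7 is the highest byte of DirectoryStart, the TrID test as described only passes when DirectoryStart is below 0x01000000, while the DROID test looks at FirstFreeBlock alone.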
> Inside Magdir/windows the examples are recognized by a first check (the
> same one other tools use). This looks like:
> 0 lelong 0x00035f3f
> In the next step it checks for the system file header magic 0x293B at
> DirectoryStart+9 with a line that looks like:
> >(4.l+9) uleshort 0x293B MS
> This fails for the DROID sample fmt-474-signature-id-748.hlp and some MVB
> samples (like STMMHLP.MVB, clarkhow.mvb and viewerht.mvb). The MVB samples
> are Multimedia Viewer Books, so they contain many graphics. This implies
> a "big" file size, which often leads to a "high" DirectoryStart offset,
> the FILEHEADER field stored at offset four. The above line is then
> not executed because the offset is beyond the standard limits.
> This can be overcome by running the file command with, for example, an
> additional "-P bytes=30335189" option. Then many of the MVB examples are
> recognized but wrongly described as "MS Windows help Bookmark". This
> happens when the sub classification as ANN, GID and HLP fails: the
> assumption is that the sample is then a BMK (bookmark).
> So damaged samples like STMMHLP.MVB or samples with "high" offsets are
> now handled by an additional branch. That looks like:
> >(4.l+9) uleshort !0x293B MS Windows Multimedia Viewer Book
> #!:mime application/octet-stream
> !:ext mvb
> >>12 lelong x (damaged or use higher '-P bytes' option)
> Unfortunately the above line is not executed. Maybe this is a bug in
> the file command!
>
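The effect of the byte limit on the indirect test `>(4.l+9) uleshort 0x293B` can be sketched as follows. This is a simplified model of what is described above, not the file command's implementation. The function name `check_directory`, the `limit` parameter (standing in for `-P bytes`) and the returned strings are all hypothetical.

```python
import struct

WINHELP_MAGIC = 0x00035F3F
HEADER_MAGIC = 0x293B  # value expected at DirectoryStart+9


def check_directory(data: bytes, limit: int) -> str:
    """Model the '(4.l+9) uleshort 0x293B' test on at most `limit` bytes."""
    view = data[:limit]  # only this much of the file is inspected
    if len(view) < 8 or struct.unpack_from("<I", view, 0)[0] != WINHELP_MAGIC:
        return "not winhelp"
    # DirectoryStart is the little-endian long stored at offset 4.
    directory_start = struct.unpack_from("<I", view, 4)[0]
    pos = directory_start + 9
    if pos + 2 > len(view):
        # The offset lies beyond the inspected bytes, so neither the
        # "equal" nor the "not equal" branch can be evaluated.
        return "out of range"
    value = struct.unpack_from("<H", view, pos)[0]
    return "match" if value == HEADER_MAGIC else "mismatch"
```

In this model a "high" DirectoryStart gives "out of range" rather than "mismatch", which would be one possible explanation why a `!0x293B` branch is not reached without a larger `-P bytes` value.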
> If the test for Windows help annotation fails, the check for GID is done
> by a line that looks like:
> >>>(4.l+0x65) string =|Pete Windows help Global Index
> Unfortunately, in a few samples like corelap.GID this Pete phrase occurs
> at a slightly higher offset. So the above line now becomes:
> >>>(4.l+0x65) search/26 |Pete Windows help Global Index
>
> The sub classification as HLP is done by looking for the SYSTEMHEADER
> pattern 0x6C03, and the displaying part is done by sub routine help-ver-date.
> If the check for major version one fails, this step is repeated seven times.
> This starts like:
> >>>>16 search/0x49AF/s \x6c\x03
> >>>>>&0 use help-ver-date
> >>>>>&4 leshort !1
> Because of the "high" file sizes and offsets of MVB samples like
> viewerht.mvb, the above search range must be raised. So the first
> iteration now becomes:
> >>>>16 search/0x1bbc370/s \x6c\x03
> >>>>>&0 use help-ver-date
> >>>>>&4 leshort !1
> Then of course the search range in the next iteration steps must also be
> raised. The second iteration step at the moment looks like:
> >>>>>>&0 search/0x69AF/s \x6c\x03
> >>>>>>>&0 use help-ver-date
> >>>>>>>&4 leshort !1
> Furthermore, in samples like viewerht.mvb the bytes 0x6C03 occur at very
> short intervals. So for such samples the second iteration step now becomes:
> >>>>>>&-2 search/0x1c4b6f0/s \x6c\x03
> >>>>>>>&0 use help-ver-date
> >>>>>>>&4 leshort !1
>
> GCC.HLP is detected after 7 iterations. Because of the high values in MVB
> samples I need 13 iteration steps. If no HLP is found at that position, I
> look at the FirstFreeBlock value at offset 8. According to other tools this
> value is FFFFFFFFh for HLP samples, whereas for many MVB samples (like
> arivideo.mvb and clarkhow.mvb) it is lower. So iteration number 13
> looks like:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>&0 search/0x371d4/s \x6c\x03
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>&0 use help-ver-date
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>&4 leshort !1
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>8 lelong !0xFFffFFff Windows Multimedia Viewer Book
> !:mime application/x-winhelp
> !:ext mvb
>
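The repeated search for the SYSTEMHEADER pattern can be written as a loop instead of nested magic levels. The following Python sketch only illustrates the idea: unlike the magic lines above it has no per-step search range and no limit of 13 iterations. The function name `find_system_header` is hypothetical.

```python
import struct
from typing import Optional


def find_system_header(data: bytes, start: int = 16) -> Optional[int]:
    """Return the offset of the first 6C 03 followed by major version 1."""
    pos = data.find(b"\x6c\x03", start)
    while pos != -1:
        # Major version is the little-endian short 4 bytes after the magic.
        if pos + 6 <= len(data) and struct.unpack_from("<H", data, pos + 4)[0] == 1:
            return pos
        # Continue right behind the rejected candidate; advancing by one
        # byte also catches candidates that follow at very short intervals.
        pos = data.find(b"\x6c\x03", pos + 1)
    return None
```

Each rejected candidate corresponds to one nesting level in the magic file, which is why samples with many false 0x6C03 hits need more iteration steps there.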
> The displaying part (showing version and date) for HLP is done by sub
> routine help-ver-date. After the check for the SYSTEMHEADER magic 0x036C and
> major version value one, the version is shown depending on the minor
> version, followed by GenDate. This looks like:
>
> 0 name help-ver-date
> >0 leshort 0x036C
> >>4 leshort 1 Windows
> !:mime application/winhelp
> !:ext hlp
> >>>2 leshort 0x0F 3.x
> >>>2 leshort 0x15 3.0
> >>>2 leshort 0x21 3.1
> >>>2 leshort 0x27 x.y
> >>>2 leshort 0x33 95
> >>>2 default x y.z
> >>>>2 leshort x %#x
> >>>2 leshort x help
> >>>6 ldate x \b, %s
>
> Some tools show application/winhelp or application/winhlp as mime type,
> but no such type is officially registered at IANA. So I now use
> application/x-winhelp. Furthermore, the minor version numbers mentioned
> and used must be read as decimal, not hexadecimal, so the sub
> classification is wrong. A sample like NAVW32.HLP is therefore wrongly
> described as "MS Windows 3.1 help" instead of "Windows 95 help".
>
> So the minor version part now becomes:
> >>>2 leshort 15 3.0
> >>>2 leshort 21 3.1
> >>>2 leshort 27
> >>>2 leshort 33 95
>
> The value 27 means a WMVC/MMVC media view file, which implies MVB, while
> any other value implies HLP. So afterwards a further sub classification
> level is done by lines that look like:
> >>>2 leshort !27
> >>>>2 leshort x help
> !:ext hlp
> >>>2 leshort =27 Multimedia Viewer Book
> !:ext mvb
> Unfortunately one sample, NOTEPLAY.MVB, is still described as HLP.
>
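The corrected sub classification by decimal minor version can be sketched in Python as below. The field layout (magic at 0, minor at 2, major at 4, GenDate at 6) is taken from the help-ver-date lines above; the function name `describe_system_header`, the output wording and the choice to print the date in UTC are assumptions of this sketch (the magic type ldate shows local time).

```python
import struct
from datetime import datetime, timezone

# Minor version numbers read as decimal, as in the corrected magic lines.
MINOR_NAMES = {15: "3.0", 21: "3.1", 33: "95"}


def describe_system_header(header: bytes) -> str:
    """Describe a SYSTEMHEADER given as bytes starting at its magic."""
    magic, minor, major, gen_date = struct.unpack_from("<HHHI", header, 0)
    if magic != 0x036C or major != 1:
        return "unknown"
    if minor == 27:
        # 27 means WMVC/MMVC media view, which implies MVB.
        kind = "Windows Multimedia Viewer Book"
    elif minor in MINOR_NAMES:
        kind = f"Windows {MINOR_NAMES[minor]} help"
    else:
        kind = f"Windows y.z {minor:#x} help"
    stamp = datetime.fromtimestamp(gen_date, tz=timezone.utc)
    return f"{kind}, {stamp:%Y-%m-%d %H:%M:%S} UTC"
```

A sample such as NOTEPLAY.MVB, which carries the minor version of an ordinary help file, is still classified as HLP by this rule, matching the remaining misdetection mentioned above.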
> Luckily, with the information given by the other tools I found a page about
> Multimedia Viewer Book on the Archive Team file formats web site.
> This information is expressed inside Magdir/windows by comment lines like:
> # URL: http://fileformats.archiveteam.org/wiki/Multimedia_Viewer_Book
> # Ref.: http://mark0.net/download/triddefs_xml.7z/defs/m/mvb.trid.xml
> Some specifications and download links are listed there.
>
> After the date handled in the sub routine, a 2 byte flag value is stored
> which determines the compression used. Afterwards HelpFileTitle is stored
> in different structures depending on the minor version. Often the title
> correlates with the file name.
> So this information is now shown by lines like:
> #>>>10 uleshort x \b, flags %#x
> >>>2 leshort <17
> >>>>12 string x \b, title "%s"
> >>>2 leshort >16
> #>>>>12 uleshort x \b, RecordType %u
> # DataSize size of data
> #>>>>14 uleshort x \b, DataSize %u
> >>>>12 uleshort 1
> >>>>>14 pstring/h >\0 \b, title "%s"
>
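The two title layouts can be sketched in Python as follows. The offsets mirror the magic lines above (plain string at offset 12 for minor versions below 17, otherwise RecordType 1 at offset 12 followed by a 2 byte little-endian DataSize and the text). The function name `help_title` and the cp1252 decoding are assumptions of this sketch.

```python
import struct
from typing import Optional


def help_title(header: bytes) -> Optional[str]:
    """Extract HelpFileTitle from bytes starting at the SYSTEMHEADER magic."""
    minor = struct.unpack_from("<H", header, 2)[0]
    if minor < 17:
        # Older layout: NUL-terminated title directly at offset 12.
        raw = header[12:].split(b"\0", 1)[0]
    else:
        if len(header) < 16:
            return None
        # Newer layout: RecordType (1 = title) and DataSize, then the data.
        record_type, data_size = struct.unpack_from("<HH", header, 12)
        if record_type != 1:
            return None
        raw = header[16:16 + data_size].split(b"\0", 1)[0]
    # Assumption: titles are stored in a Windows ANSI code page.
    return raw.decode("cp1252", errors="replace") or None
```

Returning `None` for an empty title corresponds to the `>\0` condition in the pstring line, which only prints non-empty strings.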
> After applying the above mentioned modifications with the patch
> file-5.45-windows-hlp.diff, more previously missed samples (like STMMHLP.MVB
> and UNIDRV.HLP) are recognized. Furthermore, more details like the title
> (which often correlates with the file name) are shown. With the additional
> -P bytes=30335189 option this now looks like:
> CORELDRW.HLP: MS Windows 3.1 help, Fri Jun 26 01:09:07 1992, title "CorelDRAW! - Help", 1551334 bytes
> GCC.ANN: MS Windows help annotation, 16634 bytes
> IBMAVW.HLP: MS Windows 3.0 help, Sun Jan 8 20:03:27 1995, title "IBM AntiVirus", 289620 bytes
> ICCviewer.GID: MS Windows help Global Index, 10821 bytes
> MSGRAPH.HLP: MS Windows 3.0 help, Mon Jan 13 22:24:12 1992, title "Graph 3.0a", 333011 bytes
> NAVW32.HLP: MS Windows 95 help, Wed Nov 10 19:48:26 1999, title "Norton AntiVirus for Windows 95/98", 161405 bytes
> NOTEPLAY.MVB: MS Windows 3.1 help, Tue Feb 16 20:30:46 1993, title "NotePlay SE for Windows On-Line Manual", 159760 bytes
> PPAINTER.MVB: MS Windows Multimedia Viewer Book, Sun May 16 10:25:46 2066, title "Picture Painter Help", 19531 bytes
> STMMHLP.MVB: MS Windows Multimedia Viewer Book, 1818497 bytes
> UNIDRV.HLP: MS Windows 95 help, Tue Jul 24 17:31:10 2001, title "Windows", 18022 bytes
> WinHlp32: MS Windows help Bookmark, 1228 bytes
> WinHlp32.BMK: MS Windows help Bookmark, 1175 bytes
> arivideo.mvb: MS Windows Multimedia Viewer Book, 765129 bytes
> clarkhow.mvb: MS Windows Multimedia Viewer Book, 19093522 bytes
> corelap.GID: MS Windows help Global Index, 338226 bytes
> discapp.hlp: MS Windows 3.1 help, Mon Aug 19 22:18:58 1996, title "Mwave Discriminator Help", 73336 bytes
> fmt-474-signature-id-748.hlp: MS Windows Multimedia Viewer Book
> s_in.mvb: MS Windows Multimedia Viewer Book, Sun Oct 12 06:32:58 2064, title "Ski instructions", 872466 bytes
> viewerht.mvb: MS Windows Multimedia Viewer Book, Wed Mar 5 02:45:34 2064, 30335189 bytes
>
> I hope my diff file can be applied in a future version of the file utility.
>
> With best wishes
> Jörg Jenderek
> --
> Jörg Jenderek
> <trid-v-winhelp.txt.gz><droid-winhelp.csv.gz><file-5_45-windows-hlp_diff.DEFANGED-21><file-5_45-windows-hlp_diff_sig.DEFANGED-22>--
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file
> <sanitizer.log>