[File] [PATCH] Magdir/windows MS Windows help corrections

Christos Zoulas christos at zoulas.com
Mon Jun 24 22:46:16 UTC 2024


Committed, thanks!

christos

> On Jun 24, 2024, at 3:32 PM, Jörg Jenderek (GMX) <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> 
> some days ago I sent a patch for gfxboot compiled HTML help files. These
> use a file name suffix that is also used for MS Windows help files.
> 
> When running file command version 5.45 on such Windows help files and
> subclass variants, I get output like:
> 
> CORELDRW.HLP:                 MS Windows help Bookmark, 1551334 bytes
> GCC.ANN:                      MS Windows help annotation, 16634 bytes
> IBMAVW.HLP:                   MS Windows 3.x help, Sun Jan  8 20:03:27 1995, 289620 bytes
> ICCviewer.GID:                MS Windows help Bookmark, 10821 bytes
> MSGRAPH.HLP:                  MS Windows help Bookmark, 333011 bytes
> NAVW32.HLP:                   MS Windows 3.1 help, Wed Nov 10 19:48:26 1999, 161405 bytes
> NOTEPLAY.MVB:                 MS Windows 3.0 help, Tue Feb 16 20:30:46 1993, 159760 bytes
> PPAINTER.MVB:                 MS Windows y.z 0x1b help, Sun May 16 10:25:46 2066, 19531 bytes
> STMMHLP.MVB:                  data
> UNIDRV.HLP:                   MS, 18022 bytes
> WinHlp32:                     MS Windows help Bookmark, 1228 bytes
> WinHlp32.BMK:                 MS Windows help Bookmark, 1175 bytes
> arivideo.mvb:                 MS Windows help Bookmark, 765129 bytes
> clarkhow.mvb:                 data
> corelap.GID:                  MS Windows help Bookmark, 338226 bytes
> discapp.hlp:                  MS Windows 3.0 help, Mon Aug 19 22:18:58 1996, 73336 bytes
> fmt-474-signature-id-748.hlp: data
> s_in.mvb:                     MS Windows help Bookmark, 872466 bytes
> viewerht.mvb:                 data
> 
> Because of these misidentifications, the --extension option also
> displays wrong suffixes. This looks like:
> CORELDRW.HLP:                 bmk
> GCC.ANN:                      ann
> IBMAVW.HLP:                   hlp
> ICCviewer.GID:                bmk
> MSGRAPH.HLP:                  bmk
> NAVW32.HLP:                   hlp
> NOTEPLAY.MVB:                 hlp
> PPAINTER.MVB:                 hlp
> STMMHLP.MVB:                  ???
> UNIDRV.HLP:                   ???
> WinHlp32:                     bmk
> WinHlp32.BMK:                 bmk
> arivideo.mvb:                 bmk
> clarkhow.mvb:                 ???
> corelap.GID:                  bmk
> discapp.hlp:                  hlp
> fmt-474-signature-id-748.hlp: ???
> s_in.mvb:                     bmk
> viewerht.mvb:                 ???
> 
> Furthermore, with the -i option, application/x-winhelp or
> application/winhelp is shown for most samples.
> 
> For comparison I ran the file format identification utility TrID
> (see https://mark0.net/soft-trid-e.html). It identifies all such
> examples, with lowest priority, as "Multimedia Viewer Book" with MVB
> file name suffix and generic application/octet-stream mime type via
> mvb.trid.xml. This is triggered because Windows help and related
> variants (like bookmark, annotation and global help index) start with
> the byte sequence 3F5F0300. With higher priority many samples are
> described as "Windows HELP File" with HLP name suffix and no mime type
> via hlp.trid.xml. Here the byte sequence 00FFFFFFFFh is additionally
> checked at offset 7. Samples like corelap.GID are described as
> "Windows Help index" with GID suffix and application/x-winhelp via
> gid.trid.xml. Samples like GCC.ANN are described as "Windows Help
> Annotation" with ANN suffix and mime type application/x-winhelp via
> ann-winhelp.trid.xml. This software lists the file name extensions
> used and, with the -v option, often the related URL pointing to file
> format information (see appended trid-v-winhelp.txt.gz).
> 
> For comparison I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). Here the HLP
> samples are recognized correctly as "Windows Help File" with HLP suffix
> and without mime type via PUID fmt/474. Unlike TrID, it checks for the
> byte sequence FFFFFFFF at offset 8. Some MVB samples (like
> PPAINTER.MVB) are correctly recognized as "Multimedia Viewer Book" with
> MVB suffix and without mime type via PUID fmt/1800. This checks for the
> SYSTEMHEADER pattern 0x6C03, minor version 0x1B00 (WMVC/MMVC media
> view) followed by major version 0x0100. GID, ANN and BMK samples are
> not recognized or are misinterpreted as HLP (see appended
> droid-winhelp.csv.gz).
> 
> On Linux, according to the shared MIME-info database, the samples are
> called "WinHelp help file". Recognition is done via the value
> 0x00035f3f at offset 0. application/winhlp is shown as mime type, and
> only hlp is listed as suffix. This information can be seen in the
> freedesktop.org.xml.in source, found for example on
> gitlab.freedesktop.org.
> 
> The examples are recognized by a first check (as done by other tools)
> inside Magdir/windows. This looks like:
> 0	lelong		0x00035f3f
> In the next step it checks for the system file header magic 0x293B at
> DirectoryStart+9 with a line that looks like:
> >(4.l+9)	uleshort	0x293B		MS
> This fails for the DROID sample fmt-474-signature-id-748.hlp and some
> MVB samples (like STMMHLP.MVB, clarkhow.mvb, viewerht.mvb). The MVB
> samples are Multimedia Viewer Books and therefore contain many
> graphics. This implies a "big" file size, which often leads to a "high"
> DirectoryStart offset of the FILEHEADER, stored at offset four. The
> above line is then not executed because the offset is beyond the
> standard limits. This can be overcome by running the file command
> with, for example, the additional "-P bytes=30335189" option. Then many
> of the MVB examples are recognized but wrongly described as "MS Windows
> help Bookmark". This happens when sub classification as ANN, GID and
> HLP fails; the assumption is then that the sample is a BMK (bookmark).
> So damaged samples like STMMHLP.MVB or samples with "high" offsets are
> now handled by an additional branch. That looks like:
> >(4.l+9) uleshort !0x293B MS Windows Multimedia Viewer Book
> #!:mime	application/octet-stream
> !:ext	mvb
> >>12	lelong	x	 (damaged or use higher '-P bytes' option)
> Unfortunately the above line is not executed. Maybe this is a bug in
> the file command!
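
The two checks described above (magic 0x00035f3f at offset 0, then 0x293B at DirectoryStart+9) can be sketched in Python as follows. This is only an illustration operating on a byte buffer, not how file(1) is implemented; the function name classify_header and its return strings are made up for this example. It returns a separate result when DirectoryStart+9 lies beyond the bytes available, which is the situation the "-P bytes" option is meant to address.

```python
import struct

WINHELP_MAGIC = 0x00035F3F   # little-endian long at offset 0
DIR_HEADER_MAGIC = 0x293B    # little-endian short at DirectoryStart+9


def classify_header(data: bytes) -> str:
    """Apply the first two checks to the bytes that were actually read.

    Hypothetical helper for illustration only.
    """
    if len(data) < 8 or struct.unpack_from("<I", data, 0)[0] != WINHELP_MAGIC:
        return "not winhelp"
    # DirectoryStart is stored at offset 4 of the FILEHEADER.
    directory_start = struct.unpack_from("<I", data, 4)[0]
    pos = directory_start + 9
    if pos + 2 > len(data):
        # The target offset is outside the buffer, e.g. a large MVB file
        # read with too small a byte limit.
        return "out of range"
    magic = struct.unpack_from("<H", data, pos)[0]
    return "ok" if magic == DIR_HEADER_MAGIC else "mismatch"
```

Feeding the function only the first part of a large file gives "out of range", while the same file read completely can give "ok".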
> 
> If the test for Windows help annotation fails, then the check for GID
> is done by a line that looks like:
> >>>(4.l+0x65)	string		=|Pete	Windows help Global Index
> Unfortunately, in a few samples like corelap.GID this Pete phrase occurs
> at a slightly higher offset. So the above line now becomes:
> >>>(4.l+0x65)	search/26	|Pete	Windows help Global Index
> 
> The sub classification as HLP is done by looking for the SYSTEMHEADER
> pattern 0x6C03; the display part is done by the subroutine
> help-ver-date. If the check for major version one fails, this step is
> repeated seven times. This starts like:
> >>>>16			search/0x49AF/s	\x6c\x03
> >>>>>&0			use 		help-ver-date
> >>>>>&4			leshort		!1
> Because of the "high" file sizes and offsets of MVB files, the above
> search range must be raised for samples like viewerht.mvb. So the first
> iteration now becomes:
> >>>>16			search/0x1bbc370/s \x6c\x03
> >>>>>&0			use 		help-ver-date
> >>>>>&4			leshort		!1
> Then of course the search range in the following iteration steps must
> also be raised. The second iteration step at the moment looks like:
> >>>>>>&0		search/0x69AF/s	\x6c\x03
> >>>>>>>&0		use 		help-ver-date
> >>>>>>>&4		leshort		!1
> Furthermore, in samples like viewerht.mvb the byte sequence 0x6C03
> occurs at very short intervals. So for such samples the second
> iteration step now becomes:
> >>>>>>&-2		search/0x1c4b6f0/s \x6c\x03
> >>>>>>>&0		use 		help-ver-date
> >>>>>>>&4		leshort		!1
> 
> GCC.HLP is detected after 7 iterations. Because of the high values in
> MVB files I need 13 iteration steps. If no HLP is found at that
> position, I look at the FirstFreeBlock value at offset 8. According to
> other tools this value is FFFFFFFFh for HLP samples, whereas for many
> MVB samples (like arivideo.mvb, clarkhow.mvb) it is lower. So iteration
> number 13 looks like:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>&0	search/0x371d4/s \x6c\x03
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>&0	use 	help-ver-date
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>&4	leshort	!1
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>8	lelong	!0xFFffFFff	Windows Multimedia Viewer Book
> !:mime	application/x-winhelp
> !:ext	mvb
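
The nested search steps above amount to a loop: find the next 6C 03 pattern, accept it if the major version at +4 is one, otherwise search again. A minimal Python sketch of that idea follows. The function name find_system_header is made up here, the search is not range-limited per step, and the sketch resumes one byte after each rejected hit, which is a simplification and not the exact offsets used by the magic lines.

```python
import struct

SYSTEM_MAGIC = b"\x6c\x03"   # SYSTEMHEADER pattern, as searched in the magic


def find_system_header(data: bytes, start: int = 16, max_tries: int = 13):
    """Return the offset of the first candidate with major version 1.

    Hypothetical helper; returns None if no candidate is accepted
    within max_tries attempts.
    """
    pos = start
    for _ in range(max_tries):
        hit = data.find(SYSTEM_MAGIC, pos)
        if hit < 0 or hit + 6 > len(data):
            return None
        # Major version is the little-endian short 4 bytes after the magic.
        major = struct.unpack_from("<H", data, hit + 4)[0]
        if major == 1:
            return hit
        pos = hit + 1   # rejected: continue searching after this hit
    return None
```

A loop with a fixed attempt count mirrors why the magic needs one nesting level per iteration: each rejected hit costs one further search step.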
> 
> The display part (showing version and date) for HLP is done by the
> subroutine help-ver-date. After checking for the SYSTEMHEADER magic
> 0x036C and major version value one, the version is shown depending on
> the minor version, followed by GenDate. This looks like:
> 
> 0	name				help-ver-date
> >0	leshort		0x036C
> >>4	leshort		1		Windows
> !:mime	application/winhelp
> !:ext	hlp
> >>>2	leshort		0x0F		3.x
> >>>2	leshort		0x15		3.0
> >>>2	leshort		0x21		3.1
> >>>2	leshort		0x27		x.y
> >>>2	leshort		0x33		95
> >>>2	default		x		y.z
> >>>>2	leshort		x		%#x
> >>>2	leshort		x		help
> >>>6	ldate		x		\b, %s
> 
> Some tools show application/winhelp or application/winhlp as mime
> type, but no such type is officially registered at IANA. So I now use
> application/x-winhelp. Furthermore, the minor version numbers
> mentioned must be interpreted as decimal, not hexadecimal, so the sub
> classification is wrong. For example, NAVW32.HLP is wrongly described
> as "MS Windows 3.1 help" instead of "Windows 95 help".
> 
> So the minor version part now becomes:
> >>>2	leshort		15		3.0
> >>>2	leshort		21		3.1
> >>>2	leshort		27
> >>>2	leshort		33		95
> 
> The value 27 means a WMVC/MMVC media view file, which implies MVB;
> any other value implies HLP. So afterwards a further sub
> classification level is added by lines that look like:
> >>>2	leshort		!27
> >>>>2	leshort		x		help
> !:ext	hlp
> >>>2	leshort		=27		Multimedia Viewer Book
> !:ext	mvb
> Unfortunately one sample, NOTEPLAY.MVB, is still described as HLP.
> 
> Luckily, with the information given by the other tools, I found a page
> about Multimedia Viewer Book on the file formats archive team web site.
> This information is expressed inside Magdir/windows by comment lines like:
> # URL:	http://fileformats.archiveteam.org/wiki/Multimedia_Viewer_Book
> # Ref.:	http://mark0.net/download/triddefs_xml.7z/defs/m/mvb.trid.xml
> Some specification and download links are listed there.
> 
> In the subroutine, a 2-byte flag value that determines the compression
> used is stored after the date. Afterwards HelpFileTitle is stored in
> different structures depending on the minor version. Often the title
> correlates with the file name.
> So this information is now shown by lines like:
> #>>>10	uleshort		x	\b, flags %#x
> >>>2	leshort		<17
> >>>>12	string		x		\b, title "%s"
> >>>2	leshort		>16
> #>>>>12	uleshort	x		\b, RecordType %u
> # DataSize size of data
> #>>>>14	uleshort	x		\b, DataSize %u
> >>>>12	uleshort	1
> >>>>>14	pstring/h	>\0		\b, title "%s"
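
The two title layouts tested above can be sketched in Python like this: for minor versions below 17 the title is a NUL-terminated string at offset 12, otherwise a record of type 1 at offset 12 carries a 2-byte little-endian size at offset 14 followed by the title. The function name help_title and the latin-1 decoding are assumptions of this example, not taken from the patch.

```python
import struct


def help_title(system_header: bytes):
    """Extract the help title from bytes starting at the SYSTEMHEADER.

    Hypothetical helper; returns None if the first record is not type 1.
    """
    minor = struct.unpack_from("<H", system_header, 2)[0]
    if minor < 17:
        # Older layout: NUL-terminated title directly at offset 12.
        end = system_header.find(b"\0", 12)
        raw = system_header[12:end if end >= 0 else len(system_header)]
    else:
        # Newer layout: RecordType at 12, DataSize at 14, data from 16.
        record_type, size = struct.unpack_from("<HH", system_header, 12)
        if record_type != 1:
            return None
        raw = system_header[16:16 + size].split(b"\0", 1)[0]
    return raw.decode("latin-1")   # encoding is an assumption
```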
> 
> After applying the above modifications with the patch
> file-5.45-windows-hlp.diff, more previously missed samples (like
> STMMHLP.MVB, UNIDRV.HLP) are recognized. Furthermore, more details like
> the title (which often correlates with the file name) are shown. With
> the additional -P bytes=30335189 option this now looks like:
> CORELDRW.HLP:                 MS Windows 3.1 help, Fri Jun 26 01:09:07 1992, title "CorelDRAW! - Help", 1551334 bytes
> GCC.ANN:                      MS Windows help annotation, 16634 bytes
> IBMAVW.HLP:                   MS Windows 3.0 help, Sun Jan  8 20:03:27 1995, title "IBM AntiVirus", 289620 bytes
> ICCviewer.GID:                MS Windows help Global Index, 10821 bytes
> MSGRAPH.HLP:                  MS Windows 3.0 help, Mon Jan 13 22:24:12 1992, title "Graph 3.0a", 333011 bytes
> NAVW32.HLP:                   MS Windows 95 help, Wed Nov 10 19:48:26 1999, title "Norton AntiVirus for Windows 95/98", 161405 bytes
> NOTEPLAY.MVB:                 MS Windows 3.1 help, Tue Feb 16 20:30:46 1993, title "NotePlay SE for Windows On-Line Manual", 159760 bytes
> PPAINTER.MVB:                 MS Windows Multimedia Viewer Book, Sun May 16 10:25:46 2066, title "Picture Painter Help", 19531 bytes
> STMMHLP.MVB:                  MS Windows Multimedia Viewer Book, 1818497 bytes
> UNIDRV.HLP:                   MS Windows 95 help, Tue Jul 24 17:31:10 2001, title "Windows", 18022 bytes
> WinHlp32:                     MS Windows help Bookmark, 1228 bytes
> WinHlp32.BMK:                 MS Windows help Bookmark, 1175 bytes
> arivideo.mvb:                 MS Windows Multimedia Viewer Book, 765129 bytes
> clarkhow.mvb:                 MS Windows Multimedia Viewer Book, 19093522 bytes
> corelap.GID:                  MS Windows help Global Index, 338226 bytes
> discapp.hlp:                  MS Windows 3.1 help, Mon Aug 19 22:18:58 1996, title "Mwave Discriminator Help", 73336 bytes
> fmt-474-signature-id-748.hlp: MS Windows Multimedia Viewer Book
> s_in.mvb:                     MS Windows Multimedia Viewer Book, Sun Oct 12 06:32:58 2064, title "Ski instructions", 872466 bytes
> viewerht.mvb:                 MS Windows Multimedia Viewer Book, Wed Mar  5 02:45:34 2064, 30335189 bytes
> 
> I hope my diff file can be applied in a future version of the file utility.
> 
> With best wishes
> Jörg Jenderek
> --
> Jörg Jenderek
> <trid-v-winhelp.txt.gz><droid-winhelp.csv.gz><file-5_45-windows-hlp_diff.DEFANGED-21><file-5_45-windows-hlp_diff_sig.DEFANGED-22>