[File] [PATCH] of Magdir/msdos Microsoft Cabinet archive missed without point char

Jörg Jenderek joerg.jen.der.ek at gmx.net
Sun Dec 25 01:21:12 UTC 2022


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

a few days ago my friend's Hewlett-Packard printer stopped working
on Windows 10, so I downloaded all documentation and software from
the HP site. The printer is an HP ENVY 6000.
One package, HPEasyStart-13.4.8-EN6000_51_3_4843_2_Webpack.exe,
contains the printer driver and software. Out of curiosity I extracted
the package. Some files inside have the name extension CAB. When I run
the newest file command (msdos,v 1.163 2022/12/18) on such CAB examples
and related packed files, I get output like:

EN600x64.cab:      Microsoft Cabinet archive data,
		   many,
		   238518194 bytes, 141 files, at 0x174 +A
		   "DeviceSetupExe", iFolder 0x1 +A
		   "DeviceSetupLauncherExe",
		   39 cffolders, flags 0x4,
		   ID 12345, number 1, extra bytes 20 in head,
		   838 datablocks, 0 compression
EN600x86.cab:      Microsoft Cabinet archive data,
		   207048493 bytes, 92 files, at 0x124 +A
		   "DeviceSetupExe", iFolder 0x1 +A
		   "DeviceSetupLauncherExe",
		   29 cffolders, flags 0x4,
		   ID 12345, number 1, extra bytes 20 in head,
		   744 datablocks, 0 compression
Full_x64.cab:      Microsoft Cabinet archive data,
		   26505575 bytes, 208 files, at 0x9c +A
		   "SureSupply_hpqDTSSEXE", iFolder 0x1 +A
		   "SureSupply_hpqDTSSUIDLL",
		   12 cffolders, flags 0x4,
		   ID 12345, number 1, extra bytes 20 in head,
		   239 datablocks, 0 compression
POWERPNT.PP_:      Microsoft Cabinet archive data,
		   PowerPoint Packed and Go,
		   1765 bytes, 1 file, at 0x2c +A
		   "powerpnt.ppt",
		   number 1,
		   1 datablock, 0x1503 compression
PRES0.PPZ:         Microsoft Cabinet archive data,
		   PowerPoint Packed and Go,
		   2803 bytes, 2 files, at 0x2c +Utf
		   "Dummy slide.PPT" +Utf
		   "PLAYLIST.LST",
		   number 1,
		   1 datablock, 0x1 compression
QUOTES._:          Microsoft Cabinet archive data,
		   931 bytes, 1 file, at 0x2c +A
		   "quotes",
		   number 1,
		   1 datablock, 0x1503 compression
hpgid31v4help.cab: Microsoft Cabinet archive data,
		   many,
		   1371036 bytes, 35 files, at 0x2c +A
		   "arabic.chm" +A
		   "bulgrian.chm",
		   ID 37818, number 1,
		   51 datablocks, 0x1 compression

At first glance that looks OK, but with the --extension option ??? is
sometimes displayed instead of the cab suffix. This looks like:

EN600x64.cab:      cab
EN600x86.cab:      ???
Full_x64.cab:      ???
POWERPNT.PP_:      ppz
PRES0.PPZ:         ppz
QUOTES._:          ???
hpgid31v4help.cab: cab

Furthermore, with the -i option only the generic MIME type
application/octet-stream is shown for some samples instead of
application/vnd.ms-cab-compressed. This looks like:

EN600x64.cab:      application/vnd.ms-cab-compressed; charset=binary
EN600x86.cab:      application/octet-stream; charset=binary
Full_x64.cab:      application/octet-stream; charset=binary
POWERPNT.PP_:      application/vnd.ms-powerpoint; charset=binary
PRES0.PPZ:         application/vnd.ms-powerpoint; charset=binary
QUOTES._:          application/octet-stream; charset=binary
hpgid31v4help.cab: application/vnd.ms-cab-compressed; charset=binary

For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). There all CAB samples
are described correctly as "Microsoft Cabinet Archive" with the
application/vnd.ms-cab-compressed MIME type by ark-cab.trid.xml
(see appended trid-v-cab.txt.gz).

For comparison I also ran the file format identification
utility DROID (see https://sourceforge.net/projects/droid/).
Here all CAB samples are described as "Windows Cabinet File" with the
MIME type application/vnd.ms-cab-compressed by PUID x-fmt/414.

Inside the current Magdir/msdos the detection of CAB samples starts
with a line like:
0	string/b MSCF\0\0\0\0	Microsoft Cabinet archive data
Then a sub-classification (MIME type and file name extension) is
done. First comes a brute-force search for known characteristics
(a member name or member suffix), because sometimes the known
member name is not the first one. If nothing is found in that branch,
the rules look explicitly at the first member name, like wsusscan.cab,
and sub-classify by that. If that branch does not succeed, they look
for the name suffix after the point character, like ppt\0, and use
this as a further sub-class level. Unfortunately the undetected
samples above match none of these tests, and so no MIME type and no
file name suffix is displayed. So I must add an else clause for
samples where the first member name contains no point character. The
inserted part looks similar to the other branches:

 >>>>&-1	default		x
 >>>>>28	uleshort	=1	\b, single
 !:mime	application/vnd.ms-cab-compressed
 !:ext	_
 >>>>>28	uleshort	>1	\b, many
 !:mime	application/vnd.ms-cab-compressed
 !:ext	cab
The printer packages Full_x86.cab and Full_x64.cab are matched by the
many branch here. The single branch is matched by some samples on the
XP CD where the original file name has no suffix (like NETWORKS._,
PROTOCOL._, QUOTES._ and SERVICES._).

The archive member names are stored as NUL-terminated strings without
length information. So the search for a point character in the first
archive member name may be too generous and match a point elsewhere,
as in EN600x64.cab. Hopefully such samples are then matched at least
by the default clauses. The search is done by a line like:
 >>>>&-1	search/255 	.
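
As a rough illustration of why such a search can be too generous (this
is not part of the patch), the following Python sketch compares a
window search, roughly what search/255 does, with a search that stops
at the terminating NUL of the first name. The layout assumes 16 fixed
bytes before each member name; the second member name "helper.dll" and
both helper functions are hypothetical.

```python
def entry(name: bytes) -> bytes:
    """One simplified file entry: 16 fixed bytes (zeroed here) plus NUL-terminated name."""
    return bytes(16) + name + b"\0"


def naive_has_point(buf: bytes, start: int, window: int = 255) -> bool:
    """Look for '.' anywhere in the window, ignoring where the name ends."""
    return buf.find(b".", start, start + window) != -1


def bounded_has_point(buf: bytes, start: int, window: int = 255) -> bool:
    """Look for '.' only up to the NUL that terminates the first name."""
    end = buf.find(b"\0", start, start + window)
    if end == -1:
        end = min(len(buf), start + window)
    return buf.find(b".", start, end) != -1


# First name has no point, but the (hypothetical) second name does.
buf = entry(b"DeviceSetupExe") + entry(b"helper.dll")
name_start = 16  # the first name begins after the 16 fixed bytes

print(naive_has_point(buf, name_start))    # window search runs into the next entry
print(bounded_has_point(buf, name_start))  # stops at the end of the first name
```

The window search reports a point although the first name has none,
because it reads on into the following entry. Where exactly the stray
point sits in EN600x64.cab is not examined here.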

Furthermore, if the suffix of the first member name is ppt, the rules
assume that this is PowerPoint Packed and Go (a PowerPoint presentation
*.ppt with optional PLAYLIST.LST or ppview32.exe). This was done by a
part which looked like:
 >>>>>&0	string/c	ppt\0		\b, PowerPoint Packed and Go
 !:mime	application/vnd.ms-powerpoint
 !:ext	ppz
Unfortunately this also applies to POWERPNT.PP_ found on XP_CD in the
I386 folder. This contains only a single file "powerpnt.ppt" compressed
with the CAB format. So this now becomes:
 >>>>>&0	string/c	ppt\0
 >>>>>>28 uleshort	>1		\b, PowerPoint Packed and Go
 !:mime	application/vnd.ms-powerpoint
 !:ext	ppz
 >>>>>>28 uleshort	=1		\b, one packed PowerPoint
 !:mime	application/vnd.ms-cab-compressed
 !:ext	pp_
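
The new branches can be pictured with a small Python sketch. It is
only a simplified model, not the magic rules themselves: it reads the
file count at offset 28 and the first member name, skips the existing
brute-force and known-name branches, and bounds the point search by
the name itself. The functions make_cab, first_member_name and
classify are hypothetical helpers, and make_cab builds a synthetic
header without any compressed data.

```python
import struct

CAB_MAGIC = b"MSCF\0\0\0\0"


def make_cab(names):
    """Build a tiny synthetic cabinet: 36-byte header, one folder, file entries."""
    header = struct.pack(
        "<4sIIIIIBBHHHHH",
        b"MSCF", 0, 0, 0, 44, 0,  # signature, reserved, size, reserved, file offset, reserved
        3, 1,                     # minor and major version
        1, len(names),            # folder count (offset 26), file count (offset 28)
        0, 0, 0,                  # flags, set ID, cabinet number
    )
    folder = struct.pack("<IHH", 0, 0, 0)
    files = b"".join(
        struct.pack("<IIHHHH", 0, 0, 0, 0, 0, 0) + name + b"\0" for name in names
    )
    return header + folder + files


def first_member_name(data: bytes) -> bytes:
    """Return the NUL-terminated name of the first file entry."""
    (file_offset,) = struct.unpack_from("<I", data, 16)
    start = file_offset + 16  # the name follows 16 fixed bytes
    return data[start:data.index(b"\0", start)]


def classify(data: bytes):
    """Return (description, extension) for the branches discussed above."""
    if not data.startswith(CAB_MAGIC):
        return None
    (n_files,) = struct.unpack_from("<H", data, 28)
    name = first_member_name(data)
    if b"." in name:
        if name.rsplit(b".", 1)[1].lower() == b"ppt":
            if n_files > 1:
                return ("PowerPoint Packed and Go", "ppz")
            return ("one packed PowerPoint", "pp_")
        return ("existing suffix rules", None)  # other suffixes are not modelled
    # first member name without point character
    return ("single", "_") if n_files == 1 else ("many", "cab")


print(classify(make_cab([b"quotes"])))
print(classify(make_cab([b"powerpnt.ppt"])))
print(classify(make_cab([b"Dummy slide.PPT", b"PLAYLIST.LST"])))
```

The member names are taken from the samples above; everything else in
the synthetic cabinets is zeroed.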

Before the attribute flags of an archive member, the date and time are
stored in DOS format. That was expressed by lines like
# date stamp for file
#>10	uleshort	x		\b, date %#x
# time stamp for file
#>12	uleshort	x		\b, time %#x
In older versions these values could only be displayed as hexadecimal
values. That is not so interesting for normal users. Luckily newer
versions of the file command have functions to show these values
in human-readable form. So this now becomes:
 >10	lemsdosdate	x		last modified %s
 >12	lemsdostime	x		%s
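
For readers unfamiliar with the DOS format, here is a minimal Python
sketch of how the two 16-bit words unpack. It assumes the usual
MS-DOS bit layout (year since 1980, month and day in the date word;
hour, minute and seconds divided by two in the time word). The
function name decode_dos_datetime is hypothetical, and the example
words are computed from a timestamp, not read from a sample file.

```python
import datetime


def decode_dos_datetime(date_word: int, time_word: int) -> datetime.datetime:
    """Decode little-endian MS-DOS date and time words into a datetime."""
    year = 1980 + (date_word >> 9)      # bits 15..9: years since 1980
    month = (date_word >> 5) & 0x0F     # bits 8..5
    day = date_word & 0x1F              # bits 4..0
    hour = time_word >> 11              # bits 15..11
    minute = (time_word >> 5) & 0x3F    # bits 10..5
    second = (time_word & 0x1F) * 2     # bits 4..0 hold seconds / 2
    return datetime.datetime(year, month, day, hour, minute, second)


# Words that encode 2021-11-06 05:45:08
print(decode_dos_datetime(0x5366, 0x2DA4))
```

Because the seconds are stored divided by two, only even seconds can
be represented.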

After applying the modifications mentioned above with the patch
file-msdos-cab_point_ppz.diff, I get output similar to before.
It now looks like:

EN600x64.cab:      Microsoft Cabinet archive data,
		   many,
		   238518194 bytes, 141 files, at 0x174
		   last modified Sun, Nov 06 2021 05:45:08 +A
		   "DeviceSetupExe", iFolder 0x1
		   last modified Sun, Nov 06 2021 05:11:08 +A
		   "DeviceSetupLauncherExe",
		   39 cffolders, flags 0x4,
		   ID 12345, number 1, extra bytes 20 in head,
		   838 datablocks, 0 compression
EN600x86.cab:      Microsoft Cabinet archive data,
		   many,
		   207048493 bytes, 92 files, at 0x124
		   last modified Sun, Nov 06 2021 04:43:10 +A
		   "DeviceSetupExe", iFolder 0x1
		   last modified Sun, Nov 06 2021 04:17:42 +A
		   "DeviceSetupLauncherExe",
		   29 cffolders, flags 0x4,
		   ID 12345, number 1, extra bytes 20 in head,
		   744 datablocks, 0 compression
Full_x64.cab:      Microsoft Cabinet archive data,
		   many,
		   26505575 bytes, 208 files, at 0x9c
		   last modified Sun, Nov 06 2021 05:13:06 +A
		   "SureSupply_hpqDTSSEXE", iFolder 0x1
		   last modified Sun, Nov 06 2021 05:10:52 +A
		   "SureSupply_hpqDTSSUIDLL",
		   12 cffolders, flags 0x4,
		   ID 12345, number 1, extra bytes 20 in head,
		   239 datablocks, 0 compression
POWERPNT.PP_:      Microsoft Cabinet archive data,
		   one packed PowerPoint,
		   1765 bytes, 1 file, at 0x2c
		   last modified Sun, Jul 21 2001 18:42:44 +A
		   "powerpnt.ppt", number 1,
		   1 datablock, 0x1503 compression
PRES0.PPZ:         Microsoft Cabinet archive data,
		   PowerPoint Packed and Go,
		   2803 bytes, 2 files, at 0x2c
		   last modified Sun, Jan 16 2006 18:00:52 +Utf
		   "Dummy slide.PPT"
		   last modified Sun, Jan 16 2006 18:00:52 +Utf
		   "PLAYLIST.LST", number 1,
		   1 datablock, 0x1 compression
QUOTES._:          Microsoft Cabinet archive data,
		   single,
		   931 bytes, 1 file, at 0x2c
		   last modified Sun, Jul 28 2001 15:08:06 +A
		   "quotes", number 1,
		   1 datablock, 0x1503 compression
hpgid31v4help.cab: Microsoft Cabinet archive data,
		   many,
		   1371036 bytes, 35 files, at 0x2c
		   last modified Sun, Oct 01 2014 11:47:24 +A
		   "arabic.chm"
		   last modified Sun, Oct 01 2014 11:47:24 +A
		   "bulgrian.chm", ID 37818, number 1,
		   51 datablocks, 0x1 compression

With the --extension option the correct file name extensions are now
shown for the inspected examples:

EN600x64.cab:      cab
EN600x86.cab:      cab
Full_x64.cab:      cab
POWERPNT.PP_:      pp_
PRES0.PPZ:         ppz
QUOTES._:          _
hpgid31v4help.cab: cab

With the -i option the correct MIME types are now shown for the
inspected examples:

EN600x64.cab:      application/vnd.ms-cab-compressed; charset=binary
EN600x86.cab:      application/vnd.ms-cab-compressed; charset=binary
Full_x64.cab:      application/vnd.ms-cab-compressed; charset=binary
POWERPNT.PP_:      application/vnd.ms-cab-compressed; charset=binary
PRES0.PPZ:         application/vnd.ms-powerpoint; charset=binary
QUOTES._:          application/vnd.ms-cab-compressed; charset=binary
hpgid31v4help.cab: application/vnd.ms-cab-compressed; charset=binary

I hope my diff file can be applied in a future version of the
file utility.

With best wishes
Jörg Jenderek
- --
Jörg Jenderek




-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY6eliAAKCRCv8rHJQhrU
1iRcAKCN2fJ58vd/eOPCK57vIzfspNVfyACg3GKW2d1dEpHkD12tTuEJwYoblqc=
=fsMh
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-cab.txt.gz
Type: application/x-gzip
Size: 495 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221225/1ed5c49f/attachment-0001.bin>
-------------- next part --------------
--- file-master/magic/Magdir/msdos.old	2022-12-24 13:52:20.930309400 +0100
+++ file-master/magic/Magdir/msdos	2022-12-25 02:08:24.124203600 +0100
@@ -1876,2 +1876,3 @@
 # look at point character of 1st archive member name for file name extension
+# GRR: search range is maybe too large and match point else where like in EN600x64.cab!
 >>>>&-1	search/255 	.
@@ -1880,3 +1881,4 @@
 # packs optional files, a PowerPoint presentation *.ppt with optional PLAYLIST.LST to CAB
->>>>>&0	string/c	ppt\0		\b, PowerPoint Packed and Go
+>>>>>&0	string/c	ppt\0
+>>>>>>28 uleshort	>1		\b, PowerPoint Packed and Go
 !:mime	application/vnd.ms-powerpoint
@@ -1884,2 +1886,6 @@
 !:ext	ppz
+# or POWERPNT.PPT packed as POWERPNT.PP_ found on Windows 2000,XP setup CD in directory i386
+>>>>>>28 uleshort	=1		\b, one packed PowerPoint
+!:mime	application/vnd.ms-cab-compressed
+!:ext	pp_
 # https://msdn.microsoft.com/en-us/library/windows/desktop/bb773190(v=vs.85).aspx
@@ -1931,2 +1937,12 @@
 !:ext	cab
+# first archive name without point character
+>>>>&-1	default		x
+>>>>>28	uleshort	=1	\b, single
+!:mime	application/vnd.ms-cab-compressed
+# on XP_CD\I386\ like: NETWORKS._ PROTOCOL._ QUOTES._ SERVICES._
+!:ext	_
+>>>>>28	uleshort	>1	\b, many
+!:mime	application/vnd.ms-cab-compressed
+# like: HP Envy 6000 printer driver packages Full_x86.cab Full_x64.cab
+!:ext	cab
 # TODO: additional extensions like
@@ -2028,5 +2044,5 @@
 # date stamp for file
-#>10	uleshort	x		\b, date %#x
+>10	lemsdosdate	x		last modified %s
 # time stamp for file
-#>12	uleshort	x		\b, time %#x
+>12	lemsdostime	x		%s
 # attribs is attribute flags for file
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-msdos-cab_point_ppz.diff.sig
Type: application/octet-stream
Size: 1029 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221225/1ed5c49f/attachment-0001.obj>

