[File] [PATCH] of Magdir/msdos Microsoft Cabinet archive missed without point char

Christos Zoulas christos at zoulas.com
Mon Dec 26 17:23:18 UTC 2022


Committed, thanks!

christos

> On Dec 24, 2022, at 8:21 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hello,
> 
> Some days ago my friend's Hewlett-Packard printer stopped working on
> Windows 10, so I downloaded all the documentation and software from
> the HP site. The printer is an HP ENVY 6000.
> One package, HPEasyStart-13.4.8-EN6000_51_3_4843_2_Webpack.exe,
> contains the printer driver and software. Out of curiosity I extracted
> the package. Some of the files inside have the name extension CAB.
> When I run the newest file command (msdos,v 1.163 2022/12/18) on such
> CAB examples and related packed files, I get output like:
> 
> EN600x64.cab:      Microsoft Cabinet archive data,
> 		   many,
> 		   238518194 bytes, 141 files, at 0x174 +A
> 		   "DeviceSetupExe", iFolder 0x1 +A
> 		   "DeviceSetupLauncherExe",
> 		   39 cffolders, flags 0x4,
> 		   ID 12345, number 1, extra bytes 20 in head,
> 		   838 datablocks, 0 compression
> EN600x86.cab:      Microsoft Cabinet archive data,
> 		   207048493 bytes, 92 files, at 0x124 +A
> 		   "DeviceSetupExe", iFolder 0x1 +A
> 		   "DeviceSetupLauncherExe",
> 		   29 cffolders, flags 0x4,
> 		   ID 12345, number 1, extra bytes 20 in head,
> 		   744 datablocks, 0 compression
> Full_x64.cab:      Microsoft Cabinet archive data,
> 		   26505575 bytes, 208 files, at 0x9c +A
> 		   "SureSupply_hpqDTSSEXE", iFolder 0x1 +A
> 		   "SureSupply_hpqDTSSUIDLL",
> 		   12 cffolders, flags 0x4,
> 		   ID 12345, number 1, extra bytes 20 in head,
> 		   239 datablocks, 0 compression
> POWERPNT.PP_:      Microsoft Cabinet archive data,
> 		   PowerPoint Packed and Go,
> 		   1765 bytes, 1 file, at 0x2c +A
> 		   "powerpnt.ppt",
> 		   number 1,
> 		   1 datablock, 0x1503 compression
> PRES0.PPZ:         Microsoft Cabinet archive data,
> 		   PowerPoint Packed and Go,
> 		   2803 bytes, 2 files, at 0x2c +Utf
> 		   "Dummy slide.PPT" +Utf
> 		   "PLAYLIST.LST",
> 		   number 1,
> 		   1 datablock, 0x1 compression
> QUOTES._:          Microsoft Cabinet archive data,
> 		   931 bytes, 1 file, at 0x2c +A
> 		   "quotes",
> 		   number 1,
> 		   1 datablock, 0x1503 compression
> hpgid31v4help.cab: Microsoft Cabinet archive data,
> 		   many,
> 		   1371036 bytes, 35 files, at 0x2c +A
> 		   "arabic.chm" +A
> 		   "bulgrian.chm",
> 		   ID 37818, number 1,
> 		   51 datablocks, 0x1 compression
> 
> At first glance that looks OK, but with the --extension option ???
> is sometimes displayed instead of the cab suffix. This looks like:
> 
> EN600x64.cab:      cab
> EN600x86.cab:      ???
> Full_x64.cab:      ???
> POWERPNT.PP_:      ppz
> PRES0.PPZ:         ppz
> QUOTES._:          ???
> hpgid31v4help.cab: cab
> 
> Furthermore, with the -i option some samples show only the generic
> mime type application/octet-stream instead of
> application/vnd.ms-cab-compressed. This looks like:
> 
> EN600x64.cab:      application/vnd.ms-cab-compressed; charset=binary
> EN600x86.cab:      application/octet-stream; charset=binary
> Full_x64.cab:      application/octet-stream; charset=binary
> POWERPNT.PP_:      application/vnd.ms-powerpoint; charset=binary
> PRES0.PPZ:         application/vnd.ms-powerpoint; charset=binary
> QUOTES._:          application/octet-stream; charset=binary
> hpgid31v4help.cab: application/vnd.ms-cab-compressed; charset=binary
> 
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). There all CAB samples
> are described correctly as "Microsoft Cabinet Archive" with mime type
> application/vnd.ms-cab-compressed by ark-cab.trid.xml
> (see appended trid-v-cab.txt.gz).
> 
> For comparison I also ran the file format identification
> utility DROID (see https://sourceforge.net/projects/droid/).
> Here all CAB samples are described as "Windows Cabinet File" with
> mime type application/vnd.ms-cab-compressed by PUID x-fmt/414.
> 
> In the current Magdir/msdos the detection of CAB samples starts with
> a line like:
> 0	string/b MSCF\0\0\0\0	Microsoft Cabinet archive data
> Then a sub classification (mime type and file name extension) is
> done. First comes a brute-force search for known characteristics
> (member name or member suffix), because sometimes the known member
> name is not the first one. If nothing is found in that branch, the
> first member name is checked explicitly for names like wsusscan.cab
> and the sub classification is done by that. If that branch does not
> succeed either, the name suffix after the point character, like
> ppt\0, is used as a further sub class level. Unfortunately the
> undetected samples above do not match any of these tests, so no mime
> type and no file name suffix is displayed. So I had to add an else
> clause for samples where the first member name contains no point
> character. The inserted part is similar to the other branches and
> looks like:
> 
>>>>> &-1	default		x
>>>>>> 28	uleshort	=1	\b, single
> !:mime	application/vnd.ms-cab-compressed
> !:ext	cab
>>>>>> 28	uleshort	>1	\b, many
> !:mime	application/vnd.ms-cab-compressed
> !:ext	cab
> The printer packages Full_x86.cab and Full_x64.cab are matched by the
> many branch here. The single branch is matched by some samples on the
> XP CD whose original file name has no suffix (like NETWORKS._,
> PROTOCOL._, QUOTES._, SERVICES._).
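
The new default clause can be pictured with a minimal Python sketch. It is only a rough analogue of the magic lines above, not the patch itself. It assumes the usual cabinet layout (offset of the first file entry at byte 16, file count at byte 28, member name 16 bytes into the file entry); make_cab is a hypothetical helper that builds a fake cabinet header for illustration.

```python
import struct

def make_cab(first_name: bytes, n_files: int) -> bytes:
    """Build a tiny fake cabinet (hypothetical helper, header fields only)."""
    header = struct.pack("<4sIIIIIBBHHHHH", b"MSCF", 0, 0, 0, 0x2C, 0,
                         3, 1, 1, n_files, 0, 0, 0)    # 36-byte CFHEADER
    folder = struct.pack("<IHH", 0, 1, 0)              # 8-byte CFFOLDER
    cffile = struct.pack("<IIHHHH", 0, 0, 0, 0, 0, 0)  # 16 fixed CFFILE bytes
    return header + folder + cffile + first_name + b"\0"

def first_member_name(data: bytes) -> bytes:
    """Return the first member name without its terminating NUL."""
    if data[:8] != b"MSCF\0\0\0\0":
        raise ValueError("not a Microsoft Cabinet archive")
    (coff_files,) = struct.unpack_from("<I", data, 16)  # first file entry
    start = coff_files + 16                             # name follows 16 bytes
    return data[start:data.index(b"\0", start)]

def classify_without_point(data: bytes):
    """Analogue of the default clause: only for names without a point."""
    if b"." in first_member_name(data):
        return None                # suffix-based branches would apply instead
    (n_files,) = struct.unpack_from("<H", data, 28)     # number of files
    if n_files == 1:
        return "single"
    if n_files > 1:
        return "many"
    return None
```

With these assumptions, a fake cabinet whose only member is "quotes" is classified as "single", one with 208 members starting with "SureSupply_hpqDTSSEXE" as "many", and one starting with "arabic.chm" is left to the suffix-based branches.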
> 
> The archive member names are stored as nil-terminated strings without
> length information. So the search for the point character in the first
> archive member name is maybe too generous and matches a point
> elsewhere, as in EN600x64.cab. Hopefully such samples are then matched
> at least by the default clauses. This is done by a line like:
>>>>> &-1	search/255 	.
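
Why a 255-byte search can be too generous is sketched below in Python. This is an illustration under assumptions, not the behaviour of file itself: the buffer is made up, and the second member name "setup.dll" is hypothetical.

```python
def point_within_255(buf: bytes, name_start: int) -> bool:
    """Roughly what a 255-byte search does: the terminating NUL is ignored."""
    return b"." in buf[name_start:name_start + 255]

def point_within_name(buf: bytes, name_start: int) -> bool:
    """Stricter variant: look only inside the NUL-terminated first name."""
    end = buf.index(b"\0", name_start)
    return b"." in buf[name_start:end]

# First name without a point, then 16 bytes standing in for the next
# file entry header, then a hypothetical second name that has a point.
sample = b"DeviceSetupExe\0" + bytes(16) + b"setup.dll\0"
```

For this sample, point_within_255(sample, 0) is true although the first name itself contains no point, while point_within_name(sample, 0) is false.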
> 
> Furthermore, if the first member name suffix is ppt, it is assumed
> that this is PowerPoint Packed and Go (PowerPoint presentation *.ppt
> with optional PLAYLIST.LST or ppview32.exe). This was done by a part
> which looks like:
>>>>>> &0	string/c	ppt\0		\b, PowerPoint Packed and Go
> !:mime	application/vnd.ms-powerpoint
> !:ext	ppz
> Unfortunately this also applies to POWERPNT.PP_ found on XP_CD in the
> I386 folder. This contains only a single file "powerpnt.ppt"
> compressed with CAB format. So this now becomes:
>>>>>> &0	string/c	ppt\0
>>>>>>> 28 uleshort	>1		\b, PowerPoint Packed and Go
> !:mime	application/vnd.ms-powerpoint
> !:ext	ppz
>>>>>>> 28 uleshort	=1		\b, one packed PowerPoint
> !:mime	application/vnd.ms-cab-compressed
> !:ext	pp_
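
The split of the ppt branch by number of files can be restated as a small Python sketch. The function name is hypothetical, and the case-insensitive comparison after the first point is an assumption that mirrors the string/c test above.

```python
def classify_ppt_cabinet(first_name: bytes, n_files: int):
    """Return (description, mime type, extension), or None if not ppt."""
    _, point, suffix = first_name.partition(b".")   # text after first point
    if not point or suffix.lower() != b"ppt":
        return None
    if n_files > 1:
        return ("PowerPoint Packed and Go",
                "application/vnd.ms-powerpoint", "ppz")
    if n_files == 1:
        return ("one packed PowerPoint",
                "application/vnd.ms-cab-compressed", "pp_")
    return None
```

Under these assumptions "powerpnt.ppt" with 1 file yields the pp_ extension, and "Dummy slide.PPT" with 2 files yields ppz.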
> 
> The date and time in DOS format are stored before the attribute flags
> of an archive member. That was expressed by lines like:
> # date stamp for file
> #>10	uleshort	x		\b, date %#x
> # time stamp for file
> #>12	uleshort	x		\b, time %#x
> In older versions these values could only be displayed as hexadecimal
> values, which is not very interesting for normal users. Luckily, newer
> versions of the file command have functions to show these values in
> human-readable form. So this now becomes:
>> 10	lemsdosdate	x		last modified %s
>> 12	lemsdostime	x		%s
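
For readers unfamiliar with the DOS format, the two 16-bit values can be decoded as in the following Python sketch. It assumes the standard MS-DOS bit layout (date: year since 1980, month, day; time: hours, minutes, seconds in 2-second steps). The function name is hypothetical, and this is not how file implements lemsdosdate or lemsdostime.

```python
from datetime import datetime

def decode_dos_datetime(date: int, time: int) -> datetime:
    """Decode 16-bit little-endian DOS date and time words."""
    year = 1980 + (date >> 9)         # bits 9-15: years since 1980
    month = (date >> 5) & 0x0F        # bits 5-8
    day = date & 0x1F                 # bits 0-4
    hour = time >> 11                 # bits 11-15
    minute = (time >> 5) & 0x3F       # bits 5-10
    second = (time & 0x1F) * 2        # bits 0-4, stored in 2-second units
    return datetime(year, month, day, hour, minute, second)
```

For example, decode_dos_datetime(0x2AF5, 0x9556) gives 2001-07-21 18:42:44. Because seconds are stored in 2-second units, decoded seconds are always even.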
> 
> After applying the above modifications with the patch
> file-msdos-cab_point_ppz.diff I get output similar to before.
> It now looks like:
> 
> EN600x64.cab:      Microsoft Cabinet archive data,
> 		   many,
> 		   238518194 bytes, 141 files, at 0x174
> 		   last modified Sun, Nov 06 2021 05:45:08 +A
> 		   "DeviceSetupExe", iFolder 0x1
> 		   last modified Sun, Nov 06 2021 05:11:08 +A
> 		   "DeviceSetupLauncherExe",
> 		   39 cffolders, flags 0x4,
> 		   ID 12345, number 1, extra bytes 20 in head,
> 		   838 datablocks, 0 compression
> EN600x86.cab:      Microsoft Cabinet archive data,
> 		   many,
> 		   207048493 bytes, 92 files, at 0x124
> 		   last modified Sun, Nov 06 2021 04:43:10 +A
> 		   "DeviceSetupExe", iFolder 0x1
> 		   last modified Sun, Nov 06 2021 04:17:42 +A
> 		   "DeviceSetupLauncherExe",
> 		   29 cffolders, flags 0x4,
> 		   ID 12345, number 1, extra bytes 20 in head,
> 		   744 datablocks, 0 compression
> Full_x64.cab:      Microsoft Cabinet archive data,
> 		   many,
> 		   26505575 bytes, 208 files, at 0x9c
> 		   last modified Sun, Nov 06 2021 05:13:06 +A
> 		   "SureSupply_hpqDTSSEXE", iFolder 0x1
> 		   last modified Sun, Nov 06 2021 05:10:52 +A
> 		   "SureSupply_hpqDTSSUIDLL",
> 		   12 cffolders, flags 0x4,
> 		   ID 12345, number 1, extra bytes 20 in head,
> 		   239 datablocks, 0 compression
> POWERPNT.PP_:      Microsoft Cabinet archive data,
> 		   one packed PowerPoint,
> 		   1765 bytes, 1 file, at 0x2c
> 		   last modified Sun, Jul 21 2001 18:42:44 +A
> 		   "powerpnt.ppt", number 1,
> 		   1 datablock, 0x1503 compression
> PRES0.PPZ:         Microsoft Cabinet archive data,
> 		   PowerPoint Packed and Go,
> 		   2803 bytes, 2 files, at 0x2c
> 		   last modified Sun, Jan 16 2006 18:00:52 +Utf
> 		   "Dummy slide.PPT"
> 		   last modified Sun, Jan 16 2006 18:00:52 +Utf
> 		   "PLAYLIST.LST", number 1,
> 		   1 datablock, 0x1 compression
> QUOTES._:          Microsoft Cabinet archive data,
> 		   single,
> 		   931 bytes, 1 file, at 0x2c
> 		   last modified Sun, Jul 28 2001 15:08:06 +A
> 		   "quotes", number 1,
> 		   1 datablock, 0x1503 compression
> hpgid31v4help.cab: Microsoft Cabinet archive data,
> 		   many,
> 		   1371036 bytes, 35 files, at 0x2c
> 		   last modified Sun, Oct 01 2014 11:47:24 +A
> 		   "arabic.chm"
> 		   last modified Sun, Oct 01 2014 11:47:24 +A
> 		   "bulgrian.chm", ID 37818, number 1,
> 		   51 datablocks, 0x1 compression
> 
> With the --extension option the correct file name extensions are now
> shown for the inspected examples:
> 
> EN600x64.cab:      cab
> EN600x86.cab:      cab
> Full_x64.cab:      cab
> POWERPNT.PP_:      pp_
> PRES0.PPZ:         ppz
> QUOTES._:          _
> hpgid31v4help.cab: cab
> 
> With the -i option the correct mime types are now shown for the
> inspected examples:
> 
> EN600x64.cab:      application/vnd.ms-cab-compressed; charset=binary
> EN600x86.cab:      application/vnd.ms-cab-compressed; charset=binary
> Full_x64.cab:      application/vnd.ms-cab-compressed; charset=binary
> POWERPNT.PP_:      application/vnd.ms-cab-compressed; charset=binary
> PRES0.PPZ:         application/vnd.ms-powerpoint; charset=binary
> QUOTES._:          application/vnd.ms-cab-compressed; charset=binary
> hpgid31v4help.cab: application/vnd.ms-cab-compressed; charset=binary
> 
> I hope my diff file can be applied in a future version of the
> file utility.
> 
> With best wishes
> Jörg Jenderek
> - --
> Jörg Jenderek
> 
> 
> 
> 
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
> 
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY6eliAAKCRCv8rHJQhrU
> 1iRcAKCN2fJ58vd/eOPCK57vIzfspNVfyACg3GKW2d1dEpHkD12tTuEJwYoblqc=
> =fsMh
> -----END PGP SIGNATURE-----
> <trid-v-cab.txt.gz><file-msdos-cab_point_ppz_diff.DEFANGED-38><file-msdos-cab_point_ppz_diff_sig.DEFANGED-39>--
