[File] [PATCH] of Magdir/msdos Microsoft Cabinet archive missed without point char
Christos Zoulas
christos at zoulas.com
Mon Dec 26 17:23:18 UTC 2022
Committed, thanks!
christos
> On Dec 24, 2022, at 8:21 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello,
>
> A few days ago my friend's Hewlett-Packard printer, an HP ENVY 6000,
> stopped working on Windows 10, so I downloaded all documentation
> and software from the HP site.
> One package, HPEasyStart-13.4.8-EN6000_51_3_4843_2_Webpack.exe,
> contains the printer driver and software. Out of interest I extracted
> the package. Some files inside have the name extension CAB. When
> running the newest file command (msdos,v 1.163 2022/12/18) on such CAB
> examples and related packed files I get output like:
>
> EN600x64.cab: Microsoft Cabinet archive data,
> many,
> 238518194 bytes, 141 files, at 0x174 +A
> "DeviceSetupExe", iFolder 0x1 +A
> "DeviceSetupLauncherExe",
> 39 cffolders, flags 0x4,
> ID 12345, number 1, extra bytes 20 in head,
> 838 datablocks, 0 compression
> EN600x86.cab: Microsoft Cabinet archive data,
> 207048493 bytes, 92 files, at 0x124 +A
> "DeviceSetupExe", iFolder 0x1 +A
> "DeviceSetupLauncherExe",
> 29 cffolders, flags 0x4,
> ID 12345, number 1, extra bytes 20 in head,
> 744 datablocks, 0 compression
> Full_x64.cab: Microsoft Cabinet archive data,
> 26505575 bytes, 208 files, at 0x9c +A
> "SureSupply_hpqDTSSEXE", iFolder 0x1 +A
> "SureSupply_hpqDTSSUIDLL",
> 12 cffolders, flags 0x4,
> ID 12345, number 1, extra bytes 20 in head,
> 239 datablocks, 0 compression
> POWERPNT.PP_: Microsoft Cabinet archive data,
> PowerPoint Packed and Go,
> 1765 bytes, 1 file, at 0x2c +A
> "powerpnt.ppt",
> number 1,
> 1 datablock, 0x1503 compression
> PRES0.PPZ: Microsoft Cabinet archive data,
> PowerPoint Packed and Go,
> 2803 bytes, 2 files, at 0x2c +Utf
> "Dummy slide.PPT" +Utf
> "PLAYLIST.LST",
> number 1,
> 1 datablock, 0x1 compression
> QUOTES._: Microsoft Cabinet archive data,
> 931 bytes, 1 file, at 0x2c +A
> "quotes",
> number 1,
> 1 datablock, 0x1503 compression
> hpgid31v4help.cab: Microsoft Cabinet archive data,
> many,
> 1371036 bytes, 35 files, at 0x2c +A
> "arabic.chm" +A
> "bulgrian.chm",
> ID 37818, number 1,
> 51 datablocks, 0x1 compression
>
> At first glance that looks OK, but with the --extension option ???
> is sometimes displayed instead of the cab suffix. This looks like:
>
> EN600x64.cab: cab
> EN600x86.cab: ???
> Full_x64.cab: ???
> POWERPNT.PP_: ppz
> PRES0.PPZ: ppz
> QUOTES._: ???
> hpgid31v4help.cab: cab
>
> Furthermore, with the -i option some samples only get the generic
> MIME type application/octet-stream instead of
> application/vnd.ms-cab-compressed. This looks like:
>
> EN600x64.cab: application/vnd.ms-cab-compressed; charset=binary
> EN600x86.cab: application/octet-stream; charset=binary
> Full_x64.cab: application/octet-stream; charset=binary
> POWERPNT.PP_: application/vnd.ms-powerpoint; charset=binary
> PRES0.PPZ: application/vnd.ms-powerpoint; charset=binary
> QUOTES._: application/octet-stream; charset=binary
> hpgid31v4help.cab: application/vnd.ms-cab-compressed; charset=binary
>
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). There all CAB samples
> are described correctly as "Microsoft Cabinet Archive" with MIME type
> application/vnd.ms-cab-compressed by ark-cab.trid.xml
> (see appended trid-v-cab.txt.gz).
>
> For comparison I also ran the file format identification
> utility DROID (see https://sourceforge.net/projects/droid/).
> There all CAB samples are described as "Windows Cabinet File" with
> MIME type application/vnd.ms-cab-compressed by PUID x-fmt/414.
>
> In the current Magdir/msdos the detection of CAB samples starts with
> a line like:
> 0 string/b MSCF\0\0\0\0 Microsoft Cabinet archive data
> Then a sub-classification (MIME type and file name extension) is
> done. First comes a brute-force search for known characteristics
> (member name or member suffix), because sometimes the known member
> name is not the first one. If nothing is found in that branch, the
> first member name is checked explicitly for names like wsusscan.cab
> and the sub-classification is done by that. If that branch does not
> succeed either, the name suffix after the point character, like
> ppt\0, is used as a further sub-class level. Unfortunately the
> undetected samples above match none of these tests, so no MIME type
> and no file name suffix is displayed. So I had to add an else clause
> for samples where the first member name contains no point character.
> The inserted part is similar to the other branches and looks like:
>
>>>>> &-1 default x
>>>>>> 28 uleshort =1 \b, single
> !:mime application/vnd.ms-cab-compressed
> !:ext cab
>>>>>> 28 uleshort >1 \b, many
> !:mime application/vnd.ms-cab-compressed
> !:ext cab
> The printer packages Full_x86.cab and Full_x64.cab are matched by the
> many branch here. The single branch is matched by some samples on the
> XP CD where the original file name has no suffix (like in NETWORKS._
> PROTOCOL._ QUOTES._ SERVICES._).
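
The fallback described above can be sketched in Python. This is only an illustration of the idea, not the magic implementation: the function names are hypothetical, and the offsets (first CFFILE offset as a 32-bit value at 16, file count at 28, member name 16 bytes into the CFFILE entry) are my reading of the cabinet header layout.

```python
import struct

def first_member_and_count(data: bytes):
    """Return (file count, first member name) of a cabinet, or None."""
    if len(data) < 36 or data[:8] != b"MSCF\0\0\0\0":
        return None
    (coff_files,) = struct.unpack_from("<I", data, 16)  # offset of first CFFILE
    (c_files,) = struct.unpack_from("<H", data, 28)     # "28 uleshort" in the magic
    name_start = coff_files + 16                        # name follows 16 fixed bytes
    name_end = data.find(b"\0", name_start)
    if name_end < 0:
        return None
    return c_files, data[name_start:name_end]

def fallback_label(data: bytes):
    """Mimic the new else clause: first member name without a point."""
    parsed = first_member_and_count(data)
    if parsed is None:
        return None
    c_files, name = parsed
    if b"." in name:
        return None          # left to the suffix-based branches
    if c_files == 1:
        return "single"
    if c_files > 1:
        return "many"
    return None
```

Unlike the magic search, this sketch stops at the terminating nil of the first name, so it never sees a point that belongs to a later member.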
>
> The archive member names are stored as nil-terminated strings without
> length information. So the search for the point character in the
> first archive member name may be too generous and match a point
> elsewhere, as in EN600x64.cab. Hopefully such samples are then
> matched by at least the default clauses. The search is done by a
> line like:
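
To illustrate why a fixed-range search can overshoot, here is a small Python sketch. The helper names and the sample bytes are hypothetical (the second member name is made up), and the 255-byte window only approximates what search/255 does.

```python
def dot_in_window(data: bytes, start: int, window: int = 255) -> bool:
    """Roughly like search/255: look for a point anywhere in the window."""
    return b"." in data[start:start + window]

def dot_in_first_name(data: bytes, start: int) -> bool:
    """Stricter check: stop at the nil that terminates the first name."""
    end = data.find(b"\0", start)
    if end < 0:
        end = len(data)
    return b"." in data[start:end]

# First name without a point, then 16 bytes of the next file entry,
# then a made-up second name that does contain a point.
sample = b"DeviceSetupExe\0" + b"\0" * 16 + b"setup.dll\0"
print(dot_in_window(sample, 0), dot_in_first_name(sample, 0))
```

The window search reports a point although the first member name has none, which is the situation described for EN600x64.cab.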
>>>>> &-1 search/255 .
>
> Furthermore, if the first member name suffix is ppt, it was assumed
> that this is PowerPoint Packed and Go (PowerPoint presentation *.ppt
> with optional PLAYLIST.LST or ppview32.exe). This was done by a part
> which looks like:
>>>>>> &0 string/c ppt\0 \b, PowerPoint Packed and Go
> !:mime application/vnd.ms-powerpoint
> !:ext ppz
> Unfortunately this also applies to POWERPNT.PP_ found on XP_CD in the
> I386 folder. This contains only a single file "powerpnt.ppt"
> compressed with CAB format. So this now becomes:
>>>>>> &0 string/c ppt\0
>>>>>>> 28 uleshort >1 \b, PowerPoint Packed and Go
> !:mime application/vnd.ms-powerpoint
> !:ext ppz
>>>>>>> 28 uleshort =1 \b, one packed PowerPoint
> !:mime application/vnd.ms-cab-compressed
> !:ext pp_
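
The split of the ppt branch amounts to a small decision on the file count. A minimal Python sketch, with a hypothetical function name, that returns the description, MIME type and extension quoted above:

```python
def classify_ppt_cab(first_name: bytes, c_files: int):
    """Return (description, mime, ext) for a cabinet whose first member is *.ppt."""
    if not first_name.lower().endswith(b".ppt"):
        return None  # not handled by this branch
    if c_files > 1:
        return ("PowerPoint Packed and Go",
                "application/vnd.ms-powerpoint", "ppz")
    if c_files == 1:
        return ("one packed PowerPoint",
                "application/vnd.ms-cab-compressed", "pp_")
    return None
```

The case-insensitive comparison stands in for the /c flag of the string test.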
>
> The date and time in DOS format are stored before the attribute flags
> of an archive member. That was expressed by lines like:
> # date stamp for file
> #>10 uleshort x \b, date %#x
> # time stamp for file
> #>12 uleshort x \b, time %#x
> In older versions these values could only be displayed as hexadecimal
> values. That is not so interesting for normal users. Luckily newer
> file command versions have functions to show these values in
> human-readable form. So this now becomes:
>> 10 lemsdosdate x last modified %s
>> 12 lemsdostime x %s
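
For readers unfamiliar with the DOS format, the two 16-bit values pack the fields as bit groups. A minimal Python sketch of the decoding (the function name is hypothetical; this is not how the file command implements it):

```python
from datetime import datetime

def decode_dos_datetime(dos_date: int, dos_time: int) -> datetime:
    """Decode a 16-bit DOS date and a 16-bit DOS time."""
    year = 1980 + (dos_date >> 9)        # bits 15-9: years since 1980
    month = (dos_date >> 5) & 0x0F       # bits 8-5: month 1..12
    day = dos_date & 0x1F                # bits 4-0: day of month
    hour = dos_time >> 11                # bits 15-11: hour
    minute = (dos_time >> 5) & 0x3F      # bits 10-5: minute
    second = (dos_time & 0x1F) * 2       # bits 4-0: seconds in 2 s steps
    # datetime() raises ValueError for impossible values such as month 0
    return datetime(year, month, day, hour, minute, second)
```

Seconds have a resolution of two, which is why the time stamps in the output below are all even.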
>
> After applying the above-mentioned modifications with the patch
> file-msdos-cab_point_ppz.diff I get output similar to before.
> This now looks like:
>
> EN600x64.cab: Microsoft Cabinet archive data,
> many,
> 238518194 bytes, 141 files, at 0x174
> last modified Sun, Nov 06 2021 05:45:08 +A
> "DeviceSetupExe", iFolder 0x1
> last modified Sun, Nov 06 2021 05:11:08 +A
> "DeviceSetupLauncherExe",
> 39 cffolders, flags 0x4,
> ID 12345, number 1, extra bytes 20 in head,
> 838 datablocks, 0 compression
> EN600x86.cab: Microsoft Cabinet archive data,
> many,
> 207048493 bytes, 92 files, at 0x124
> last modified Sun, Nov 06 2021 04:43:10 +A
> "DeviceSetupExe", iFolder 0x1
> last modified Sun, Nov 06 2021 04:17:42 +A
> "DeviceSetupLauncherExe",
> 29 cffolders, flags 0x4,
> ID 12345, number 1, extra bytes 20 in head,
> 744 datablocks, 0 compression
> Full_x64.cab: Microsoft Cabinet archive data,
> many,
> 26505575 bytes, 208 files, at 0x9c
> last modified Sun, Nov 06 2021 05:13:06 +A
> "SureSupply_hpqDTSSEXE", iFolder 0x1
> last modified Sun, Nov 06 2021 05:10:52 +A
> "SureSupply_hpqDTSSUIDLL",
> 12 cffolders, flags 0x4,
> ID 12345, number 1, extra bytes 20 in head,
> 239 datablocks, 0 compression
> POWERPNT.PP_: Microsoft Cabinet archive data,
> one packed PowerPoint,
> 1765 bytes, 1 file, at 0x2c
> last modified Sun, Jul 21 2001 18:42:44 +A
> "powerpnt.ppt", number 1,
> 1 datablock, 0x1503 compression
> PRES0.PPZ: Microsoft Cabinet archive data,
> PowerPoint Packed and Go,
> 2803 bytes, 2 files, at 0x2c
> last modified Sun, Jan 16 2006 18:00:52 +Utf
> "Dummy slide.PPT"
> last modified Sun, Jan 16 2006 18:00:52 +Utf
> "PLAYLIST.LST", number 1,
> 1 datablock, 0x1 compression
> QUOTES._: Microsoft Cabinet archive data,
> single,
> 931 bytes, 1 file, at 0x2c
> last modified Sun, Jul 28 2001 15:08:06 +A
> "quotes", number 1,
> 1 datablock, 0x1503 compression
> hpgid31v4help.cab: Microsoft Cabinet archive data,
> many,
> 1371036 bytes, 35 files, at 0x2c
> last modified Sun, Oct 01 2014 11:47:24 +A
> "arabic.chm"
> last modified Sun, Oct 01 2014 11:47:24 +A
> "bulgrian.chm", ID 37818, number 1,
> 51 datablocks, 0x1 compression
>
> With the --extension option the correct file name extensions are now
> shown for the inspected examples:
>
> EN600x64.cab: cab
> EN600x86.cab: cab
> Full_x64.cab: cab
> POWERPNT.PP_: pp_
> PRES0.PPZ: ppz
> QUOTES._: _
> hpgid31v4help.cab: cab
>
> With the -i option the correct MIME types are now shown for the
> inspected examples:
>
> EN600x64.cab: application/vnd.ms-cab-compressed; charset=binary
> EN600x86.cab: application/vnd.ms-cab-compressed; charset=binary
> Full_x64.cab: application/vnd.ms-cab-compressed; charset=binary
> POWERPNT.PP_: application/vnd.ms-cab-compressed; charset=binary
> PRES0.PPZ: application/vnd.ms-powerpoint; charset=binary
> QUOTES._: application/vnd.ms-cab-compressed; charset=binary
> hpgid31v4help.cab: application/vnd.ms-cab-compressed; charset=binary
>
> I hope my diff file can be applied in a future version of the
> file utility.
>
> With best wishes
> Jörg Jenderek
> - --
> Jörg Jenderek
>
>
>
>
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY6eliAAKCRCv8rHJQhrU
> 1iRcAKCN2fJ58vd/eOPCK57vIzfspNVfyACg3GKW2d1dEpHkD12tTuEJwYoblqc=
> =fsMh
> -----END PGP SIGNATURE-----
> <trid-v-cab.txt.gz><file-msdos-cab_point_ppz_diff.DEFANGED-38><file-msdos-cab_point_ppz_diff_sig.DEFANGED-39>--