[File] [PATCH] Magdir/ole2compounddocs *.CUB described as Microsoft Windows Installer Package *.MSI

Jörg Jenderek joerg.jen.der.ek at gmx.net
Sat Dec 3 21:10:47 UTC 2022


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

Some weeks ago I ran the cleaning tool czkawka, found at
https://qarmin.github.io/czkawka/. One menu item concerns bad
extensions. After running the tool I looked in the saved file list
results_bad_extensions.txt for examples of bad extensions.

One listed extension is CUB. I found such examples as part of the WiX
toolset and the Orca software on Windows 8 and 10 systems.

When running the file command version 5.43 with the -e cdf option on
such samples I get output like:

Vstalogo.cub: OLE 2 Compound Document, v4.62, SecID 0x1,
	      Mini FAT start sector 0x5,
	      blocksize 4096
	      : Microsoft Windows Installer Package
XPlogo.cub:   OLE 2 Compound Document, v4.62, SecID 0x1,
	      Mini FAT start sector 0x5,
	      blocksize 4096
	      : Microsoft Windows Installer Package
darice.cub:   OLE 2 Compound Document, v3.62, SecID 0x1,
	      11 FAT sectors,
	      Mini FAT start sector 0x3a, 2 Mini FAT sectors
	      : Microsoft Windows Installer Package
logo.cub:     OLE 2 Compound Document, v4.62, SecID 0x1,
	      Mini FAT start sector 0x5,
	      blocksize 4096
	      : Microsoft Windows Installer Package
mergemod.cub: OLE 2 Compound Document, v3.62, SecID 0x1,
	      8 FAT sectors,
	      Mini FAT start sector 0x4, 2 Mini FAT sectors
	      : Microsoft Windows Installer Package

With option --extension only the 3-byte sequence msi is shown, and
with option -i application/x-msi is shown.

When running the file command with -e soft or with no extra option on
the examples I get lines like:

Vstalogo.cub: Composite Document File V2 Document, Little Endian,
	      Os: Windows, Version 10.0, MSI Installer,
	      Code page: 1252, Title: Installation Database, Subject:
	      Internal Consistency Evaluators For Windows Vista Logo,
	      Author: Microsoft Corporation, Keywords:
	      Installer,MSI,Database,
	      Comments: Validates MSI Databases,
	      Template: Intel;1033, Revision Number:
	      {154AA518-3785-4E1C-BF42-D7D203F50B92},
	      Number of Pages: 100, Number of Words: 1,
	      Name of Creating Application:
	      Microsoft Installer, Security: 1, Last Printed:
	      Mon Mar 18 07:00:00 2019, Create Time/Date:
	      Mon Mar 18 07:00:00 2019
XPlogo.cub:   Composite Document File V2 Document, Little Endian,
	      Os: Windows, Version 10.0, MSI Installer,
	      Code page: 1252, Title: Installation Database, Subject:
	      Internal Consistency Evaluators For XP Logo,
	      Author: Microsoft Corporation, Keywords:
	      Installer,MSI,Database,
	      Comments: Validates MSI Databases,
	      Template: Intel;1033, Revision Number:
	      {38C5F470-D3C4-11D1-A84F-006097ABDE17},
	      Number of Pages: 100, Number of Words: 1,
	      Name of Creating Application:
	      Microsoft Installer, Security: 1, Last Printed:
	      Mon Mar 18 07:00:00 2019, Create Time/Date:
	      Mon Mar 18 07:00:00 2019
darice.cub:   Composite Document File V2 Document, Little Endian,
	      Os: Windows, Version 6.1, MSI Installer,
	      Code page: 1252, Title: Installation Database, Subject:
	      Internal Consistency Evaluators - Full Set,
	      Author: Microsoft Corporation, Keywords:
	      Installer,MSI,Database,
	      Comments: Validates MSI Databases,
	      Template: Intel;1033, Revision Number:
	      {314D57F5-9F5E-4B0B-81EC-F821BD9B05E2},
	      Number of Pages: 100, Number of Words: 1,
	      Name of Creating Application:
	      Microsoft Installer, Security: 1, Last Printed:
	      Tue Jan 12 08:00:00 2010, Create Time/Date:
	      Tue Jan 12 08:00:00 2010
logo.cub:     Composite Document File V2 Document, Little Endian,
	      Os: Windows, Version 10.0, MSI Installer,
	      Code page: 1252, Title: Installation Database, Subject:
	      Internal Consistency Evaluators For NT5 Logo,
	      Author: Microsoft Corporation, Keywords:
	      Installer,MSI,Database,
	      Comments: Validates MSI Databases,
	      Template: Intel;1033, Revision Number:
	      {38C5F470-D3C4-11D1-A84F-006097ABDE17},
	      Number of Pages: 100, Number of Words: 1,
	      Name of Creating Application:
	      Microsoft Installer, Security: 1, Last Printed:
	      Mon Mar 18 07:00:00 2019, Create Time/Date:
	      Mon Mar 18 07:00:00 2019
mergemod.cub: Composite Document File V2 Document, Little Endian,
	      Os: Windows, Version 5.2, MSI Installer,
	      Code page: 1252, Title: Installation Database, Subject:
	      Internal Consistency Evaluators - Merge Modules,
	      Author: Microsoft Corporation, Keywords:
	      Installer,MSI,Database,
	      Comments: Validates MSI Databases,
	      Template: Intel;1033, Revision Number:
	      {1A337547-9290-43CB-97A8-EE88C2AA3CB0},
	      Create Time/Date:
	      Mon Aug  2 21:37:00 1999,
	      Number of Pages: 100, Number of Words: 1,
	      Name of Creating Application:
	      Microsoft Installer, Security: 1

For comparison I ran the file format identification utility TrID
(see https://mark0.net/soft-trid-e.html). It also identifies all
examples, with low priority, as "Generic OLE2 / Multistream
Compound" by docfile.trid.xml. The examples are described with
highest priority as "Windows SDK Setup Transform script" with the
wrong suffix MST by mst.trid.xml, and with a mid-range rating as
"Windows Installer Patch" with the wrong suffix MSP by msp.trid.xml
(see the appended trid-v-cub.txt.gz).

For comparison I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/). It identifies
all samples only generically as "OLE2 Compound Document" by PUID
fmt/111.

Luckily I also found a section "ICE validation" on the Windows
Installer page on Wikipedia. That information is now expressed by
comment lines inside Magdir/ole2compounddocs like:
# URL:	https://en.wikipedia.org/wiki/Windows_Installer#ICE_validation
#	https://learn.microsoft.com/en-us/windows/win32/msi/
#	internal-consistency-evaluators-ices

The CUB samples are recognized as "OLE 2 Compound Document" inside
Magdir/ole2compounddocs by the bytes
(\320\317\021\340\241\261\032\341) at the beginning of the file.
There is already a code fragment that does sub-class identification,
so the examples are described as "Microsoft Windows Installer
Package" by a specific GUID, because CUB files are stripped-down MSI
files. The lines concerned look like:
 >>88 	ubequad		0xc000000000000046
 >>>80 	ubequad		0x84100c0000000000	: \
	Microsoft Windows Installer Package
 !:mime	application/x-msi
 !:ext	msi
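
The two ubequad tests above can be illustrated outside of file with a
minimal Python sketch. It is not how file itself evaluates the magic;
it assumes the usual compound-file header layout (sector shift at
offset 30, first directory sector at offset 48, root directory entry
first in that sector, CLSID at offset 80 of the entry). The function
names are made up for this illustration.

```python
import struct
from typing import Optional

# Signature \320\317\021\340\241\261\032\341 written in hex.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
# The 16 bytes tested by the magic: offset 80 (0x84100c0000000000)
# followed by offset 88 (0xc000000000000046).
MSI_CLSID = bytes.fromhex("84100c0000000000c000000000000046")


def root_clsid(data: bytes) -> Optional[bytes]:
    """Return the 16 CLSID bytes of the root directory entry, or None.

    Assumption: the root entry is the first entry of the first
    directory sector, and sector N starts at (N + 1) * sector_size.
    """
    if len(data) < 512 or data[:8] != OLE2_MAGIC:
        return None
    (sector_shift,) = struct.unpack_from("<H", data, 30)
    (first_dir_sector,) = struct.unpack_from("<I", data, 48)
    entry = (first_dir_sector + 1) * (1 << sector_shift)
    clsid = data[entry + 80 : entry + 96]
    return clsid if len(clsid) == 16 else None


def is_msi_or_cub(data: bytes) -> bool:
    """True if the root CLSID is the one shared by MSI and CUB files."""
    return root_clsid(data) == MSI_CLSID
```

Because MSI and CUB files carry the same GUID, such a check can only
say "MSI or CUB", which is the reason for the combined description.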

A similar problem exists for other Microsoft formats. For Microsoft
Word 6-95 the same GUID is used for documents (*.DOC) and document
templates (*.DOT). So both are described by one magic entry with the
phrase "document or template", and the extension string "doc/dot" is
shown. I handle the CUB examples in the same way, so this code
fragment now becomes:
 >>88 	ubequad		0xc000000000000046
 >>>80 	ubequad		0x84100c0000000000	: \
	Microsoft Windows Installer Package or validation module
 !:mime	application/x-msi
 !:ext	msi/cub

After applying the above-mentioned modifications with the patch
file-ole2compounddocs-cub.diff, all my inspected Microsoft Windows
Installer validation modules (*.CUB) are now described together with
MSI samples. With the -e cdf option this now looks like:

Vstalogo.cub: OLE 2 Compound Document, v4.62, SecID 0x1,
	      Mini FAT start sector 0x5,
	      blocksize 4096
	      : Microsoft Windows Installer Package
	      or validation module
XPlogo.cub:   OLE 2 Compound Document, v4.62, SecID 0x1,
	      Mini FAT start sector 0x5,
	      blocksize 4096
	      : Microsoft Windows Installer Package
	      or validation module
darice.cub:   OLE 2 Compound Document, v3.62, SecID 0x1,
	      11 FAT sectors,
	      Mini FAT start sector 0x3a, 2 Mini FAT sectors
	      : Microsoft Windows Installer Package
	      or validation module
logo.cub:     OLE 2 Compound Document, v4.62, SecID 0x1,
	      Mini FAT start sector 0x5,
	      blocksize 4096
	      : Microsoft Windows Installer Package
	      or validation module
mergemod.cub: OLE 2 Compound Document, v3.62, SecID 0x1,
	      8 FAT sectors,
	      Mini FAT start sector 0x4, 2 Mini FAT sectors
	      : Microsoft Windows Installer Package
	      or validation module

With the -e cdf and --extension options this now looks like:
Vstalogo.cub: msi/cub
XPlogo.cub:   msi/cub
darice.cub:   msi/cub
logo.cub:     msi/cub
mergemod.cub: msi/cub
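
For a tool that flags "bad extensions", the slash-separated list
matters: a name is fine if its suffix is any one of the listed
alternatives. A small Python sketch of that comparison, assuming
input lines shaped like the ones above (the function name is
hypothetical and not part of file or czkawka):

```python
from pathlib import PurePath


def suffix_is_listed(line: str) -> bool:
    """Check one 'name: ext1/ext2' line as printed with --extension."""
    # Split at the last colon so colons inside the path do no harm.
    name, _, exts = line.rpartition(":")
    suffix = PurePath(name.strip()).suffix.lstrip(".").lower()
    listed = [e.strip().lower() for e in exts.split("/")]
    return suffix != "" and suffix in listed
```

With the old entry (only msi listed) the CUB samples would fail this
check; with msi/cub they pass.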

I hope my diff file can be applied in a future version of the file
utility. Maybe it is possible to do further sub-classification
between MSI and CUB. But for that purpose one must know what is
specific to CUB samples and does not occur in "normal" MSI samples.
I do not know.
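
One observation from the summary information quoted above: in all
five samples the Subject starts with "Internal Consistency
Evaluators". Whether this holds for every CUB file, or never for an
ordinary MSI file, is not known, so the following Python sketch is
only a heuristic over the text that file prints, not a magic pattern
and not part of the patch. The function name is hypothetical.

```python
import re


def looks_like_cub(description: str) -> bool:
    """Guess CUB from the Subject field of file's CDF description.

    Assumption: validation modules carry a Subject beginning with
    'Internal Consistency Evaluators', as the five samples here do.
    """
    # Undo any line wrapping so 'Subject:' and its value are adjacent.
    text = " ".join(description.split())
    match = re.search(r"Subject: ([^,]*)", text)
    return bool(match) and match.group(1).startswith(
        "Internal Consistency Evaluators"
    )
```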

With best wishes,
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY4u7VgAKCRCv8rHJQhrU
1im8AKCQ8fwFZiEZddxWsraOjXIW9A32MQCcDc3e++xhg8WpChGPhavk9UBoeIM=
=BgON
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-cub.txt.gz
Type: application/x-gzip
Size: 614 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221203/317ac019/attachment.bin>
-------------- next part --------------
--- file-master/magic/Magdir/ole2compounddocs.old	2022-12-03 18:25:55.362517900 +0100
+++ file-master/magic/Magdir/ole2compounddocs	2022-12-03 21:55:43.148281100 +0100
@@ -336,6 +336,12 @@
 # URL:	http://fileformats.archiveteam.org/wiki/Windows_Installer
->>>80 	ubequad		0x84100c0000000000	: Microsoft Windows Installer Package
+#	https://en.wikipedia.org/wiki/Windows_Installer#ICE_validation
+# Update: Joerg Jenderek
+# Windows Installer Package *.MSI or validation module *.CUB
+>>>80 	ubequad		0x84100c0000000000	: Microsoft Windows Installer Package or validation module
 !:mime	application/x-msi
 #!:mime	application/x-ms-win-installer
-!:ext	msi
+#	https://learn.microsoft.com/en-us/windows/win32/msi/internal-consistency-evaluators-ices
+# cub is used for validation module like: Vstalogo.cub XPlogo.cub darice.cub logo.cub mergemod.cub
+#!:mime	application/x-ms-cub
+!:ext	msi/cub
 >>>80 	ubequad		0x86100c0000000000	: Microsoft Windows Installer Patch
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-ole2compounddocs-cub.diff.sig
Type: application/octet-stream
Size: 624 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221203/317ac019/attachment.obj>

