[File] [PATCH] Magdir/archive+windows; InstallShield setup header *.HDR+Language Identifier *.LID ; *.INS; *.TAG

Jörg Jenderek joerg.jen.der.ek at gmx.net
Thu Nov 4 15:30:01 UTC 2021


Hello,

Some days ago I installed an old piece of Windows software. Its
installation directory contains files that the file command does not
recognise, or describes only partly or generically.

When running file command version 5.41 on such examples I get
output like:

DATA.TAG:   ASCII text, with CRLF line terminators
Setup.exe:  PE32 executable (GUI) Intel 80386, for MS Windows
_sys1.cab:  InstallShield CAB
_sys1.hdr:  InstallShield CAB
_user1.cab: InstallShield CAB
_user1.hdr: InstallShield CAB
data1.cab:  InstallShield CAB
data1.hdr:  InstallShield CAB
data2.cab:  InstallShield CAB
setup.ins:  COM executable for DOS
setup.lid:  ASCII text, with CRLF line terminators

Furthermore, with the --extension option only the 3-character sequence
??? is shown. With the -i option only generic MIME types like
text/plain or application/octet-stream are shown.

For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html).
Some HDR examples are described correctly by TrID, first as
"InstallShield setup header" by ark-cab-ishield-hdr.trid.xml and
second generically as "InstallShield compressed Archive" by
ark-cab-ishield.trid.xml.
Most INS examples are described correctly as "InstallShield Script"
by ins.trid.xml.
The examples described by the file command as "ASCII text" are
described as "Generic INI configuration" by ini.trid.xml. All LID
examples are described more specifically as "InstallShield Language
Identifier" by lid-is.trid.xml, and all DATA.TAG examples are described
as "TagInfo data" by taginfo.trid.xml (see the appended
installshield-trid-v.txt.gz).

TrID lists the correct file name extensions and, with the -v option,
often the related URL pointing to the file format information used.
Luckily there is a free software project, unshield, that can handle
such InstallShield Cabinet archives. The relevant information is found
in the header file cabfile.h and the C source file helper.c. So this
information is now expressed by comment lines inside
Magdir/archive like:
# URL:		https://en.wikipedia.org/wiki/InstallShield
# Reference:	https://github.com/twogood/unshield
#		/blob/master/lib/cabfile.h
# https://github.com/twogood/unshield/blob/master/lib/helper.c

In the current version the only magic line looks like:
0	string	ISc( InstallShield CAB

After the test for this CAB_SIGNATURE (0x28635349), the version
information is now printed, following the C source, by a line like:
 >4	ulelong	x	\b, version %#x

Afterwards volume_info and cab_descriptor_offset are printed if they
have unusual values, by lines like:
 >8	ulelong	!0	\b, volume_info %#x
 >12	ulelong	!0x200	\b, offset %#x
Afterwards the cab_descriptor_size is shown if non-zero, by a line like:
 >16	ulelong	!0	\b, descriptor size %#x
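
The header fields tested above can also be read in one step outside of
file. The following is only a minimal Python sketch and not part of
the patch; the function name describe_header is made up for
illustration, and the field order is assumed from the offsets used in
the magic lines (0, 4, 8, 12, 16).

```python
import struct

# CAB_SIGNATURE from cabfile.h; its little-endian bytes spell "ISc("
CAB_SIGNATURE = 0x28635349


def describe_header(data: bytes) -> str:
    """Build a description similar to the magic lines for the first 20 bytes."""
    if len(data) < 20:
        raise ValueError("need at least 20 bytes")
    # five unsigned 32-bit little-endian integers (ulelong in magic terms)
    signature, version, volume_info, offset, size = struct.unpack_from("<5I", data, 0)
    if signature != CAB_SIGNATURE:
        raise ValueError("InstallShield signature not found")
    parts = ["InstallShield CAB", "version %#x" % version]
    if volume_info != 0:          # only shown for unusual values
        parts.append("volume_info %#x" % volume_info)
    if offset != 0x200:           # 0x200 is the usual descriptor offset
        parts.append("offset %#x" % offset)
    if size != 0:                 # zero in all CAB examples seen so far
        parts.append("descriptor size %#x" % size)
    return ", ".join(parts)


if __name__ == "__main__":
    # synthetic header built from values quoted in this mail
    sample = struct.pack("<5I", CAB_SIGNATURE, 0x1005201, 0, 0x200, 0x116C)
    print(describe_header(sample))
```

The sample header is synthetic; for a real file the first 20 bytes
would be read with open(name, "rb").read(20).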

After inspecting hundreds of InstallShield files, this value was zero
in all my CAB examples and non-zero in my HDR examples. Hoping that
this is always true, I use this observation to distinguish HDR from CAB
InstallShield files, with correct file name extensions, by lines like:
 0	string	ISc( InstallShield
 !:mime		application/x-installshield
 >16	ulelong	!0	setup header
 !:ext	hdr
 >16	ulelong	=0	CAB
 !:ext	cab
Instead of the generic MIME type application/octet-stream I display a
user-defined one.
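
The HDR/CAB decision can be sketched in Python as follows. This only
illustrates the observation above (descriptor size zero means CAB,
non-zero means HDR), which is based on my examples and not on any
specification; the function name classify_installshield is
hypothetical.

```python
import struct


def classify_installshield(data: bytes):
    """Return (description, extension, mime) or None if the signature is absent."""
    if len(data) < 20 or data[:4] != b"ISc(":
        return None
    # cab_descriptor_size: unsigned 32-bit little-endian integer at offset 16
    (descriptor_size,) = struct.unpack_from("<I", data, 16)
    mime = "application/x-installshield"  # user-defined type from the patch
    if descriptor_size != 0:
        return ("InstallShield setup header", "hdr", mime)
    return ("InstallShield CAB", "cab", mime)


if __name__ == "__main__":
    # synthetic headers; only the signature and offset 16 matter here
    hdr_like = b"ISc(" + bytes(12) + struct.pack("<I", 0x130B)
    cab_like = b"ISc(" + bytes(16)
    print(classify_installshield(hdr_like))
    print(classify_installshield(cab_like))
```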

Unfortunately no official or complete documentation exists for the LID
file format, so I use the information provided by TrID. This
information is recorded inside Magdir/windows by comment lines like:
 # URL:		https://en.wikipedia.org/wiki/InstallShield
 # Reference:	http://mark0.net/download/triddefs_xml.7z
 #		defs/l/lid-is.trid.xml
Following TrID I created equivalent magic lines inside the ini-file
subroutine of Magdir/windows, after the Windows code page translator
section. This now looks like:
 >>&0	regex/c	\^(Languages)]	InstallShield Language Identifier
 !:mime	text/x-installshield-lid
 !:ext	lid

Instead of the generic MIME type text/plain I display a user-defined
one. The test for the keyword Languages in a bracketed section was
sufficient to recognize my LID examples. If this is not sufficient,
then additional tests for keywords (three, like: count Default key0,
mentioned in the global strings section of the TrID definition) must
be done.
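
To make the idea of such an additional keyword test concrete, here is a
small Python sketch. It is not how libmagic evaluates the regex line;
it assumes that the first bracketed section header is the one that
matters, and the function name looks_like_lid and the strict switch are
invented for this example.

```python
import re

# first "[name]" header at the start of a line
SECTION = re.compile(r"^\[([^\]\r\n]+)\]", re.MULTILINE)
# keywords from the global strings section of the TrID definition
LID_KEYWORDS = ("count", "default", "key0")


def looks_like_lid(text: str, strict: bool = False) -> bool:
    """Check for a [Languages] section; optionally require the keywords too."""
    match = SECTION.search(text)
    if match is None or match.group(1).lower() != "languages":
        return False
    if not strict:
        return True
    lowered = text.lower()  # case-insensitive, like regex/c
    return all(keyword in lowered for keyword in LID_KEYWORDS)


if __name__ == "__main__":
    # made-up sample content, not copied from a real SETUP.LID
    sample = "[Languages]\r\ncount=1\r\nDefault=9\r\nkey0=9\r\n"
    print(looks_like_lid(sample), looks_like_lid(sample, strict=True))
```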

Unfortunately no official or complete documentation exists for the
TagInfo file format, so I use the information provided by TrID. This
information is recorded inside Magdir/windows, after the LID section,
by comment lines like:
 # URL:		https://www.file-extensions.org/tag-file-extension
 # Reference:	http://mark0.net/download/triddefs_xml.7z
 #		defs/t/taginfo.trid.xml
Following TrID I created equivalent magic lines inside the ini-file
subroutine of Magdir/windows, after the Windows code page translator
section. This now looks like:
 >>&0	regex/c	\^(TagInfo)]	TagInfo
 !:mime	text/x-ms-tag
 !:ext	tag
Instead of the generic MIME type text/plain I display a user-defined
one. The test for the keyword TagInfo in a bracketed section was
sufficient to recognize my DATA.TAG examples. If this is not
sufficient, then additional tests for keywords (like: Application
Category Company Misc Version, mentioned in the global strings section
of the TrID definition) must be done.
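
Both bracket keywords from this mail (Languages and TagInfo) can be
put into one lookup table. The Python sketch below is only meant to
show that mapping from section name to description, MIME type and
extension; the names INI_TYPES and identify_ini are hypothetical, and
it assumes the file starts with the section header.

```python
import re

# section name (lower case) -> (description, mime type, extension)
INI_TYPES = {
    "languages": ("InstallShield Language Identifier", "text/x-installshield-lid", "lid"),
    "taginfo": ("TagInfo", "text/x-ms-tag", "tag"),
}
# "[name]" at the very beginning, optionally after white space
FIRST_SECTION = re.compile(r"\s*\[([^\]\r\n]+)\]")


def identify_ini(text: str):
    """Return the table entry for the leading section header, or None."""
    match = FIRST_SECTION.match(text)
    if match is None:
        return None
    return INI_TYPES.get(match.group(1).lower())


if __name__ == "__main__":
    # made-up sample content, not copied from a real DATA.TAG
    print(identify_ini("[TagInfo]\r\nCompany=Example\r\n"))
```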

Unfortunately no official or complete documentation exists for the
InstallShield INS file format, so I use the information provided by
TrID. This information is recorded at the end of Magdir/windows by
comment lines like:
 # URL:		https://en.wikipedia.org/wiki/InstallShield
 # Reference:	http://mark0.net/download/triddefs_xml.7z
 #		defs/i/ins.trid.xml
Following TrID I created equivalent magic lines in Magdir/windows.
These start with lines like:
 0	ubelong	0xB8C90C00	InstallShield Script
 !:mime	application/x-installshield-ins
 !:ext	ins
Instead of the generic MIME type application/octet-stream I display a
user-defined one.
Because I am unsure whether this starting 4-byte value is always
present, I looked for additional information inside the INS examples.
Apparently each string is preceded by its length, stored as a 2-byte
little-endian integer. The first string seems to be a copyright message
at a fixed offset, like "Stirling Technologies, Inc.  (c) 1990-1994" or
"InstallSHIELD Software Coporation  (c) 1990-1997". This is displayed
by a line like:
 >13	pstring/h	x		"%s"
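
What the pstring/h test reads can be sketched in Python like this. It
is only an illustration under the assumptions stated above (magic bytes
B8 C9 0C 00 at offset 0, a 2-byte little-endian length at offset 13);
the helper names read_pstring_h and ins_copyright are invented here.

```python
import struct

INS_MAGIC = bytes.fromhex("b8c90c00")  # ubelong 0xB8C90C00 at offset 0


def read_pstring_h(data: bytes, offset: int):
    """Read a string with a 2-byte little-endian length prefix.

    Returns (text, offset just behind the string)."""
    (length,) = struct.unpack_from("<H", data, offset)
    start = offset + 2
    end = start + length
    if end > len(data):
        raise ValueError("string runs past the end of the data")
    return data[start:end].decode("latin-1"), end


def ins_copyright(data: bytes):
    """Return the copyright string at offset 13, or None without the magic."""
    if data[:4] != INS_MAGIC:
        return None
    text, _ = read_pstring_h(data, 13)
    return text


if __name__ == "__main__":
    # synthetic data: magic, 9 filler bytes, then the length-prefixed string
    message = b"Stirling Technologies, Inc.  (c) 1990-1994"
    sample = INS_MAGIC + bytes(9) + struct.pack("<H", len(message)) + message
    print(ins_copyright(sample))
```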

The global strings section of the TrID definition mentions some
keywords like: SRCDIR, SRCDISK, TARGETDISK, TARGETDIR, WINDIR,
WINDISK, WINSYSDIR, LOGHANDLE. Apparently these are variable names,
which may be used to configure some install options.

A few dozen bytes later inside the INS examples, at varying offsets,
these names appear in the same order. Before each name a kind of
sequence number seems to be stored as a 2-byte integer. So I show this
variable name information by additional lines like:
 >1	search/0x121/s	SRCDIR	\b, variable names:
 >>&-4		leshort		x	#%u
 >>&-2		pstring/h	x	%s
 >>>&0		leshort		x	#%u
 >>>&2		pstring/h	x	%s
 >>>>&0		leshort		x	#%u
 >>>>&2		pstring/h	x	%s
 >>>>>&0	leshort		x	#%u
 >>>>>&2	pstring/h	x	%s
 >>>>>>&0	leshort		x	#%u
 >>>>>>&2	pstring/h	x	%s
 >>>>>>>&0	leshort		x	#%u
 >>>>>>>&2	pstring/h	x	%s
 >>>>>>>>&0	leshort		x	#%u
 >>>>>>>>&2	pstring/h	x	%s
 >0		ubelong		x	...
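
The layout assumed by these lines (2-byte sequence number, 2-byte
length, then the name, repeated) can be walked with a short loop. The
Python sketch below only approximates the search/0x121/s window and
reads the numbers as unsigned; read_variable_names and its count
parameter are made up for this example.

```python
import struct


def read_variable_names(data: bytes, count: int = 3):
    """Return [(sequence_number, name), ...] starting at the SRCDIR entry."""
    needle = b"SRCDIR"
    # rough equivalent of 'search/0x121/s' starting at offset 1
    pos = data.find(needle, 1, 1 + 0x121 + len(needle))
    if pos < 4:  # not found, or no room for number and length before it
        return []
    offset = pos - 4  # sequence number sits 4 bytes before the name
    entries = []
    for _ in range(count):
        if offset + 4 > len(data):
            break
        # two little-endian 16-bit values: sequence number and name length
        seq, length = struct.unpack_from("<HH", data, offset)
        name = data[offset + 4 : offset + 4 + length]
        if len(name) < length:
            break  # truncated entry
        entries.append((seq, name.decode("latin-1")))
        offset += 4 + length
    return entries


if __name__ == "__main__":
    # synthetic table with the first three names from the TrID definition
    names = [b"SRCDIR", b"SRCDISK", b"TARGETDISK"]
    table = b"".join(struct.pack("<HH", i, len(n)) + n for i, n in enumerate(names))
    print(read_variable_names(bytes(32) + table))
```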

After applying the above-mentioned modifications with the patches
file-5.41-archive-installshield.diff and file-5.41-windows-lid.diff,
all my InstallShield examples are now recognised or described in more
detail, like:
DATA.TAG:   TagInfo
Setup.exe:  PE32 executable (GUI) Intel 80386, for MS Windows
_sys1.cab:  InstallShield CAB, version 0x1005201
_sys1.hdr:  InstallShield setup header, version 0x1005201,
	    descriptor size 0x116c
_user1.cab: InstallShield CAB, version 0x1005201
_user1.hdr: InstallShield setup header, version 0x1005201,
	    descriptor size 0x130b
data1.cab:  InstallShield CAB, version 0x4000834
data1.hdr:  InstallShield setup header, version 0x1007000,
	    descriptor size 0x1e9ee
data2.cab:  InstallShield CAB, version 0x20005dc
setup.ins:  InstallShield Script
	    "InstallSHIELD Software Coporation  (c) 1990-1997",
	    variable names: #0 SRCDIR #1 SRCDISK #2 TARGETDISK ...
setup.lid:  InstallShield Language Identifier

I hope my 2 diff files can be applied in a future version of the file
utility.

With best wishes
Jörg Jenderek
--
Jörg Jenderek
-------------- next part --------------
--- file-5.41/magic/Magdir/windows.old	2021-05-12 16:30:24 +0000
+++ file-5.41/magic/Magdir/windows	2021-11-04 14:40:45 +0000
@@ -585,2 +585,21 @@
 !:ext	cpx
+# From:		Joerg Jenderek
+# URL:		https://en.wikipedia.org/wiki/InstallShield
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/l/lid-is.trid.xml
+# Note:		contain also 3 keywords like: count Default key0
+>>&0	regex/c		\^(Languages)]					InstallShield Language Identifier
+#!:mime	text/plain
+!:mime	text/x-installshield-lid
+# like: SETUP.LID
+!:ext	lid
+# From:		Joerg Jenderek
+# URL:		https://www.file-extensions.org/tag-file-extension
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/t/taginfo.trid.xml
+# Note:		contain also keywords like: Application Category Company Misc Version
+>>&0	regex/c		\^(TagInfo)]					TagInfo
+#!:mime	text/plain
+#!:mime	text/prs.lines.tag
+!:mime	text/x-ms-tag
+# like: DATA.TAG
+!:ext	tag
 # unknown keyword after opening bracket
@@ -1082 +1101,43 @@
 !:ext	slk/sylk
+
+# From:		Joerg Jenderek
+# URL:		https://en.wikipedia.org/wiki/InstallShield
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/i/ins.trid.xml
+# Note:		contain also keywords like: BATCH_INSTALL ISVERSION LOGHANDLE SRCDIR SRCDISK WINDIR WINSYSDISK 
+0	ubelong	0xB8C90C00	InstallShield Script
+#!:mime	application/octet-stream
+!:mime	application/x-installshield-ins
+# like test.ins Setup.ins
+!:ext	ins
+# UNKNOWN like: 160034121de07e00 1600341260befe00 16003412e0783700
+# 5000010021083f00 50000100b0335600 50000100cbfdf800 50000100dfbc4700
+#>4	ubequad		x		\b, at 4 %#16.16llx
+# copyright text like:	"Stirling Technologies, Inc.  (c) 1990-1994"
+#			"InstallSHIELD Software Coporation  (c) 1990-1997"
+>13	pstring/h	x		"%s"
+# look for specific ASCII variable names
+>1	search/0x121/s	SRCDIR	\b, variable names:
+# 1st like: SRCDIR
+>>&-4		leshort		x	#%u
+>>&-2		pstring/h	x	%s
+# 2nd like: SRCDISK
+>>>&0		leshort		x	#%u
+>>>&2		pstring/h	x	%s
+# 3rd like: TARGETDISK
+>>>>&0		leshort		x	#%u
+>>>>&2		pstring/h	x	%s
+# 4th like: TARGETDIR
+#>>>>>&0		leshort		x	#%u
+#>>>>>&2		pstring/h	x	%s
+# 5th like: WINDIR
+#>>>>>>&0	leshort		x	#%u
+#>>>>>>&2	pstring/h	x	%s
+# 6th like: WINDISK
+#>>>>>>>&0	leshort		x	#%u
+#>>>>>>>&2	pstring/h	x	%s
+# 7th like: WINSYSDIR
+#>>>>>>>>&0	leshort		x	#%u
+#>>>>>>>>&2	pstring/h	x	%s
+# ... LOGHANDLE
+>0		ubelong		x	...
+#
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.41-windows-lid.diff.sig
Type: application/octet-stream
Size: 1191 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20211104/7a263d90/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.41-archive-installshield.diff.sig
Type: application/octet-stream
Size: 894 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20211104/7a263d90/attachment-0001.obj>
-------------- next part --------------
--- file-5.41/magic/Magdir/archive.old	2021-08-30 09:10:26 +0000
+++ file-5.41/magic/Magdir/archive	2021-11-04 14:58:47 +0000
@@ -780,3 +780,29 @@
 # InstallShield CAB
-0	string	ISc( InstallShield CAB
+# Update:	Joerg Jenderek at Nov 2021
+# URL:		https://en.wikipedia.org/wiki/InstallShield
+# Reference:	https://github.com/twogood/unshield/blob/master/lib/cabfile.h
+# Note:		Not compatible with Microsoft CAB files
+# http://mark0.net/download/triddefs_xml.7z/defs/a/ark-cab-ishield.trid.xml
+# CAB_SIGNATURE 0x28635349
+0	string	ISc( InstallShield
+#!:mime		application/octet-stream
+!:mime		application/x-installshield
+# http://mark0.net/download/triddefs_xml.7z/defs/a/ark-cab-ishield-hdr.trid.xml
+>16	ulelong	!0	setup header
+# like: _SYS1.HDR _USER1.HDR data1.hdr
+!:ext	hdr
+>16	ulelong	=0	CAB
+# like: _SYS1.CAB _USER1.CAB DATA1.CAB  data2.cab
+!:ext	cab
+# https://github.com/twogood/unshield/blob/master/lib/helper.c
+# version like:	0x1005201 0x100600c 0x1007000 0x1009500
+#		0x2000578 0x20005dc 0x2000640 0x40007d0 0x4000834
+>4	ulelong	x	\b, version %#x
+# volume_info like: 0
+>8	ulelong	!0	\b, volume_info %#x
+# cab_descriptor_offset like: 0x200
+>12	ulelong	!0x200	\b, offset %#x
+#>0x200	ubequad	x	\b, at 0x200 %#16.16llx
+# cab_descriptor_size like: 0 (*.cab) BD5 C8B DA5 E2A E36 116C 251D 4DA9 56F0 5CC2 6E4B 777D 779E 1F7C2
+>16	ulelong	!0	\b, descriptor size %#x
 # TOP4
-------------- next part --------------
A non-text attachment was scrubbed...
Name: installshield-trid-v.txt.gz
Type: application/x-gzip
Size: 1008 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20211104/7a263d90/attachment.bin>