[File] [PATCH] Magdir/archive+windows; InstallShield setup header *.HDR+Language Identifier *.LID ; *.INS; *.TAG

Christos Zoulas christos at zoulas.com
Sun Nov 7 16:26:48 UTC 2021


Applied, thanks!

christos

> On Nov 4, 2021, at 11:30 AM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> 
> A few days ago I installed an old Windows program. Its installation
> directory contains files that the file command does not recognise, or
> describes only partly or generically.
> 
> When running file command version 5.41 on such examples I
> get output like:
> 
> DATA.TAG:   ASCII text, with CRLF line terminators
> Setup.exe:  PE32 executable (GUI) Intel 80386, for MS Windows
> _sys1.cab:  InstallShield CAB
> _sys1.hdr:  InstallShield CAB
> _user1.cab: InstallShield CAB
> _user1.hdr: InstallShield CAB
> data1.cab:  InstallShield CAB
> data1.hdr:  InstallShield CAB
> data2.cab:  InstallShield CAB
> setup.ins:  COM executable for DOS
> setup.lid:  ASCII text, with CRLF line terminators
> 
> Furthermore, with the --extension option only the 3-character sequence
> ??? is shown. With the -i option only generic MIME types like text/plain
> or application/octet-stream are shown.
> 
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html).
> Some HDR examples are described correctly by TrID, first as
> "InstallShield setup header" by ark-cab-ishield-hdr.trid.xml and
> second generically as "InstallShield compressed Archive" by
> ark-cab-ishield.trid.xml.
> Most INS examples are described correctly as "InstallShield Script"
> by ins.trid.xml.
> The examples described by the file command as "ASCII text" are described
> as "Generic INI configuration" by ini.trid.xml. All LID examples are
> described more specifically as "InstallShield Language Identifier" by
> lid-is.trid.xml and all DATA.TAG examples are described as "TagInfo
> data" by taginfo.trid.xml (see the appended installshield-trid-v.txt.gz).
> 
> TrID lists the correct file name extensions and, with the -v option,
> often the related URL pointing to the file format information used.
> Luckily there exists a free program, unshield, that can handle such
> InstallShield Cabinet archives. The relevant information is found in
> the header file cabfile.h and the C source file helper.c. This
> information is now expressed by comment lines inside
> Magdir/archive like:
> # URL:		https://en.wikipedia.org/wiki/InstallShield
> # Reference:	https://github.com/twogood/unshield
> #		/blob/master/lib/cabfile.h
> # https://github.com/twogood/unshield/blob/master/lib/helper.c
> 
> In the current version the only magic line looks like:
> 0	string	ISc( InstallShield CAB
> 
> After the test for this CAB_SIGNATURE (0x28635349), the version
> information is now printed, following the C source, by a line like:
>> 4	ulelong	x	\b, version %#x
> 
> Afterwards volume_info and cab_descriptor_offset are printed if they
> have unusual values, by lines like:
>> 8	ulelong	!0	\b, volume_info %#x
>> 12	ulelong	!0x200	\b, offset %#x
> Afterwards the cab_descriptor_size is shown if it is non-zero, by a line like:
>> 16	ulelong	!0	\b, descriptor size %#x
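A minimal Python sketch of the same header tests, for readers who want to check a file by hand. It assumes only what the magic lines above state: five little-endian 32-bit values at offsets 0, 4, 8, 12 and 16. The function names are hypothetical and not part of file or unshield.

```python
import struct

# The bytes "ISc(" read as a little-endian 32-bit value.
CAB_SIGNATURE = 0x28635349


def parse_installshield_header(data: bytes) -> dict:
    """Decode the five 32-bit little-endian fields tested by the magic lines."""
    if len(data) < 20:
        raise ValueError("need at least 20 bytes")
    signature, version, volume_info, offset, size = struct.unpack_from("<5I", data, 0)
    if signature != CAB_SIGNATURE:
        raise ValueError("signature is not ISc(")
    return {
        "version": version,
        "volume_info": volume_info,
        "cab_descriptor_offset": offset,
        "cab_descriptor_size": size,
    }


def describe(data: bytes) -> str:
    """Build a description the way the magic lines do: skip the usual values."""
    h = parse_installshield_header(data)
    parts = ["InstallShield CAB", "version %#x" % h["version"]]
    if h["volume_info"] != 0:
        parts.append("volume_info %#x" % h["volume_info"])
    if h["cab_descriptor_offset"] != 0x200:
        parts.append("offset %#x" % h["cab_descriptor_offset"])
    if h["cab_descriptor_size"] != 0:
        parts.append("descriptor size %#x" % h["cab_descriptor_size"])
    return ", ".join(parts)
```

Fed a header with version 0x1005201, volume_info 0, offset 0x200 and descriptor size 0x116c, describe() reports only the version and the descriptor size, because the other two fields hold their usual values.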
> 
> After inspecting hundreds of InstallShield files, this value was zero
> in all my CAB examples and non-zero in my HDR examples. Hoping that
> this is always true, I use this observation to distinguish HDR from CAB
> InstallShield files, with correct file name extensions, by lines like:
> 0	string	ISc( InstallShield
> !:mime		application/x-installshield
>> 16	ulelong	!0	setup header
> !:ext	hdr
>> 16	ulelong	=0	CAB
> !:ext	cab
> Instead of the generic MIME type application/octet-stream I display a
> user-defined one.
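The HDR/CAB distinction above can be sketched in Python as follows. This is only an illustration of the observation (descriptor size zero in CAB, non-zero in HDR); installshield_kind is a hypothetical helper name, and the rule carries the same "hoping this is always true" caveat.

```python
import struct


def installshield_kind(data: bytes):
    """Return (description, extension), or None if the signature is absent."""
    if len(data) < 20 or data[:4] != b"ISc(":
        return None
    # cab_descriptor_size: 32-bit little-endian value at offset 16
    (descriptor_size,) = struct.unpack_from("<I", data, 16)
    if descriptor_size != 0:
        return ("InstallShield setup header", "hdr")
    return ("InstallShield CAB", "cab")
```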
> 
> Unfortunately no official or complete documentation exists for the LID
> file format, so I use the information provided by TrID. This
> is recorded inside Magdir/windows by comment lines like:
> # URL:		https://en.wikipedia.org/wiki/InstallShield
> # Reference:	http://mark0.net/download/triddefs_xml.7z
> #		defs/l/lid-is.trid.xml
> Following TrID I create equivalent magic lines inside the ini-file
> subroutine of Magdir/windows after the Windows code page translator
> section. This now looks like:
>>> &0	regex/c	\^(Languages)]	InstallShield Language Identifier
> !:mime	text/x-installshield-lid
> !:ext	lid
> 
> Instead of the generic MIME type text/plain I display a user-defined one.
> The test for the keyword Languages in a bracketed section was sufficient
> to recognize my LID examples. If this is not sufficient, then additional
> tests for keywords (three, like count, Default and key0, mentioned in the
> global strings section of the TrID definition) must be done.
> 
> Unfortunately no official or complete documentation exists for the
> TagInfo file format, so I use the information provided by TrID. This
> is recorded inside Magdir/windows after the LID section by
> comment lines like:
> # URL:		https://www.file-extensions.org/tag-file-extension
> # Reference:	http://mark0.net/download/triddefs_xml.7z
> #		defs/t/taginfo.trid.xml
> Following TrID I create equivalent magic lines inside the ini-file
> subroutine of Magdir/windows after the Windows code page translator
> section. This now looks like:
>>> &0	regex/c	\^(TagInfo)]	TagInfo
> !:mime	text/x-ms-tag
> !:ext	tag
> Instead of the generic MIME type text/plain I display a user-defined one.
> The test for the keyword TagInfo in a bracketed section was sufficient to
> recognize my DATA.TAG examples. If this is not sufficient, then
> additional tests for keywords (like Application, Category, Company, Misc
> and Version, mentioned in the global strings section of the TrID
> definition) must be done.
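The two INI-style tests (LID and TagInfo) can be sketched together in Python. This is an illustration, not the logic of the ini-file subroutine itself: it assumes the bracketed section is the first thing in the file and compares the name case-insensitively, as the regex/c flag does. The names classify_ini and SECTION_TYPES are hypothetical.

```python
import re

# Section name -> (description, MIME type, extension) as proposed in the patch.
SECTION_TYPES = {
    "languages": ("InstallShield Language Identifier", "text/x-installshield-lid", "lid"),
    "taginfo": ("TagInfo", "text/x-ms-tag", "tag"),
}

# Assumption: the section header is the first non-blank content of the file.
_FIRST_SECTION = re.compile(r"\s*\[([^\]\r\n]+)\]")


def classify_ini(text: str):
    """Return the entry for the first [section] name, or None if unknown."""
    match = _FIRST_SECTION.match(text)
    if match is None:
        return None
    return SECTION_TYPES.get(match.group(1).strip().lower())
```

If the section name alone turns out to be too weak, the same function could additionally look for the keywords listed above before returning a result.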
> 
> Unfortunately no official or complete documentation exists for the
> InstallShield INS file format, so I use the information provided by TrID.
> This is recorded at the end of Magdir/windows by
> comment lines like:
> # URL:		https://en.wikipedia.org/wiki/InstallShield
> # Reference:	http://mark0.net/download/triddefs_xml.7z
> #		defs/i/ins.trid.xml
> Following TrID I create equivalent magic lines in Magdir/windows.
> These start with lines like:
> 0	ubelong	0xB8C90C00	InstallShield Script
> !:mime	application/x-installshield-ins
> !:ext	ins
> Instead of the generic MIME type application/octet-stream I display a
> user-defined one.
> Because I am unsure whether this starting 4-byte value is always the
> same, I looked for additional information inside the INS examples.
> Apparently the length of each string is stored before it as a 2-byte
> little-endian integer. The first string seems to be a copyright message
> at a fixed offset, like "Stirling Technologies, Inc.  (c) 1990-1994" or
> "InstallSHIELD Software Coporation  (c) 1990-1997". This is displayed
> by a line like:
>> 13	pstring/h	x		"%s"
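What pstring/h does at offset 13 can be sketched in Python: read a 2-byte little-endian length, then that many bytes of text. The helper names are hypothetical, and the latin-1 decoding is an assumption made only so that any byte value decodes.

```python
import struct

# The four starting bytes tested by the ubelong 0xB8C90C00 line.
INS_MAGIC = bytes.fromhex("B8C90C00")


def read_pstring_h(data: bytes, offset: int):
    """Read a string prefixed by a 2-byte little-endian length.

    Returns (text, offset just past the string).
    """
    (length,) = struct.unpack_from("<H", data, offset)
    start = offset + 2
    end = start + length
    if end > len(data):
        raise ValueError("string runs past the end of the data")
    return data[start:end].decode("latin-1"), end


def ins_copyright(data: bytes):
    """Return the copyright string at offset 13, or None without the magic."""
    if data[:4] != INS_MAGIC:
        return None
    return read_pstring_h(data, 13)[0]
```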
> 
> The global strings section of the TrID definition mentions some
> keywords like: SRCDIR, SRCDISK, TARGETDISK, TARGETDIR, WINDIR,
> WINDISK, WINSYSDIR, LOGHANDLE. Apparently these are variable
> names, which can perhaps be used to configure some install options.
> 
> A few dozen bytes later inside the INS examples, at varying offsets,
> these names appear in the same order. Before each name a kind of
> sequence number seems to be stored as a 2-byte integer. So I show this
> variable name information by additional lines like:
>> 1	search/0x121/s	SRCDIR	\b, variable names:
>>> &-4		leshort		x	#%u
>>> &-2		pstring/h	x	%s
>>>> &0		leshort		x	#%u
>>>> &2		pstring/h	x	%s
>>>>> &0		leshort		x	#%u
>>>>> &2		pstring/h	x	%s
>>>>>> &0	leshort		x	#%u
>>>>>> &2	pstring/h	x	%s
>>>>>>> &0	leshort		x	#%u
>>>>>>> &2	pstring/h	x	%s
>>>>>>>> &0	leshort		x	#%u
>>>>>>>> &2	pstring/h	x	%s
>>>>>>>>> &0	leshort		x	#%u
>>>>>>>>> &2	pstring/h	x	%s
>> 0		ubelong		x	...
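The chain of relative offsets above can be read as a loop over records of the form sequence number (2 bytes), length (2 bytes), name. A rough Python sketch under that assumption follows; the search window only approximates search/0x121 starting at offset 1, and ins_variable_names and max_names are hypothetical names.

```python
import struct


def ins_variable_names(data: bytes, max_names: int = 7):
    """Return [(sequence_number, name), ...] starting at the SRCDIR record.

    max_names defaults to 7 because the magic lines print seven pairs.
    """
    key = b"SRCDIR"
    # Allow the match to start at offsets 1 through 0x121.
    pos = data.find(key, 1, 0x121 + len(key))
    if pos < 4:
        return []  # not found, or no room for the two 2-byte fields before it
    offset = pos - 4  # sequence number sits 4 bytes before the name
    names = []
    for _ in range(max_names):
        if offset + 4 > len(data):
            break
        seq, length = struct.unpack_from("<HH", data, offset)
        end = offset + 4 + length
        if length == 0 or end > len(data):
            break
        names.append((seq, data[offset + 4:end].decode("latin-1")))
        offset = end  # the next record follows the string directly
    return names
```

A loop stops cleanly when the data runs out or a zero length appears, whereas the magic lines have to spell out each nesting level explicitly.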
> 
> After applying the above-mentioned modifications with the patches
> file-5.41-archive-installshield.diff and file-5.41-windows-lid.diff,
> all my InstallShield examples are now recognised or
> described in more detail, like:
> DATA.TAG:   TagInfo
> Setup.exe:  PE32 executable (GUI) Intel 80386, for MS Windows
> _sys1.cab:  InstallShield CAB, version 0x1005201
> _sys1.hdr:  InstallShield setup header, version 0x1005201,
> 	    descriptor size 0x116c
> _user1.cab: InstallShield CAB, version 0x1005201
> _user1.hdr: InstallShield setup header, version 0x1005201,
> 	    descriptor size 0x130b
> data1.cab:  InstallShield CAB, version 0x4000834
> data1.hdr:  InstallShield setup header, version 0x1007000,
> 	    descriptor size 0x1e9ee
> data2.cab:  InstallShield CAB, version 0x20005dc
> setup.ins:  InstallShield Script
> 	    "InstallSHIELD Software Coporation  (c) 1990-1997",
> 	    variable names: #0 SRCDIR #1 SRCDISK #2 TARGETDISK ...
> setup.lid:  InstallShield Language Identifier
> 
> I hope my 2 diff files can be applied in a future version of the file utility.
> 
> With best wishes
> Jörg Jenderek
> --
> Jörg Jenderek
> 
> [Attachments: file-5.41-windows-lid.diff and file-5.41-archive-installshield.diff (each with a signature file), installshield-trid-v.txt.gz]
