[File] [PATCH] Magdir/archive TSComp archive ; extensions + details

Jörg Jenderek (GMX) joerg.jen.der.ek at gmx.net
Wed Nov 29 18:46:19 UTC 2023


Hello,

A few days ago I had to look at some old software samples. Unfortunately
these are packed in compressed archives, so it took me some hours
to find out how to extract such archives and what the content of my
inspected archives is.

When running the file command version 5.45 on such archive samples I get
output like:

CRW3.LIB:     TSComp archive data
Explore.lib:  TSComp archive data
HELP$:        TSComp archive data
INSTALL.EX$:  TSComp archive data
MAKERRES.DL$: TSComp archive data
OTUPDATE.$$$: TSComp archive data
PSP2.CMP:     TSComp archive data
SAMPMIF$:     TSComp archive data
SAMPMML$:     TSComp archive data
TRANTUT$:     TSComp archive data
TWOFILES.TSC: TSComp archive data
WIN.PAK:      TSComp archive data

With option --extension only ??? is displayed, and with option -i the
generic application/octet-stream is shown.

For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). When running the TrID
command on such examples, they are described as "TSComp compressed data"
by tscomp.trid.xml (see appended output/trid-tscomp-v.txt.gz).

For comparison I also ran the file format identification
utility DROID (see https://sourceforge.net/projects/droid/). This
does "recognize" the LIB archives. These are described as "Generic
Library File" by PUID x-fmt/425. This detection is based only on the
unreliable file name suffix LIB.

With the help of these tools I found pages about TSComp on the Archive
Team file formats web site. Samples to download and unpacking
software like deark are also listed there. This is now expressed inside
Magdir/archive by comment lines like:
# URL:	http://fileformats.archiveteam.org/wiki/TSComp
# Ref.:	http://mark0.net/download/triddefs_xml.7z
#	defs/t/tscomp.trid.xml
#	https://entropymine.com/deark/releases/deark-1.6.5.tar.gz
#	deark-1.6.5/modules/installshld.c

The detection is done by the starting line inside Magdir/archive,
which looks like:
0	string	\x65\x5d\x13\x8c\x08\x01\x03\x00 TSComp archive data

All tools use the same recognition method in the first step: they look
for the magic byte sequence at offset 0.
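
As an illustration only, the same first-step check can be sketched in
Python. The 8 signature bytes are copied from the magic line above; the
names TSCOMP_MAGIC and looks_like_tscomp are made up for this sketch and
do not come from file, TrID or deark.

```python
# Signature bytes copied from the magic line (8 bytes at offset 0).
TSCOMP_MAGIC = bytes.fromhex("655d138c08010300")


def looks_like_tscomp(data: bytes) -> bool:
    """Return True if the buffer starts with the TSComp signature.

    Hypothetical helper for illustration; it only performs the
    first-step signature comparison and nothing else.
    """
    return data.startswith(TSCOMP_MAGIC)
```

Reading the first 8 bytes of a sample and passing them to
looks_like_tscomp() would be enough for this step.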

Instead of the generic application/octet-stream mime type I now show a
user defined one. The file name suffix depends on the sub-classification.
For single-file archives, often the last letter of the filename
extension is changed to "$", but I also found samples where an exclamation
mark is used instead of the dollar sign (like BUILD3.BM!). For multi-file
archives, the most common extensions seem to be '.lib' and '.cmp',
but I also found other names {like SAMPMIF$ (no file name suffix)
OTDATA.$$$ TWOFILES.TSC (obviously an abbreviation for tscomp) WIN.PAK
(obviously an abbreviation for packed)}. Luckily the decompressing
software deark can extract the archive contents with a command like:
	deark -m tscomp -d2 MAKERRES.DL$

I am no C programmer, but if I interpret the source correctly, then in my
"multi-file" samples the filename style value is 2, which means "with
wildcards". For single-file samples the style is 1, which means no
wildcard. Unfortunately I found no "old" examples with style value 0.

So the start, with the sub-classification and the different suffixes, now
looks like:

  0	string	\x65\x5d\x13\x8c\x08\x01\x03\x00 TSComp archive
  !:mime	application/x-tscomp-compressed
  >0x08	ubyte		0			data, filename style 0
  !:ext	??$
  #>0x08	ubyte		1			data, without wildcard
  >0x08	ubyte		1			data
  !:ext	??$/??!
  >0x08	ubyte		2			data, with wildcard
  !:ext	/lib/cmp/$$$/tsc/pak
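
The sub-classification above can be restated as a small lookup on the
byte at offset 8. This is only a sketch: the descriptions and suffix
strings are copied verbatim from the magic lines, while STYLE_INFO and
classify_style are hypothetical names.

```python
# Byte at offset 8 -> (description, suffix list as written in !:ext).
# The entry for "without wildcard" uses the wording of the commented-out
# magic line; the active line prints only "data".
STYLE_INFO = {
    0: ("filename style 0", "??$"),
    1: ("without wildcard", "??$/??!"),
    2: ("with wildcard", "/lib/cmp/$$$/tsc/pak"),
}


def classify_style(data: bytes):
    """Return (description, suffixes) for the filename style byte.

    Returns None if the buffer is too short or the value is unknown.
    """
    if len(data) <= 8:
        return None
    return STYLE_INFO.get(data[8])
```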

If I understand the source correctly, the original file name of the first
archive member (a Pascal string that is a DOS 8.3 name), the DOS
modification time stamp and the compressed size can be shown by lines like:
  >0x1c	pstring		x			\b, %s
  >0x16	lemsdosdate	x			\b, modified %s
  >0x18	lemsdostime	x			%s
  >0x0E	ulelong		x			\b, compressed size %u
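
For readers who prefer code to magic syntax, the four lines above can be
sketched in Python as follows. The offsets are the ones used in the
magic lines; the function names are hypothetical, and the date and time
words are returned as raw 16-bit values without decoding. ASCII decoding
of the name is an assumption that fits the DOS 8.3 names in my samples.

```python
import struct


def read_pstring(data: bytes, offset: int) -> str:
    """Read a Pascal string: one length byte, then that many bytes."""
    length = data[offset]
    raw = data[offset + 1 : offset + 1 + length]
    return raw.decode("ascii", errors="replace")


def first_member_fields(data: bytes) -> dict:
    """Collect the first member's fields at the offsets from the magic lines."""
    # 0x0E: compressed size, 32-bit little endian (ulelong).
    (compressed_size,) = struct.unpack_from("<I", data, 0x0E)
    # 0x16 and 0x18: DOS date and DOS time, 16-bit little endian each.
    dos_date, dos_time = struct.unpack_from("<HH", data, 0x16)
    return {
        "name": read_pstring(data, 0x1C),  # 0x1C: Pascal string
        "compressed_size": compressed_size,
        "dos_date": dos_date,
        "dos_time": dos_time,
    }
```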

If an archive contains more than one file, then it is possible to
jump to the next (second) archive member fragment and show the file name
of the second archive member. This is now done by lines like:
  >0x12	ulelong		>0
  >>(0x12.l+15)	pstring		x		\b, %s ...
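
The indirect offset (0x12.l+15) means: read the little endian 32-bit
value at offset 0x12, add 15, and use the result as the new offset. A
rough Python equivalent, with the hypothetical name second_member_name
and an added bounds check that the magic lines do not need, could look
like this:

```python
import struct


def second_member_name(data: bytes):
    """Return the second member's name, or None if there is none."""
    # Value at 0x12: offset of the next member; 0 means no further member.
    (next_offset,) = struct.unpack_from("<I", data, 0x12)
    if next_offset == 0:
        return None
    # The name is a Pascal string 15 bytes after the next member offset.
    name_offset = next_offset + 15
    if name_offset >= len(data):
        return None
    length = data[name_offset]
    raw = data[name_offset + 1 : name_offset + 1 + length]
    return raw.decode("ascii", errors="replace")
```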

This information can also be verified by running the command line tool
deark with a line like:
	deark -m tscomp -l -d2 SAMPMML$

After applying the above mentioned modifications by patch
file-5.45-archive-tscomp.diff, my samples are in principle
described as before, but now some details (like first archive member
names and time stamps) are also shown. So this now looks like:

CRW3.LIB:     TSComp archive data,
	      with wildcard,
	      CRW.HLP, modified Sun, Jul 07 1993 02:00:02,
	      compressed size 642159
Explore.lib:  TSComp archive data,
	      with wildcard,
	      MMATH194.EXE, modified Sun, Jan 24 1995 17:40:50,
	      compressed size 16020
	      , MMATH194.TXT ...
HELP$:        TSComp archive data,
	      with wildcard,
	      BOOK.HLP, modified Sun, Apr 22 1992 17:56:04,
	      compressed size 6937
	      , CHAR.HLP ...
INSTALL.EX$:  TSComp archive data,
	      INSTALL.EXE, modified Sun, Apr 22 1992 17:59:18,
	      compressed size 103271
MAKERRES.DL$: TSComp archive data,
	      MAKERRES.DLL, modified Sun, Nov 17 1992 14:57:18,
	      compressed size 51753
OTUPDATE.$$$: TSComp archive data,
	      with wildcard,
	      WOTRBLD.EXE, modified Sun, Jul 09 1991 11:53:28,
	      compressed size 6591
	      , WUPDLL.DLL ...
PSP2.CMP:     TSComp archive data,
	      with wildcard,
	      PSP.DAT, modified Sun, Aug 14 1993 02:00:00,
	      compressed size 3364
	      , JMCAP.DLL ...
SAMPMIF$:     TSComp archive data,
	      with wildcard,
	      TABLE.MIF, modified Sun, Apr 22 1992 17:55:48,
	      compressed size 856
	      , BARCHART.MIF ...
SAMPMML$:     TSComp archive data,
	      with wildcard,
	      CHFORMAT.MML, modified Sun, Apr 22 1992 17:55:46,
	      compressed size 180
	      , FORMATS.MML ...
TRANTUT$:     TSComp archive data,
	      with wildcard,
	      EARTHTOC.DOC, modified Sun, Oct 19 1992 16:11:16,
	      compressed size 4283
	      , RAINTEXT.DOC ...
TWOFILES.TSC: TSComp archive data,
	      with wildcard,
	      A.TXT, modified Sun, May 05 2020 18:38:00,
	      compressed size 12
	      , B.TXT ...
WIN.PAK:      TSComp archive data,
	      with wildcard,
	      SCIDLL.DLL, modified Sun, Nov 29 1993 08:43:48,
	      compressed size 50960
	      , SIERRAW.ICO ...

I hope my diff file can be applied in a future version of the
file utility.

I use the two test types lemsdosdate and lemsdostime to interpret a 2
byte value as a bit encoded date and time in DOS format relative to
year 1980, but these types are not mentioned in the official
documentation magic.man. So I think these two types should be
mentioned there.
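
To make the bit encoding concrete, here is a minimal Python sketch of the
usual DOS (FAT) date and time layout: year since 1980 in the upper 7
bits of the date word, then 4 bits month and 5 bits day; 5 bits hour,
6 bits minute and 5 bits of 2-second units in the time word. The name
decode_dos_datetime is hypothetical, and this is not the code used
inside file.

```python
from datetime import datetime


def decode_dos_datetime(dos_date: int, dos_time: int) -> datetime:
    """Decode 16-bit DOS date and time words into a datetime.

    Raises ValueError for impossible values such as month 0.
    """
    year = 1980 + (dos_date >> 9)        # bits 9-15: years since 1980
    month = (dos_date >> 5) & 0x0F       # bits 5-8
    day = dos_date & 0x1F                # bits 0-4
    hour = dos_time >> 11                # bits 11-15
    minute = (dos_time >> 5) & 0x3F      # bits 5-10
    second = (dos_time & 0x1F) * 2       # bits 0-4, 2-second resolution
    return datetime(year, month, day, hour, minute, second)


# Words built by hand for 1992-04-22 17:55:46 (not read from a sample).
example_date = ((1992 - 1980) << 9) | (4 << 5) | 22
example_time = (17 << 11) | (55 << 5) | (46 // 2)
print(decode_dos_datetime(example_date, example_time))
```

Because seconds are stored in units of two, only even seconds can be
represented, which fits the time stamps in the output above.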

With best wishes,
Jörg Jenderek
--
Jörg Jenderek
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-tscomp-v.txt.gz
Type: application/x-gzip
Size: 473 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20231129/861d7d7a/attachment.bin>
-------------- next part --------------
--- file-5.45/magic/Magdir/archive.old	2023-07-27 20:04:45.000000000 +0200
+++ file-5.45/magic/Magdir/archive	2023-11-29 19:30:57.219049300 +0100
@@ -959,3 +959,41 @@
 # TSComp
-0	string	\x65\x5d\x13\x8c\x08\x01\x03\x00 TSComp archive data
+# Update:	Joerg Jenderek 2023 Nov 
+# URL:		http://fileformats.archiveteam.org/wiki/TSComp
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/t/tscomp.trid.xml
+#		https://entropymine.com/deark/releases/deark-1.6.5.tar.gz
+#		deark-1.6.5/modules/installshld.c 
+# Note:		called "TSComp compressed data" by TrID
+#		verified by command like `deark -m tscomp -l -d2 MAKERRES.DL$`
+#		The "13" might be a version number. The "8c" is a mystery
+0	string	\x65\x5d\x13\x8c\x08\x01\x03\x00 TSComp archive
+#!:mime	application/octet-stream
+!:mime	application/x-tscomp-compressed
+# filename style: 0~old version 1~without wildcard 2~with wildcard
+#>0x08	ubyte		x				\b, filename style %u
+>0x08	ubyte		0				data, filename style 0
+# no example found
+!:ext	??$
+#>0x08	ubyte		1				data, without wildcard
+>0x08	ubyte		1				data
+# for single-file archives, often the last letter of the filename extension is changed to "$"; but also name like: BUILD3.BM!
+!:ext	??$/??!
+>0x08	ubyte		2				data, with wildcard
+# for multi-file archives common extensions seem to be .lib and .cmp, but also names like: SAMPMIF$ OTDATA.$$$ TWOFILES.TSC WIN.PAK 
+!:ext	/lib/cmp/$$$/tsc/pak
+# fnlen; pascal string length; original 1st file name like: CHFORMAT.MML
+>0x1c	pstring		x				\b, %s
+# md->fi->timestamp
+>0x16	lemsdosdate	x				\b, modified %s
+>0x18	lemsdostime	x				%s
+# 1st compressed size: like 180 (SAMPMML$$)
+>0x0E	ulelong		x				\b, compressed size %u
+# de_dbg_indent(c, 1): like: 12h
+#>0x0d	ubyte		x				b, at 0xD %#x
+# like: 0
+#>0x1A	ubeshort	x				\b, at 0x1A %#x
+# 2nd member offset
+#>0x12	ulelong		x				\b, next offset %#x
+>0x12	ulelong		>0
+# original 2nd archive member name like: FORMATS.MML
+>>(0x12.l+15)	pstring	x				\b, %s ...
 # ARQ
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.45-archive-tscomp.diff.sig
Type: application/octet-stream
Size: 1213 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20231129/861d7d7a/attachment.obj>

