[File] [PATCH] Magdir/archive TSComp archive ; extensions + details

Christos Zoulas christos at zoulas.com
Sat Dec 2 13:50:42 UTC 2023


Committed, thanks!

christos

> On Nov 29, 2023, at 1:46 PM, Jörg Jenderek (GMX) <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> 
> A few days ago I had to look for some old software samples. Unfortunately
> these were packed in compressed archives, so it took me some hours to
> find out how to extract such archives and what the archives I inspected
> contain.
> 
> When running the file command, version 5.45, on such archive samples I
> get output like:
> 
> CRW3.LIB:     TSComp archive data
> Explore.lib:  TSComp archive data
> HELP$:        TSComp archive data
> INSTALL.EX$:  TSComp archive data
> MAKERRES.DL$: TSComp archive data
> OTUPDATE.$$$: TSComp archive data
> PSP2.CMP:     TSComp archive data
> SAMPMIF$:     TSComp archive data
> SAMPMML$:     TSComp archive data
> TRANTUT$:     TSComp archive data
> TWOFILES.TSC: TSComp archive data
> WIN.PAK:      TSComp archive data
> 
> With the --extension option only ??? is displayed, and with the -i option
> the generic application/octet-stream is shown.
> 
> For comparison I ran the file format identification utility TrID
> (see https://mark0.net/soft-trid-e.html). When running the TrID command
> on such examples, they are described as "TSComp compressed data" by
> tscomp.trid.xml (see the appended output/trid-tscomp-v.txt.gz).
> 
> For comparison I also ran the file format identification utility DROID
> (see https://sourceforge.net/projects/droid/). It does "recognize" the
> LIB archives and describes them as "Generic Library File" with PUID
> x-fmt/425, but this detection is based on the unreliable file name
> suffix LIB.
> 
> With the help of these tools I found pages about TSComp on the Archive
> Team file formats web site. Samples to download and unpacking software
> such as deark are also listed there. This is now expressed inside
> Magdir/archive by comment lines like:
> # URL:	http://fileformats.archiveteam.org/wiki/TSComp
> # Ref.:	http://mark0.net/download/triddefs_xml.7z
> #	defs/t/tscomp.trid.xml
> #	https://entropymine.com/deark/releases/deark-1.6.5.tar.gz
> #	deark-1.6.5/modules/installshld.c
> 
> The detection is done by the starting line inside Magdir/archive,
> which looks like:
> 0	string	\x65\x5d\x13\x8c\x08\x01\x03\x00 TSComp archive data
> 
> As a first step, all tools use the same recognition method: they look
> for this magic byte sequence at offset 0.
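
The offset-0 check described above can be sketched in Python as follows. This is only a minimal illustration, not the code used by file, TrID or deark. The helper name is_tscomp is hypothetical; the eight signature bytes are copied from the magic line quoted above.

```python
# Signature bytes copied from the magic line quoted above.
TSCOMP_MAGIC = bytes([0x65, 0x5D, 0x13, 0x8C, 0x08, 0x01, 0x03, 0x00])


def is_tscomp(data: bytes) -> bool:
    """Return True if data starts with the TSComp signature at offset 0."""
    return data[:len(TSCOMP_MAGIC)] == TSCOMP_MAGIC
```

Reading only the first eight bytes of a file is enough for this test; anything shorter than the signature is rejected.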
> 
> Instead of the generic application/octet-stream MIME type I now show a
> user-defined one. The file name suffix depends on the sub-classification.
> For single-file archives, the last letter of the filename extension is
> often changed to "$", but I also found samples where an exclamation
> mark is used instead of the dollar sign (like BUILD3.BM!). For multi-file
> archives, the most common extensions seem to be '.lib' and '.cmp',
> but I also found other names, like SAMPMIF$ (no file name suffix),
> OTDATA.$$$, TWOFILES.TSC (obviously an abbreviation for tscomp) and
> WIN.PAK (obviously an abbreviation for packed). Luckily the decompression
> software deark can extract archive contents with a command like:
> 	deark -m tscomp -d2 MAKERRES.DL$
> 
> I am no C programmer, but if I interpret the source correctly, the
> filename style value in my "multi-file" samples is 2, which means "with
> wildcards". For single-file samples the style is 1, which means no
> wildcard. Unfortunately I found no "old" examples with style value 0.
> 
> So the start, with sub-classification and different suffixes, now looks like:
> 
> 0	string	\x65\x5d\x13\x8c\x08\x01\x03\x00 TSComp archive
> !:mime	application/x-tscomp-compressed
> >0x08	ubyte		0			data, filename style 0
> !:ext	??$
> #>0x08	ubyte		1			data, without wildcard
> >0x08	ubyte		1			data
> !:ext	??$/??!
> >0x08	ubyte		2			data, with wildcard
> !:ext	/lib/cmp/$$$/tsc/pak
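
As a rough illustration of the sub-classification above, the following Python sketch maps the byte at offset 0x08 to the description and suffix list used in the magic lines. The function name classify and the STYLES table are hypothetical; the strings are taken from the rules quoted above, and other style values are assumed to print no extra detail.

```python
TSCOMP_MAGIC = bytes([0x65, 0x5D, 0x13, 0x8C, 0x08, 0x01, 0x03, 0x00])

# Descriptions and suffix patterns as given by the magic lines above.
STYLES = {
    0: ("data, filename style 0", "??$"),
    1: ("data", "??$/??!"),
    2: ("data, with wildcard", "/lib/cmp/$$$/tsc/pak"),
}


def classify(data: bytes):
    """Return (description, extensions), or None if data is not TSComp."""
    if len(data) < 9 or data[:8] != TSCOMP_MAGIC:
        return None
    # The filename style is the unsigned byte at offset 0x08.
    desc, ext = STYLES.get(data[8], ("", ""))
    return ("TSComp archive " + desc).rstrip(), ext
```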
> 
> If I understand the source correctly, the original file name of the first
> archive member (a Pascal string holding a DOS 8.3 name), the DOS
> modification time stamp and the compressed size can be shown by lines like:
> >0x1c	pstring		x			\b, %s
> >0x16	lemsdosdate	x			\b, modified %s
> >0x18	lemsdostime	x			%s
> >0x0E	ulelong		x			\b, compressed size %u
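
The four lines above can be mirrored in a short Python sketch. It assumes the offsets are exactly those used in the magic lines (0x0E, 0x16, 0x18, 0x1C) and uses the standard DOS bit layout for the date and time words. The names dos_datetime and first_member are hypothetical, and this is not how file itself is implemented.

```python
import struct
from datetime import datetime


def dos_datetime(date: int, time: int) -> datetime:
    """Decode 16-bit DOS date and time words (years count from 1980)."""
    return datetime(
        1980 + (date >> 9), (date >> 5) & 0x0F, date & 0x1F,
        time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2,
    )


def first_member(data: bytes) -> dict:
    """Read the first member's fields at the offsets used in the magic lines."""
    size, = struct.unpack_from("<I", data, 0x0E)        # ulelong at 0x0E
    date, time = struct.unpack_from("<HH", data, 0x16)  # words at 0x16, 0x18
    name_len = data[0x1C]                               # pstring length byte
    name = data[0x1D:0x1D + name_len].decode("ascii", "replace")
    return {"name": name, "modified": dos_datetime(date, time),
            "compressed_size": size}
```

Note that datetime raises ValueError for impossible field values (for example a month of 0), so real input would need a guard around dos_datetime.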
> 
> If an archive contains more than one file, it is possible to jump to
> the next (second) archive member fragment and show the file name of
> that second archive member. This is now done by lines like:
> >0x12	ulelong		>0
> >>(0x12.l+15)	pstring		x		\b, %s ...
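
The indirect offset (0x12.l+15) can be read as: take the little-endian 32-bit value stored at 0x12, add 15, and read a Pascal string there. A minimal Python sketch of that lookup follows. The function name second_member_name is hypothetical, and the meaning of the value at 0x12 is not stated beyond what the magic lines do with it, so the sketch only reproduces their arithmetic.

```python
import struct


def second_member_name(data: bytes):
    """Return the second member's name, or None if the value at 0x12 is 0."""
    if len(data) < 0x16:
        return None
    offset, = struct.unpack_from("<I", data, 0x12)  # ulelong at 0x12
    if offset == 0:
        return None
    pos = offset + 15                               # (0x12.l+15)
    if pos >= len(data):
        return None
    name_len = data[pos]                            # pstring length byte
    return data[pos + 1:pos + 1 + name_len].decode("ascii", "replace")
```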
> 
> This information can also be verified by running the command line tool
> deark with a line like:
> 	deark -m tscomp -l -d2 SAMPMML$
> 
> After applying the above-mentioned modifications with the patch
> file-5.45-archive-tscomp.diff, my samples are in principle described
> as before, but now some details (like first archive member names
> and time stamps) are also shown. The output now looks like:
> 
> CRW3.LIB:     TSComp archive data,
> 	      with wildcard,
> 	      CRW.HLP, modified Sun, Jul 07 1993 02:00:02,
> 	      compressed size 642159
> Explore.lib:  TSComp archive data,
> 	      with wildcard,
> 	      MMATH194.EXE, modified Sun, Jan 24 1995 17:40:50,
> 	      compressed size 16020
> 	      , MMATH194.TXT ...
> HELP$:        TSComp archive data,
> 	      with wildcard,
> 	      BOOK.HLP, modified Sun, Apr 22 1992 17:56:04,
> 	      compressed size 6937
> 	      , CHAR.HLP ...
> INSTALL.EX$:  TSComp archive data,
> 	      INSTALL.EXE, modified Sun, Apr 22 1992 17:59:18,
> 	      compressed size 103271
> MAKERRES.DL$: TSComp archive data,
> 	      MAKERRES.DLL, modified Sun, Nov 17 1992 14:57:18,
> 	      compressed size 51753
> OTUPDATE.$$$: TSComp archive data,
> 	      with wildcard,
> 	      WOTRBLD.EXE, modified Sun, Jul 09 1991 11:53:28,
> 	      compressed size 6591
> 	      , WUPDLL.DLL ...
> PSP2.CMP:     TSComp archive data,
> 	      with wildcard,
> 	      PSP.DAT, modified Sun, Aug 14 1993 02:00:00,
> 	      compressed size 3364
> 	      , JMCAP.DLL ...
> SAMPMIF$:     TSComp archive data,
> 	      with wildcard,
> 	      TABLE.MIF, modified Sun, Apr 22 1992 17:55:48,
> 	      compressed size 856
> 	      , BARCHART.MIF ...
> SAMPMML$:     TSComp archive data,
> 	      with wildcard,
> 	      CHFORMAT.MML, modified Sun, Apr 22 1992 17:55:46,
> 	      compressed size 180
> 	      , FORMATS.MML ...
> TRANTUT$:     TSComp archive data,
> 	      with wildcard,
> 	      EARTHTOC.DOC, modified Sun, Oct 19 1992 16:11:16,
> 	      compressed size 4283
> 	      , RAINTEXT.DOC ...
> TWOFILES.TSC: TSComp archive data,
> 	      with wildcard,
> 	      A.TXT, modified Sun, May 05 2020 18:38:00,
> 	      compressed size 12
> 	      , B.TXT ...
> WIN.PAK:      TSComp archive data,
> 	      with wildcard,
> 	      SCIDLL.DLL, modified Sun, Nov 29 1993 08:43:48,
> 	      compressed size 50960
> 	      , SIERRAW.ICO ...
> 
> I hope my diff file can be applied in a future version of the
> file utility.
> 
> I use the two test functions lemsdosdate and lemsdostime to interpret a
> 2-byte value as a bit-encoded date and time in DOS format, relative to
> the year 1980, but these functions are not mentioned in the official
> documentation magic.man. So I think these two functions should be
> mentioned there.
> 
> With best wishes,
> Jörg Jenderek