[File] [PATCH] Magdir/archive TSComp archive ; extensions + details
Christos Zoulas
christos at zoulas.com
Sat Dec 2 13:50:42 UTC 2023
Committed, thanks!
christos
> On Nov 29, 2023, at 1:46 PM, Jörg Jenderek (GMX) <joerg.jen.der.ek at gmx.net> wrote:
>
> Hello,
>
> A few days ago I had to look for some old software samples. Unfortunately,
> these are packed in compressed archives, so it took me some hours to
> find out how to extract such archives and what the contents of the
> inspected archives are.
>
> When running the file command (version 5.45) on such archive samples, I
> get output like:
>
> CRW3.LIB: TSComp archive data
> Explore.lib: TSComp archive data
> HELP$: TSComp archive data
> INSTALL.EX$: TSComp archive data
> MAKERRES.DL$: TSComp archive data
> OTUPDATE.$$$: TSComp archive data
> PSP2.CMP: TSComp archive data
> SAMPMIF$: TSComp archive data
> SAMPMML$: TSComp archive data
> TRANTUT$: TSComp archive data
> TWOFILES.TSC: TSComp archive data
> WIN.PAK: TSComp archive data
>
> With the --extension option only ??? is displayed, and with the -i option
> the generic application/octet-stream is shown.
>
> For comparison, I ran the file format identification utility TrID
> (see https://mark0.net/soft-trid-e.html). When the TrID command is run on
> such examples, they are described as "TSComp compressed data" by
> tscomp.trid.xml (see the attached output, trid-tscomp-v.txt.gz).
>
> For comparison, I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). It does
> "recognize" the LIB archives, which it describes as "Generic Library
> File" (PUID x-fmt/425). However, this detection is based only on the
> unreliable file name suffix LIB.
>
> With the help of these tools I found pages about TSComp on the Archive
> Team file formats website. Samples for download and unpacking software
> such as deark are also listed there. These references are now recorded
> in Magdir/archive by comment lines like:
> # URL: http://fileformats.archiveteam.org/wiki/TSComp
> # Ref.: http://mark0.net/download/triddefs_xml.7z
> # defs/t/tscomp.trid.xml
> # https://entropymine.com/deark/releases/deark-1.6.5.tar.gz
> # deark-1.6.5/modules/installshld.c
>
> Detection is done by the starting line in Magdir/archive, which looks
> like:
> 0 string \x65\x5d\x13\x8c\x08\x01\x03\x00 TSComp archive data
>
> As a first step, all the tools use the same recognition method: they
> look for the magic byte sequence at offset 0.
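A minimal Python sketch of that first step, assuming only the 8-byte signature from the magic line above. The names TSCOMP_MAGIC, is_tscomp and is_tscomp_file are hypothetical and are not part of file, TrID or deark:

```python
# Hypothetical helpers; the signature is the one from the Magdir/archive line.
TSCOMP_MAGIC = b"\x65\x5d\x13\x8c\x08\x01\x03\x00"


def is_tscomp(data: bytes) -> bool:
    """True if the buffer starts with the 8-byte TSComp signature at offset 0."""
    return data[:len(TSCOMP_MAGIC)] == TSCOMP_MAGIC


def is_tscomp_file(path: str) -> bool:
    """Read only the first 8 bytes of the file and compare them."""
    with open(path, "rb") as f:
        return is_tscomp(f.read(len(TSCOMP_MAGIC)))
```

For instance, is_tscomp_file("CRW3.LIB") should return True for the samples listed at the top, while a file that merely carries a .LIB suffix would not.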
>
> Instead of the generic application/octet-stream MIME type, I now show a
> user-defined one. The file name suffix depends on the sub-classification.
> For single-file archives, the last letter of the filename extension is
> often changed to "$", but I also found samples where an exclamation mark
> is used instead of the dollar sign (like BUILD3.BM!). For multi-file
> archives, the most common extensions seem to be '.lib' and '.cmp', but I
> also found other names, such as SAMPMIF$ (no file name suffix),
> OTDATA.$$$, TWOFILES.TSC (obviously an abbreviation for tscomp) and
> WIN.PAK (obviously an abbreviation for packed). Fortunately, the
> decompression software deark can extract the archive contents with a
> command like:
>
> I am not a C programmer, but if I interpret the source correctly, the
> filename style value in my "multi-file" samples is 2, which means "with
> wildcards". For single-file samples the style is 1, which means no
> wildcard. Unfortunately, I found no "old" examples with style value 0.
>
> So the start of the entry, with sub-classification and different
> suffixes, now looks like:
>
> 0 string \x65\x5d\x13\x8c\x08\x01\x03\x00 TSComp archive
> !:mime application/x-tscomp-compressed
> >0x08 ubyte 0 data, filename style 0
> !:ext ??$
> #>0x08 ubyte 1 data, without wildcard
> >0x08 ubyte 1 data
> !:ext ??$/??!
> >0x08 ubyte 2 data, with wildcard
> !:ext /lib/cmp/$$$/tsc/pak
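The sub-classification above can be mirrored in a short Python sketch. It assumes the magic at offset 0 has already matched and simply looks up the byte at offset 0x08; the descriptions and !:ext strings are copied from the magic lines, while the STYLES table and classify_style are hypothetical names used only for illustration:

```python
# (description printed after "TSComp archive", !:ext hint) per filename
# style byte, copied from the magic lines above. Hypothetical table name.
STYLES = {
    0: ("data, filename style 0", "??$"),
    1: ("data", "??$/??!"),
    2: ("data, with wildcard", "/lib/cmp/$$$/tsc/pak"),
}


def classify_style(data: bytes):
    """Return (description, ext_hint) for the byte at offset 0x08, or None
    if the buffer is too short or the value is not 0, 1 or 2 (none of the
    three >0x08 lines would match in that case)."""
    if len(data) <= 0x08:
        return None
    return STYLES.get(data[0x08])
```

A dictionary lookup keeps the three cases in one place, much like the three sibling >0x08 lines in the magic entry.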
>
> If I understand the source correctly, the original file name of the
> first archive member (a Pascal string holding a DOS 8.3 name), the DOS
> modification time stamp and the compressed size can be shown by lines
> like:
> >0x1c pstring x \b, %s
> >0x16 lemsdosdate x \b, modified %s
> >0x18 lemsdostime x %s
> >0x0E ulelong x \b, compressed size %u
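The same four fields can be read with Python's struct module. This is a sketch under the assumption that the offsets in the magic lines above are correct; first_member is a hypothetical helper, and it returns the raw DOS date and time words without decoding them:

```python
import struct


def first_member(data: bytes) -> dict:
    """Read the first member's fields at the offsets used by the magic
    lines above (hypothetical helper). Raises ValueError if too short."""
    if len(data) < 0x1D:
        raise ValueError("buffer too short for a TSComp member header")
    # 0x0E: compressed size, unsigned 32-bit little-endian (ulelong)
    (compressed_size,) = struct.unpack_from("<I", data, 0x0E)
    # 0x16 / 0x18: raw 16-bit little-endian DOS date and time words
    dos_date, dos_time = struct.unpack_from("<HH", data, 0x16)
    # 0x1C: Pascal string, one length byte followed by the 8.3 name
    name_len = data[0x1C]
    name = data[0x1D:0x1D + name_len]
    if len(name) != name_len:
        raise ValueError("truncated member name")
    return {
        "name": name.decode("ascii", errors="replace"),
        "compressed_size": compressed_size,
        "dos_date": dos_date,
        "dos_time": dos_time,
    }
```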
>
> If an archive contains more than one file, it is possible to jump to
> the next (second) archive member fragment and show the file name of the
> second archive member. This is now done by lines like:
> >0x12 ulelong >0
> >>(0x12.l+15) pstring x \b, %s ...
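The indirect offset (0x12.l+15) reads the little-endian 32-bit value at 0x12 and adds 15 to get the position of the next Pascal string. A Python sketch of that lookup follows; second_member_name is a hypothetical helper, and the constant 15 is taken from the magic line rather than from any format documentation:

```python
import struct


def second_member_name(data: bytes):
    """Follow (0x12.l+15): read the little-endian 32-bit value at 0x12 and,
    if it is > 0, read a 1-byte-length Pascal string 15 bytes after that
    value. Returns None if there is no second member or data is short."""
    if len(data) < 0x16:
        return None
    (link,) = struct.unpack_from("<I", data, 0x12)
    if link == 0:  # mirrors ">0x12 ulelong >0"
        return None
    pos = link + 15
    if pos >= len(data):
        return None
    n = data[pos]
    name = data[pos + 1:pos + 1 + n]
    if len(name) != n:
        return None
    return name.decode("ascii", errors="replace")
```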
>
> This information can also be verified by running the command line tool
> deark with a line like:
> deark -m tscomp -l -d2 SAMPMML$
>
> After applying the above modifications with the patch
> file-5.45-archive-tscomp.diff, my samples are in principle described as
> before, but now some details (like the first archive member names and
> time stamps) are also shown. This now looks like:
>
> CRW3.LIB: TSComp archive data, with wildcard, CRW.HLP, modified Sun, Jul 07 1993 02:00:02, compressed size 642159
> Explore.lib: TSComp archive data, with wildcard, MMATH194.EXE, modified Sun, Jan 24 1995 17:40:50, compressed size 16020, MMATH194.TXT ...
> HELP$: TSComp archive data, with wildcard, BOOK.HLP, modified Sun, Apr 22 1992 17:56:04, compressed size 6937, CHAR.HLP ...
> INSTALL.EX$: TSComp archive data, INSTALL.EXE, modified Sun, Apr 22 1992 17:59:18, compressed size 103271
> MAKERRES.DL$: TSComp archive data, MAKERRES.DLL, modified Sun, Nov 17 1992 14:57:18, compressed size 51753
> OTUPDATE.$$$: TSComp archive data, with wildcard, WOTRBLD.EXE, modified Sun, Jul 09 1991 11:53:28, compressed size 6591, WUPDLL.DLL ...
> PSP2.CMP: TSComp archive data, with wildcard, PSP.DAT, modified Sun, Aug 14 1993 02:00:00, compressed size 3364, JMCAP.DLL ...
> SAMPMIF$: TSComp archive data, with wildcard, TABLE.MIF, modified Sun, Apr 22 1992 17:55:48, compressed size 856, BARCHART.MIF ...
> SAMPMML$: TSComp archive data, with wildcard, CHFORMAT.MML, modified Sun, Apr 22 1992 17:55:46, compressed size 180, FORMATS.MML ...
> TRANTUT$: TSComp archive data, with wildcard, EARTHTOC.DOC, modified Sun, Oct 19 1992 16:11:16, compressed size 4283, RAINTEXT.DOC ...
> TWOFILES.TSC: TSComp archive data, with wildcard, A.TXT, modified Sun, May 05 2020 18:38:00, compressed size 12, B.TXT ...
> WIN.PAK: TSComp archive data, with wildcard, SCIDLL.DLL, modified Sun, Nov 29 1993 08:43:48, compressed size 50960, SIERRAW.ICO ...
>
> I hope my diff file can be applied in a future version of the
> file utility.
>
> I use the two types lemsdosdate and lemsdostime to interpret a 2-byte
> value as a bit-encoded date and time in DOS format, relative to the
> year 1980, but these types are not mentioned in the official
> documentation magic.man. So I think these two types should be
> documented there.
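For reference, the standard DOS encoding packs the date as year-since-1980 (bits 15-9), month (bits 8-5) and day (bits 4-0), and the time as hours (bits 15-11), minutes (bits 10-5) and seconds divided by two (bits 4-0). The Python sketch below decodes both words; decode_dos_date and decode_dos_time are hypothetical names, not the functions file uses internally:

```python
def decode_dos_date(value: int) -> tuple:
    """Split a 16-bit DOS date word into (year, month, day)."""
    year = 1980 + ((value >> 9) & 0x7F)  # bits 15-9: years since 1980
    month = (value >> 5) & 0x0F          # bits 8-5
    day = value & 0x1F                   # bits 4-0
    return year, month, day


def decode_dos_time(value: int) -> tuple:
    """Split a 16-bit DOS time word into (hour, minute, second)."""
    hour = (value >> 11) & 0x1F          # bits 15-11
    minute = (value >> 5) & 0x3F         # bits 10-5
    second = (value & 0x1F) * 2          # bits 4-0, stored in 2-second units
    return hour, minute, second
```

For example, decode_dos_date(0x1AE7) returns (1993, 7, 7) and decode_dos_time(0x1001) returns (2, 0, 2); these words were constructed for illustration and were not read from the samples above. Note that the 2-second granularity means odd seconds cannot be represented.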
>
> With best wishes,
> Jörg Jenderek
> --
> Jörg Jenderek
> Attachments: trid-tscomp-v.txt.gz, file-5_45-archive-tscomp_diff, file-5_45-archive-tscomp_diff_sig
> --
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file