[File] [PATCH] of Magdir/archive, msdos, ibm6000, blit, digital TTComp archive

Jörg Jenderek joerg.jen.der.ek at gmx.net
Fri Jun 25 20:52:49 UTC 2021


Hello,

A few days ago I checked my systems. When checking TTComp archives I
got unexpected results. Running the file command version 5.40 with
the -k option on such examples and related files gives output like:

ANIMATE.$XE:         TTComp archive, binary, 4K dictionary
ATIH_xx7.EXE:        TTComp archive, binary, 4K dictionary
		     DOS 2.0 backup id file, sequence 6
BACKUPID-16.@@@:     DOS 2.0 backup id file, sequence 4
BACKUPID_075.@@@:    DOS 2.0 backup id file, sequence 5
BACKUPID_xx6.@@@:    TTComp archive, binary, 4K dictionary
		     DOS 2.0 backup id file, sequence 6
BOMB.$XE:            TTComp archive, binary, 4K dictionary
LOTUS5.RAR:          Maple help database
		     DOS 2.0-3.2 backed up sequence 4 of file
		     \TMP\DIRECTOR.001\DIRECTOR.002\
		     DIRECTOR.003\LOTUS5.RAR
OVERVIEW.$XE:        TTComp archive, binary, 4K dictionary
		     (Lepton 3.x), scale 12099-56401,
		     spot sensor temperature
		     -244175795160866620000.000000,
		     color scheme 9, minimum point enabled,
		     calibration: offset -10080042.000000,
		     slope -0.000000
PBACKSCR.PI1:        TTComp archive, binary, 4K dictionary
quickgif.__d:        TTComp archive, binary, 4K dictionary
		     (Lepton 3.x), scale 19-61184,
		     spot sensor temperature 0.000000,
		     color scheme 1,
		     calibration: offset 0.000000, slope 0.000000
RLZRUN10.$TS:        TTComp archive, binary, 4K dictionary
SPHERE:              TTComp archive, binary, 4K dictionary
STTOOTH:             TTComp archive, binary, 4K dictionary
ttcomp-ascii-1k.bin: shared library
ttcomp-ascii-2k.bin: ctab data
		     locale data table
ttcomp-ascii-4k.bin: VAX-order 68k Blit mpx/mux executable
ttcomp-bin-1k.bin:   data
ttcomp-bin-2k.bin:   data
ttcomp-bin-4k.bin:   TTComp archive, binary, 4K dictionary
view:                68k Blit mpx/mux executable
		     VAX-order2 68k Blit mpx/mux executable

Furthermore, with --extension only ??? is displayed, and with the -i
option only the generic application/octet-stream is shown.

For comparison I ran the file format identification utility TrID
(see https://mark0.net/soft-trid-e.html). It recognizes all six
TTComp archive variants as "TTComp archive compressed" via the
definitions ark-ttcomp-*-?k.trid.xml. Those definitions describe all
TTComp archives (see appended TTComp-trid-v.txt.gz). TrID also
displays a related URL.

The file command recognizes only one variant, namely "binary, 4K
dictionary". TrID handles that variant with
ark-ttcomp-bin-4k.trid.xml as "TTComp archive compressed (bin-4K)".

In the current Magdir/archive that detection is done by lines like:
    0	string	\0\6
    >12	search/261	DESIGN
    >12	default		x	TTComp archive, binary, 4K dictionary

Unfortunately TTComp archives have no significant 4 byte magic
pattern. To overcome this weakness I put the displaying part inside
the sub routine ttcomp. Then it is easy to add additional test lines
that skip misidentified samples.

Following the Archive Team file formats web site, the sub routine
looks like:
  0	name	ttcomp
  >0	ubyte	x	TTComp archive data
  !:mime	application/x-compress-ttcomp
  !:ext	$xe/$ts/pi1/__d
  >0	ubyte	0	\b, binary
  >0	ubyte	1	\b, ASCII
  >1	ubyte	4	\b, 1K
  >1	ubyte	5	\b, 2K
  >1	ubyte	6	\b, 4K
  >1	ubyte	x	dictionary
  #>-3	ubyte		x	\b, last 3 bytes 0x%2.2x
  #>-2	ubeshort	x	\b%4.4x

The first byte indicates the compression type, where zero means
binary and one means ASCII. The second byte specifies the size of the
dictionary, where value 4 means a 1024 byte dictionary, 5 means 2048
bytes and 6 means 4096 bytes. Now a user defined mime type is shown,
and the file name extensions mentioned there, like $xe, are shown
too.
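
The two header bytes described above can be sketched in Python. This
is only an illustration of the byte tests in the ttcomp sub routine
and not part of the patch; the helper name describe_ttcomp_header is
made up for this example.

```python
from typing import Optional

# Byte 0: compression type, byte 1: dictionary size code (see text).
COMPRESSION = {0: "binary", 1: "ASCII"}
DICTIONARY = {4: "1K", 5: "2K", 6: "4K"}


def describe_ttcomp_header(data: bytes) -> Optional[str]:
    """Return a file(1)-style description, or None if the first two
    bytes are not one of the six TTComp combinations.

    Hypothetical helper: it looks at two bytes only, so it shares the
    false positives of the bare 2 byte magic.
    """
    if len(data) < 2:
        return None
    kind = COMPRESSION.get(data[0])
    size = DICTIONARY.get(data[1])
    if kind is None or size is None:
        return None
    return f"TTComp archive data, {kind}, {size} dictionary"
```

Because only 2 x 3 byte combinations are accepted, such a check alone
cannot tell a TTComp archive from other formats that start with the
same two bytes.
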
According to jsummers the last 3 bytes have only 8 possible bit
sequences (FFh 807Fh C03Fh E01Fh F00Fh F807h FC03h FE01h), but this
does not hold for quickgif.__d, whose last 3 bytes are 0A7DD4h. So
maybe this can be used as an additional test before calling sub
routine ttcomp, but for now those lines stay commented out.
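
A trailer test along these lines could look like the following Python
sketch. The bit patterns are copied from the comment table in the
appended archive diff ("x" means any bit); the function name
has_known_trailer is hypothetical, and whether the 8 patterns really
cover all valid archives is exactly what quickgif.__d puts in doubt.

```python
# Last 3 bytes as bit patterns, taken from the comments in the patch.
TRAILER_PATTERNS = [
    "xxxxxxxx 0000000x 11111111",  # ____FFh
    "xxxxxxxx 10000000 01111111",  # __807Fh
    "0xxxxxxx 11000000 00111111",  # __C03Fh
    "00xxxxxx 11100000 00011111",  # __E01Fh
    "000xxxxx 11110000 00001111",  # __F00Fh
    "0000xxxx 11111000 00000111",  # __F807h
    "00000xxx 11111100 00000011",  # __FC03h
    "000000xx 11111110 00000001",  # __FE01h
]


def _compile(pattern: str) -> tuple:
    """Turn a bit pattern into (mask, value) for a 24 bit integer."""
    bits = pattern.replace(" ", "")
    mask = int("".join("0" if b == "x" else "1" for b in bits), 2)
    value = int("".join("1" if b == "1" else "0" for b in bits), 2)
    return mask, value


_COMPILED = [_compile(p) for p in TRAILER_PATTERNS]


def has_known_trailer(data: bytes) -> bool:
    """True if the last 3 bytes match one of the 8 listed patterns."""
    if len(data) < 3:
        return False
    tail = int.from_bytes(data[-3:], "big")
    return any((tail & mask) == value for mask, value in _COMPILED)
```

For the trailer 0A7DD4h of quickgif.__d this returns False, so using
it as a hard test would reject that sample.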

All "DOS 2.0 backup id file, sequence 6" examples like ATIH_xx7.EXE
and BACKUPID_xx6.@@@ start with the 2 byte sequence 00 06. This is
also the start magic for "TTComp archive, binary, 4K dictionary". The
backups are handled by Magdir/msdos. According to the page about
BACKUP (MS-DOS), the header bytes after the first few are described
as unknown, and in many cases they are a sequence of nil bytes. In a
TTComp archive the compressed part with high entropy sits at that
position.

So I skip backup examples by an additional test line that checks for
the nil byte sequence. This now becomes:
    0	string	\0\6
    >12	search/261	DESIGN
    >12	default		x
    >>8	quad		!0
    >>>0	use	ttcomp
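
The effect of these test lines can be approximated in Python as shown
below. This is a rough sketch, not how file evaluates magic
internally; the function name passes_binary_4k_tests is invented, and
the handling of the search range is my reading of search/261.

```python
def passes_binary_4k_tests(data: bytes) -> bool:
    """Approximate the tests guarding the binary, 4K entry."""
    # 0 string \0\6
    if data[:2] != b"\x00\x06":
        return False
    # >12 search/261 DESIGN : Panorama databases match here and are
    # skipped. Assumption: the keyword may start at any of the 261
    # positions beginning at offset 12.
    keyword = b"DESIGN"
    if data.find(keyword, 12, 12 + 261 + len(keyword) - 1) != -1:
        return False
    # >>8 quad !0 : eight nil bytes at offset 8 point to a DOS backup
    # id file; a file too short to hold the quad does not match.
    quad = data[8:16]
    return len(quad) == 8 and quad != bytes(8)
```

A sample that starts with 00 06 followed by nil bytes is rejected,
while one with non nil bytes at offset 8 is passed on to ttcomp.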

Now I add lines for the other TTComp archive variants. The binary, 2K
variant is now described by additional lines like:
    0	string	\0\5
    >8	quad	!0
    >>0	use	ttcomp
Here again all misidentified DOS 2.0 backup id files with sequence 5,
like example BACKUPID_075.@@@, are skipped by an additional test line
before calling the sub routine.

The ASCII, 1K variant is now described by additional lines like:
    0	string	\1\4
    !:strength	-2
    >0	use	ttcomp
All such examples are also described as "shared library" by lines
inside Magdir/ibm6000 like:
    0	beshort		0x0104		shared library
Unfortunately I have no knowledge about IBM RS/6000 machines, and
after some hours of searching I gave up. So I do not know what is
more characteristic for such a shared library. Therefore I reduced
the pattern strength of the TTComp archive entry by 2 to the value
48. So "shared library" with pattern strength 50 comes first and I
get the same output as in version 5.40. But with the keep going
option -k the description as TTComp archive appears as well.
Furthermore I add inside Magdir/ibm6000 a comment line about the
magic pattern conflict like:
   # GRR: line below is too general as it matches also
   # TTComp archive, ASCII, 1K handled by ./archive
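
The intended effect of the strength reduction can be shown with a toy
model in Python. It assumes that entries are tried in descending
strength order and that only the first match is printed unless -k is
given; the real ordering logic in file is more involved. The names
Entry, ENTRIES and identify are invented for this sketch, and the
strength values are the ones quoted above.

```python
from typing import Callable, List, NamedTuple


class Entry(NamedTuple):
    strength: int
    description: str
    matches: Callable[[bytes], bool]


# Both entries test the same two bytes; only the strength differs.
ENTRIES = [
    Entry(48, "TTComp archive data, ASCII, 1K dictionary",
          lambda d: d[:2] == b"\x01\x04"),
    Entry(50, "shared library",
          lambda d: d[:2] == b"\x01\x04"),
]


def identify(data: bytes, keep_going: bool = False) -> List[str]:
    """Return matching descriptions, strongest entry first.

    Without keep_going only the first match is reported, loosely
    mirroring file without and with the -k option.
    """
    ordered = sorted(ENTRIES, key=lambda e: e.strength, reverse=True)
    hits = [e.description for e in ordered if e.matches(data)]
    return hits if keep_going else hits[:1]
```

With identify(b"\x01\x04...") only "shared library" is returned, and
with keep_going=True the TTComp description follows as second entry.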

The ASCII, 4K variant is now described by additional lines like:
   0	string	\1\6
   !:strength	-2
   >0	use	ttcomp
Unfortunately all such examples are also described as "VAX-order 68k
Blit mpx/mux executable" by lines inside Magdir/blit like:
   0	short	03001	VAX-order 68k Blit mpx/mux executable
Unfortunately I have no knowledge about Blit, and after some hours of
searching I gave up. So I do not know what is more characteristic for
such a Blit executable. Therefore I reduced the pattern strength of
the TTComp archive entry by 2 to the value 48. With strength 49 I do
not get the wanted order, and I do not know why. So "Blit" with
pattern strength 50 comes first and I get the same output as in
version 5.40. But with the keep going option -k the description as
TTComp archive appears as well. Furthermore I add inside Magdir/blit
a comment line about the magic pattern conflict like:
   # GRR: line below is too general as it matches also
   # TTComp archive, ASCII, 4K handled by ./archive
I found an executable named "view" that is described by Magdir/blit
as "68k Blit mpx/mux executable". In the middle of this file are null
terminated strings like putchar, fputs or main. These seem to be
function names. So maybe searching for such keywords can be used to
distinguish Blit executables from TTComp archives, but my knowledge
is too limited to do this work.

The ASCII, 2K variant is now described by additional lines like:
  0	string	\1\5
  !:strength	-2
  >0	use	ttcomp
All such examples are also described as "locale data table" by lines
inside Magdir/digital like:
  0	short		0x0501		locale data table
Unfortunately I have no knowledge about such locale data, and after
some time of searching I gave up. So I do not know what is more
characteristic for such locale data. Therefore I reduced the pattern
strength of the TTComp archive entry by 2 to the value 48. So "locale
data" with pattern strength 50 comes first and I get the same output
as in version 5.40. But with the keep going option -k the description
as TTComp archive appears as well. Furthermore I add inside
Magdir/digital a comment line about the magic pattern conflict like:
  # GRR: line below is too general as it matches also
  # TTComp archive, ASCII, 2K  handled by ./archive
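
Why these entries collide with the ASCII variants becomes visible when
the 16 bit magic values are written out as bytes. The Python sketch
below does that with struct; it assumes a little-endian host for the
"short" type, which file reads in host byte order. The names CONFLICTS
and on_disk_bytes are made up for illustration.

```python
import struct

# (magic line, value, struct format, TTComp header with same bytes)
CONFLICTS = [
    ("beshort 0x0104 shared library", 0x0104, ">H", b"\x01\x04"),
    ("beshort 0x0105 ctab data", 0x0105, ">H", b"\x01\x05"),
    # "short" is host order; "<H" assumes a little-endian machine.
    ("short 0x0501 locale data table", 0x0501, "<H", b"\x01\x05"),
    ("short 03001 VAX-order Blit", 0o3001, "<H", b"\x01\x06"),
]


def on_disk_bytes(value: int, fmt: str) -> bytes:
    """Return the two bytes a file must start with to match value."""
    return struct.pack(fmt, value)


for label, value, fmt, ttcomp in CONFLICTS:
    same = on_disk_bytes(value, fmt) == ttcomp
    print(f"{label:32} {on_disk_bytes(value, fmt).hex()} collides: {same}")
```

Each of the four magic values yields exactly the first two bytes of
one ASCII TTComp variant, so without further tests the entries cannot
be told apart.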

After applying the above mentioned modifications with the patches
file-5.40-archive-ttcomp.diff file-5.40-msdos-backup.diff
file-5.40-ibm6000-ttcomp.diff file-5.40-blit-ttcomp.diff
file-5.40-digital-ttcomp.diff, all TTComp archive variants are
recognized with the -k option and some misidentifications vanish:

ANIMATE.$XE:         TTComp archive data, binary, 4K dictionary
ATIH_xx7.EXE:        DOS 2.0 backup id file, sequence 6
BACKUPID-16.@@@:     DOS 2.0 backup id file, sequence 4
BACKUPID_075.@@@:    DOS 2.0 backup id file, sequence 5
BACKUPID_xx6.@@@:    DOS 2.0 backup id file, sequence 6
BOMB.$XE:            TTComp archive data, binary, 4K dictionary
LOTUS5.RAR:          Maple help database
		     DOS 2.0-3.2 backed up sequence 4 of file
		     \TMP\DIRECTOR.001\DIRECTOR.002\
		     DIRECTOR.003\LOTUS5.RAR
OVERVIEW.$XE:        TTComp archive data, binary, 4K dictionary
		     (Lepton 3.x), scale 12099-56401,
		     spot sensor temperature
		     -244175795160866620000.000000,
		     color scheme 9, minimum point enabled,
		     calibration: offset -10080042.000000,
		     slope -0.000000
PBACKSCR.PI1:        TTComp archive data, binary, 4K dictionary
quickgif.__d:        TTComp archive data, binary, 4K dictionary
		     (Lepton 3.x), scale 19-61184,
		     spot sensor temperature 0.000000,
		     color scheme 1,
		     calibration: offset 0.000000, slope 0.000000
RLZRUN10.$TS:        TTComp archive data, binary, 4K dictionary
SPHERE:              TTComp archive data, binary, 4K dictionary
STTOOTH:             TTComp archive data, binary, 4K dictionary
ttcomp-ascii-1k.bin: shared library
		     TTComp archive data, ASCII, 1K dictionary
ttcomp-ascii-2k.bin: ctab data
		     locale data table
		     TTComp archive data, ASCII, 2K dictionary
ttcomp-ascii-4k.bin: VAX-order 68k Blit mpx/mux executable
		     TTComp archive data, ASCII, 4K dictionary
ttcomp-bin-1k.bin:   TTComp archive data, binary, 1K dictionary
ttcomp-bin-2k.bin:   TTComp archive data, binary, 2K dictionary
ttcomp-bin-4k.bin:   TTComp archive data, binary, 4K dictionary
view:                68k Blit mpx/mux executable
		     VAX-order2 68k Blit mpx/mux executable


I hope my diff files can be applied in a future version of the file
utility.

Furthermore many examples like OVERVIEW.$XE and quickgif.__d are
still misidentified by the sub routine diy-thermocam-checker inside
Magdir/measure as "(Lepton 3.x)". This sub routine gives too many
false hits.

With best wishes
Jörg Jenderek
--
Jörg Jenderek

-------------- next part --------------
--- file-5.40/magic/Magdir/msdos.old	2021-02-22 23:51:10 +0000
+++ file-5.40/magic/Magdir/msdos	2021-06-21 23:21:14 +0000
@@ -1615,5 +1615,6 @@
 
 # DOS backup 2.0 to 3.2
-
+# URL:		http://fileformats.archiveteam.org/wiki/BACKUP_(MS-DOS)
+# Reference:	http://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/dos/restore/brtecdoc.htm
 # backupid.@@@
 
@@ -1625,4 +1626,5 @@
 >>>0x7	string	\0\0\0\0\0\0\0\0
 >>>>0x1 ubyte	x	DOS 2.0 backup id file, sequence %d
+#!:mime	application/octet-stream
 !:ext @@@
 >>>>0x0 ubyte	0xff	\b, last disk
@@ -1658,5 +1660,7 @@
 # full file name with path but without drive letter and colon stored from 0x05 til 0x52
 >>>>>>0x5	string	x	file %s
+#!:mime	application/octet-stream
 # backup name is original filename
+#!:ext	doc/exe/rar/zip
 #!:ext	*
 # magic/Magdir/msdos, 1169: Warning: EXTENSION type `     *' has bad char '*'
-------------- next part --------------
--- file-5.40/magic/Magdir/digital.old	2021-02-22 23:49:24 +0000
+++ file-5.40/magic/Magdir/digital	2021-06-22 19:08:17 +0000
@@ -54,4 +54,5 @@
 # Locale data tables (MIPS and Alpha).
 #
+# GRR: line below is too general as it matches also TTComp archive, ASCII, 2K  handled by ./archive
 0	short		0x0501		locale data table
 >6	short		0x24		for MIPS
-------------- next part --------------
--- file-5.40/magic/Magdir/blit.old	2021-02-22 23:49:24 +0000
+++ file-5.40/magic/Magdir/blit	2021-06-24 19:06:54 +0000
@@ -15,5 +15,9 @@
 0	long		0406		68k Blit mpx/mux executable
 0	short		0406		VAX-order2 68k Blit mpx/mux executable
+# GRR: line below is too general as it matches also TTComp archive, ASCII, 4K handled by ./archive
 0	short		03001		VAX-order 68k Blit mpx/mux executable
+# TODO:
+# skip TTComp archive, ASCII, 4K by looking for executable keyword like main
+#>0	search/5536	main\0		VAX-order 68k Blit mpx/mux executable
 # Need more values for WE32 DMD executables.
 # Note that 0520 is the same as COFF
-------------- next part --------------
--- file-5.40/magic/Magdir/ibm6000.old	2021-02-22 23:49:24 +0000
+++ file-5.40/magic/Magdir/ibm6000	2021-06-23 19:17:16 +0000
@@ -11,5 +11,7 @@
 #>28	belong		>0		not stripped
 #>6	beshort		>0		- version %ld
+# GRR: line below is too general as it matches also TTComp archive, ASCII, 1K handled by ./archive
 0	beshort		0x0104		shared library
+# GRR: line below is too general as it matches also TTComp archive, ASCII, 2K handled by ./archive
 0	beshort		0x0105		ctab data
 0	beshort		0xfe04		structured file
-------------- next part --------------
--- file-5.40/magic/Magdir/archive.old	2021-02-22 23:49:24 +0000
+++ file-5.40/magic/Magdir/archive	2021-06-25 20:12:27 +0000
@@ -443,16 +443,81 @@
 # URL: http://fileformats.archiveteam.org/wiki/TTComp_archive
 # Update: Joerg Jenderek
 # GRR: line below is too general as it matches also Panorama database "TCDB 2003-10 demo.pan", others
 0	string	\0\6
 # look for first keyword of Panorama database *.pan
 >12	search/261	DESIGN
 # skip keyword with low entropy
->12	default		x	TTComp archive, binary, 4K dictionary
-# (version 5.25) labeled the above entry as "TTComp archive data"
+>12	default		x
+# skip DOS 2.0 backup id file, sequence 6 with many nils like BACKUPID_xx6.@@@ handled by ./msdos
+>>8	quad		!0
+>>>0	use	ttcomp
+# variant ASCII, 4K dictionary (strength=48=50-2). With strength=49 wrong order! WHY?
+0	string	\1\6
+# TODO:
+# skip VAX-order 68k Blit mpx/mux executable (strength=50) handled by ./blit
+!:strength	-2
+>0	use	ttcomp
+0	string	\0\5
+# skip some DOS 2.0 backup id file, sequence 5 with many nils like BACKUPID_075.@@@ handled by ./msdos
+>8	quad	!0
+>>0	use	ttcomp
+0	string	\1\5
+# TODO:
+# variant ASCII, 2K dictionary (strength=48=50-2). With strength=49 wrong order! WHY?
+# skip ctab data (strength=50) handled by ./ibm6000
+# skip locale data table (strength=50) handled by ./digital
+!:strength	-2
+>0	use	ttcomp
+0	string	\0\4
+# skip many Maple help database *.hdb with version tag handled by ./maple
+>1028	string	!version
+# skip veclib maple.hdb by looking for Maple keyword
+>>4	search/1091	Maple\040
+#>4	search/34090	Maple\040
+>>4	default		x
+# skip DOS 2.0-3.2 backed up sequence 4 with many nils like LOTUS5.RAR handled by ./msdos
+# skip xBASE Compound Index file *.CDX with many nils
+>>>0x54	quad		!0
+>>>>0	use	ttcomp
+0	string	\1\4
+# TODO:
+# skip Commodore PET BASIC 4.0 program *.prg
+# variant ASCII, 1K dictionary (strength=48=50-2). With strength=49 wrong order! WHY?
+# skip shared library (strength=50) handled by ./ibm6000
+!:strength	-2
+>0	use	ttcomp
+#	display information of TTComp archive
+0	name	ttcomp
+# (version 5.25) labeled the entry as "TTComp archive data"
+>0	ubyte	x	TTComp archive data
+!:mime	application/x-compress-ttcomp
+# PBACKSCR.PI1
+!:ext	$xe/$ts/pi1/__d
+# compression type: 0~binary compression 1~ASCII compression 
+>0	ubyte	0	\b, binary
+>0	ubyte	1	\b, ASCII
+# size of the dictionary:  4~1024 bytes 5~2048 bytes 6~4096 bytes 
+>1	ubyte	4	\b, 1K
+>1	ubyte	5	\b, 2K
+>1	ubyte	6	\b, 4K
+>1	ubyte	x	dictionary
+#	https://mark0.net/forum/index.php?topic=848
+# last 3 bytes probably have only 8 possible bit sequences
+# xxxxxxxx 0000000x 11111111	____FFh
+# xxxxxxxx 10000000 01111111	__807Fh	
+# 0xxxxxxx 11000000 00111111	__C03Fh
+# 00xxxxxx 11100000 00011111	__E01Fh
+# 000xxxxx 11110000 00001111	__F00Fh
+# 0000xxxx 11111000 00000111	__F807h
+# 00000xxx 11111100 00000011	__FC03h
+# 000000xx 11111110 00000001	__FE01h
+# but for quickgif.__d 0A7DD4h
+#>-3	ubyte		x	\b, last 3 bytes 0x%2.2x
+#>-2	ubeshort	x	\b%4.4x
 # From:		Joerg Jenderek
 # URL:		https://wiki.68kmla.org/DiskCopy_4.2_format_specification
 # reference:	http://nulib.com/library/FTN.e00005.htm
 0x52	ubeshort	0x0100
 # test for disk image size equal or above 400k
 >0x40	ubelong		>409599
 # test also for disk image size equal or below 1440k to skip
-------------- next part --------------
A non-text attachment was scrubbed...
Name: TTComp-trid-v.txt.gz
Type: application/x-gzip
Size: 1258 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20210625/aadc8443/attachment-0001.bin>

