[File] [PATCH] of Magdir/archive, msdos, ibm6000, blit, digital TTComp archive
Jörg Jenderek
joerg.jen.der.ek at gmx.net
Fri Jun 25 20:52:49 UTC 2021
Hello,
some days ago I checked my systems. When checking TTComp archives I
got unexpected results. When running the file command version 5.40
with the -k option on such examples and related files I get output
like:
ANIMATE.$XE: TTComp archive, binary, 4K dictionary
ATIH_xx7.EXE: TTComp archive, binary, 4K dictionary
    DOS 2.0 backup id file, sequence 6
BACKUPID-16.@@@: DOS 2.0 backup id file, sequence 4
BACKUPID_075.@@@: DOS 2.0 backup id file, sequence 5
BACKUPID_xx6.@@@: TTComp archive, binary, 4K dictionary
    DOS 2.0 backup id file, sequence 6
BOMB.$XE: TTComp archive, binary, 4K dictionary
LOTUS5.RAR: Maple help database
    DOS 2.0-3.2 backed up sequence 4 of file
    \TMP\DIRECTOR.001\DIRECTOR.002\
    DIRECTOR.003\LOTUS5.RAR
OVERVIEW.$XE: TTComp archive, binary, 4K dictionary
    (Lepton 3.x), scale 12099-56401,
    spot sensor temperature
    -244175795160866620000.000000,
    color scheme 9, minimum point enabled,
    calibration: offset -10080042.000000,
    slope -0.000000
PBACKSCR.PI1: TTComp archive, binary, 4K dictionary
quickgif.__d: TTComp archive, binary, 4K dictionary
    (Lepton 3.x), scale 19-61184,
    spot sensor temperature 0.000000,
    color scheme 1,
    calibration: offset 0.000000, slope 0.000000
RLZRUN10.$TS: TTComp archive, binary, 4K dictionary
SPHERE: TTComp archive, binary, 4K dictionary
STTOOTH: TTComp archive, binary, 4K dictionary
ttcomp-ascii-1k.bin: shared library
ttcomp-ascii-2k.bin: ctab data
    locale data table
ttcomp-ascii-4k.bin: VAX-order 68k Blit mpx/mux executable
ttcomp-bin-1k.bin: data
ttcomp-bin-2k.bin: data
ttcomp-bin-4k.bin: TTComp archive, binary, 4K dictionary
view: 68k Blit mpx/mux executable
    VAX-order2 68k Blit mpx/mux executable
Furthermore, with --extension only ??? is displayed, and with the -i
option only the generic application/octet-stream is shown.
For comparison I ran the file format identification utility TrID
(see https://mark0.net/soft-trid-e.html). It recognizes all six
TTComp archive variants as "TTComp archive compressed" via the
definitions ark-ttcomp-*-?k.trid.xml. These definitions describe all
TTComp archives (see the appended TTComp-trid-v.txt.gz). TrID also
displays a related URL.
The file command recognizes only one variant, namely "binary, 4K
dictionary". TrID handles that variant with
ark-ttcomp-bin-4k.trid.xml as "TTComp archive compressed (bin-4K)".
In the current Magdir/archive that detection is done by lines like:
0 string \0\6
>12 search/261 DESIGN
>12 default x TTComp archive, binary, 4K dictionary
Unfortunately TTComp archives have no significant 4 byte magic
pattern. To overcome this weakness I moved the displaying part into
the sub routine ttcomp. This makes it easy to add additional test
lines that skip misidentified samples.
Following the File Formats Archive Team web site, the sub routine
looks like:
0 name ttcomp
>0 ubyte x TTComp archive data
!:mime application/x-compress-ttcomp
!:ext $xe/$ts/pi1/__d
>0 ubyte 0 \b, binary
>0 ubyte 1 \b, ASCII
>1 ubyte 4 \b, 1K
>1 ubyte 5 \b, 2K
>1 ubyte 6 \b, 4K
>1 ubyte x dictionary
#>-3 ubyte x \b, last 3 bytes 0x%2.2x
#>-2 ubeshort x \b%4.4x
The first byte indicates the compression type used, where zero means
binary and one means ASCII. The second byte specifies the size of the
dictionary, where value 4 means a 1024 byte dictionary, 5 means 2048
bytes and 6 means 4096 bytes. Now a user defined mime type is shown,
and file name extensions like $xe are listed.
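As a rough illustration of the two header bytes described above, here
is a small Python sketch that decodes them in the same way the ttcomp
sub routine displays them. The name describe_ttcomp is made up for
this example and is not part of the file utility. The sketch does not
verify that the data really is a TTComp archive.

```python
# Hypothetical helper mirroring the display logic of the "ttcomp" sub routine.
COMPRESSION = {0: "binary", 1: "ASCII"}   # first byte
DICTIONARY = {4: "1K", 5: "2K", 6: "4K"}  # second byte


def describe_ttcomp(header: bytes) -> str:
    """Describe the first two bytes of a presumed TTComp archive."""
    if len(header) < 2:
        raise ValueError("need at least 2 bytes")
    parts = ["TTComp archive data"]
    # Unknown values are simply left out, as in the magic lines.
    if header[0] in COMPRESSION:
        parts.append(COMPRESSION[header[0]])
    if header[1] in DICTIONARY:
        parts.append(DICTIONARY[header[1]])
    return ", ".join(parts) + " dictionary"
```

For example, describe_ttcomp(b"\x00\x06") gives the same text as the
new magic lines show for a binary, 4K sample.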
According to jsummers the last 3 bytes have only 8 possible bit
sequences (FFh 807Fh C03Fh E01Fh F00Fh F807h FC03h FE01h), but for
quickgif.__d, whose last 3 bytes are 0A7DD4h, this is not true. So
maybe this can also be used as an additional test before calling the
sub routine ttcomp.
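The trailer idea could be tried outside of file with a sketch like
the following Python code. It only looks at the endings from the list
above (last byte FFh, or one of the seven listed 2 byte endings) and
ignores the bits before them. The function name is an assumption for
illustration, and as noted quickgif.__d would fail this check.

```python
# Two byte endings taken from the list above; a final byte of 0xFF is
# treated as the eighth case.
KNOWN_TAILS = {0x807F, 0xC03F, 0xE01F, 0xF00F, 0xF807, 0xFC03, 0xFE01}


def has_known_ttcomp_tail(data: bytes) -> bool:
    """Return True if the data ends with one of the reported endings."""
    if len(data) < 2:
        return False
    if data[-1] == 0xFF:
        return True
    return int.from_bytes(data[-2:], "big") in KNOWN_TAILS
```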
All "DOS 2.0 backup id file, sequence 6" examples like ATIH_xx7.EXE
and BACKUPID_xx6.@@@ start with the 2 byte sequence 00 06. This is
also the start magic of "TTComp archive, binary, 4K dictionary". The
backup files are handled by Magdir/msdos. According to the page about
BACKUP (MS-DOS), the header bytes following the first few are
described as unknown, and in many cases they are a sequence of nil
bytes. In a TTComp archive the compressed part, with high entropy, is
at that position. So I skip the backup examples with an additional
test line that checks for the nil byte sequence. The lines now become:
0 string \0\6
>12 search/261 DESIGN
>12 default x
>>8 quad !0
>>>0 use ttcomp
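To make the logic of these lines easier to follow, here is a Python
sketch of the same decision. It is only an approximation: the window
used for the DESIGN search is my reading of search/261, and the
function name is hypothetical.

```python
def looks_like_ttcomp_bin_4k(data: bytes) -> bool:
    """Approximate the magic lines for the binary, 4K variant."""
    # 0 string \0\6
    if not data.startswith(b"\x00\x06"):
        return False
    # >12 search/261 DESIGN : Panorama databases are not TTComp archives.
    # The exact search window is an assumption of this sketch.
    keyword = b"DESIGN"
    if data.find(keyword, 12, 12 + 261 + len(keyword)) != -1:
        return False
    # >>8 quad !0 : DOS backup id files have nil bytes here, while a
    # TTComp archive has compressed (high entropy) data.
    quad = data[8:16]
    if len(quad) < 8:
        return False
    return quad != bytes(8)
```

A backup id file made of 00 06 followed by nil bytes is rejected,
while a sample with non nil data at offset 8 is accepted.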
Now I add lines for the other TTComp archive variants. The binary, 2K
variant is now described by additional lines like:
0 string \0\5
>8 quad !0
>>0 use ttcomp
Here again all misidentified DOS 2.0 backup id files with sequence 5,
like example BACKUPID_075.@@@, are skipped by an additional test line
before calling the sub routine.
The ASCII, 1K variant is now described by additional lines like:
0 string \1\4
!:strength -2
>0 use ttcomp
All such examples are also described as "shared library" by lines
inside Magdir/ibm6000 like:
0 beshort 0x0104 shared library
Unfortunately I have no knowledge about IBM RS/6000 machines, and
after some hours of searching I gave up. So I do not know what is
more characteristic for such a shared library. Therefore I reduced
the pattern strength of the TTComp archive entry by 2 to value 48. So
"shared library" with pattern strength 50 comes first and I get the
same output as in version 5.40. But with the keep going option -k the
description as TTComp archive appears as well. Furthermore I added
inside Magdir/ibm6000 a comment line about the magic pattern conflict
like:
# GRR: line below is too general as it matches also
# TTComp archive, ASCII, 1K handled by ./archive
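The effect of the reduced strength can be pictured with the following
simplified Python model. It assumes that matching entries are tried
in order of descending strength and that only the first one is
printed unless -k is given. The strength values 50 and 48 are the
ones mentioned above; the function itself is hypothetical and not how
file is implemented internally.

```python
from typing import List, Tuple


def order_matches(matches: List[Tuple[str, int]], keep_going: bool) -> List[str]:
    """Order matching descriptions by strength, highest first.

    Without keep_going only the strongest description is returned,
    with keep_going all of them are returned.
    """
    ranked = sorted(matches, key=lambda match: match[1], reverse=True)
    names = [name for name, _strength in ranked]
    return names if keep_going else names[:1]


candidates = [
    ("TTComp archive data, ASCII, 1K dictionary", 48),
    ("shared library", 50),
]
print(order_matches(candidates, keep_going=False))
print(order_matches(candidates, keep_going=True))
```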
The ASCII, 4K variant is now described by additional lines like:
0 string \1\6
!:strength -2
>0 use ttcomp
Unfortunately all such examples are also described as "VAX-order 68k
Blit mpx/mux executable" by lines inside Magdir/blit like:
0 short 03001 VAX-order 68k Blit mpx/mux executable
Unfortunately I have no knowledge about Blit, and after some hours of
searching I gave up. So I do not know what is more characteristic for
such a Blit executable. Therefore I reduced the pattern strength of
the TTComp archive entry by 2 to value 48. With strength 49 I do not
get the wanted order. I do not know why. So "Blit" with pattern
strength 50 comes first and I get the same output as in version 5.40.
But with the keep going option -k the description as TTComp archive
appears as well. Furthermore I added inside Magdir/blit a comment
line about the magic pattern conflict like:
# GRR: line below is too general as it matches also
# TTComp archive, ASCII, 4K handled by ./archive
I found an executable with name "view" that is described by
Magdir/blit as "VAX-order 68k Blit mpx/mux executable". In the middle
of this file are null terminated strings like putchar, fputs or main.
These seem to be function names. So maybe searching for such keywords
can be used to distinguish Blit from TTComp archives, but my
knowledge is too limited to do this work.
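Such a keyword test might look like the following Python sketch. The
limit of 5536 bytes is borrowed from the commented out TODO line in
my blit patch, the keyword list is just the three names seen in
"view", and the function name is made up. I have not verified this
against other Blit executables, and compressed data could contain
such a byte sequence by chance.

```python
def has_symbol_name(data: bytes,
                    keywords=(b"main", b"putchar", b"fputs"),
                    limit: int = 5536) -> bool:
    """Return True if a null terminated keyword occurs near the start."""
    window = data[:limit]
    return any(keyword + b"\x00" in window for keyword in keywords)
```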
The ASCII, 2K variant is now described by additional lines like:
0 string \1\5
!:strength -2
>0 use ttcomp
All such examples are also described as "locale data table" by lines
inside Magdir/digital like:
0 short 0x0501 locale data table
Unfortunately I have no knowledge about such locale data, and after
some time of searching I gave up. So I do not know what is more
characteristic for such locale data. Therefore I reduced the pattern
strength of the TTComp archive entry by 2 to value 48. So "locale
data" with pattern strength 50 comes first and I get the same output
as in version 5.40. But with the keep going option -k the description
as TTComp archive appears as well. Furthermore I added inside
Magdir/digital a comment line about the magic pattern conflict like:
# GRR: line below is too general as it matches also
# TTComp archive, ASCII, 2K handled by ./archive
After applying the above mentioned modifications by the patches
file-5.40-archive-ttcomp.diff file-5.40-msdos-backup.diff
file-5.40-ibm6000-ttcomp.diff file-5.40-blit-ttcomp.diff
file-5.40-digital-ttcomp.diff all TTComp archive variants are
recognized with the -k option and some misidentifications vanish. The
output now looks like:
ANIMATE.$XE: TTComp archive data, binary, 4K dictionary
ATIH_xx7.EXE: DOS 2.0 backup id file, sequence 6
BACKUPID-16.@@@: DOS 2.0 backup id file, sequence 4
BACKUPID_075.@@@: DOS 2.0 backup id file, sequence 5
BACKUPID_xx6.@@@: DOS 2.0 backup id file, sequence 6
BOMB.$XE: TTComp archive data, binary, 4K dictionary
LOTUS5.RAR: Maple help database
    DOS 2.0-3.2 backed up sequence 4 of file
    \TMP\DIRECTOR.001\DIRECTOR.002\
    DIRECTOR.003\LOTUS5.RAR
OVERVIEW.$XE: TTComp archive data, binary, 4K dictionary
    (Lepton 3.x), scale 12099-56401,
    spot sensor temperature
    -244175795160866620000.000000,
    color scheme 9, minimum point enabled,
    calibration: offset -10080042.000000,
    slope -0.000000
PBACKSCR.PI1: TTComp archive data, binary, 4K dictionary
quickgif.__d: TTComp archive data, binary, 4K dictionary
    (Lepton 3.x), scale 19-61184,
    spot sensor temperature 0.000000,
    color scheme 1,
    calibration: offset 0.000000, slope 0.000000
RLZRUN10.$TS: TTComp archive data, binary, 4K dictionary
SPHERE: TTComp archive data, binary, 4K dictionary
STTOOTH: TTComp archive data, binary, 4K dictionary
ttcomp-ascii-1k.bin: shared library
    TTComp archive data, ASCII, 1K dictionary
ttcomp-ascii-2k.bin: ctab data
    locale data table
    TTComp archive data, ASCII, 2K dictionary
ttcomp-ascii-4k.bin: VAX-order 68k Blit mpx/mux executable
    TTComp archive data, ASCII, 4K dictionary
ttcomp-bin-1k.bin: TTComp archive data, binary, 1K dictionary
ttcomp-bin-2k.bin: TTComp archive data, binary, 2K dictionary
ttcomp-bin-4k.bin: TTComp archive data, binary, 4K dictionary
view: 68k Blit mpx/mux executable
    VAX-order2 68k Blit mpx/mux executable
I hope my diff files can be applied in a future version of the file
utility. Furthermore many examples like OVERVIEW.$XE and quickgif.__d
are still misidentified by the sub routine diy-thermocam-checker
inside Magdir/measure as "(Lepton 3.x)". This sub routine gives too
many false hits.
With best wishes
Jörg Jenderek
-------------- next part --------------
--- file-5.40/magic/Magdir/msdos.old 2021-02-22 23:51:10 +0000
+++ file-5.40/magic/Magdir/msdos 2021-06-21 23:21:14 +0000
@@ -1615,5 +1615,6 @@
# DOS backup 2.0 to 3.2
-
+# URL: http://fileformats.archiveteam.org/wiki/BACKUP_(MS-DOS)
+# Reference: http://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/dos/restore/brtecdoc.htm
# backupid.@@@
@@ -1625,4 +1626,5 @@
>>>0x7 string \0\0\0\0\0\0\0\0
>>>>0x1 ubyte x DOS 2.0 backup id file, sequence %d
+#!:mime application/octet-stream
!:ext @@@
>>>>0x0 ubyte 0xff \b, last disk
@@ -1658,5 +1660,7 @@
# full file name with path but without drive letter and colon stored from 0x05 til 0x52
>>>>>>0x5 string x file %s
+#!:mime application/octet-stream
# backup name is original filename
+#!:ext doc/exe/rar/zip
#!:ext *
# magic/Magdir/msdos, 1169: Warning: EXTENSION type ` *' has bad char '*'
-------------- next part --------------
--- file-5.40/magic/Magdir/digital.old 2021-02-22 23:49:24 +0000
+++ file-5.40/magic/Magdir/digital 2021-06-22 19:08:17 +0000
@@ -54,4 +54,5 @@
# Locale data tables (MIPS and Alpha).
#
+# GRR: line below is too general as it matches also TTComp archive, ASCII, 2K handled by ./archive
0 short 0x0501 locale data table
>6 short 0x24 for MIPS
-------------- next part --------------
--- file-5.40/magic/Magdir/blit.old 2021-02-22 23:49:24 +0000
+++ file-5.40/magic/Magdir/blit 2021-06-24 19:06:54 +0000
@@ -15,5 +15,9 @@
0 long 0406 68k Blit mpx/mux executable
0 short 0406 VAX-order2 68k Blit mpx/mux executable
+# GRR: line below is too general as it matches also TTComp archive, ASCII, 4K handled by ./archive
0 short 03001 VAX-order 68k Blit mpx/mux executable
+# TODO:
+# skip TTComp archive, ASCII, 4K by looking for executable keyword like main
+#>0 search/5536 main\0 VAX-order 68k Blit mpx/mux executable
# Need more values for WE32 DMD executables.
# Note that 0520 is the same as COFF
-------------- next part --------------
--- file-5.40/magic/Magdir/ibm6000.old 2021-02-22 23:49:24 +0000
+++ file-5.40/magic/Magdir/ibm6000 2021-06-23 19:17:16 +0000
@@ -11,5 +11,7 @@
#>28 belong >0 not stripped
#>6 beshort >0 - version %ld
+# GRR: line below is too general as it matches also TTComp archive, ASCII, 1K handled by ./archive
0 beshort 0x0104 shared library
+# GRR: line below is too general as it matches also TTComp archive, ASCII, 2K handled by ./archive
0 beshort 0x0105 ctab data
0 beshort 0xfe04 structured file
-------------- next part --------------
--- file-5.40/magic/Magdir/archive.old 2021-02-22 23:49:24 +0000
+++ file-5.40/magic/Magdir/archive 2021-06-25 20:12:27 +0000
@@ -443,16 +443,81 @@
# URL: http://fileformats.archiveteam.org/wiki/TTComp_archive
# Update: Joerg Jenderek
# GRR: line below is too general as it matches also Panorama database "TCDB 2003-10 demo.pan", others
0 string \0\6
# look for first keyword of Panorama database *.pan
>12 search/261 DESIGN
# skip keyword with low entropy
->12 default x TTComp archive, binary, 4K dictionary
-# (version 5.25) labeled the above entry as "TTComp archive data"
+>12 default x
+# skip DOS 2.0 backup id file, sequence 6 with many nils like BACKUPID_xx6.@@@ handled by ./msdos
+>>8 quad !0
+>>>0 use ttcomp
+# variant ASCII, 4K dictionary (strength=48=50-2). With strength=49 wrong order! WHY?
+0 string \1\6
+# TODO:
+# skip VAX-order 68k Blit mpx/mux executable (strength=50) handled by ./blit
+!:strength -2
+>0 use ttcomp
+0 string \0\5
+# skip some DOS 2.0 backup id file, sequence 5 with many nils like BACKUPID_075.@@@ handled by ./msdos
+>8 quad !0
+>>0 use ttcomp
+0 string \1\5
+# TODO:
+# variant ASCII, 2K dictionary (strength=48=50-2). With strength=49 wrong order! WHY?
+# skip ctab data (strength=50) handled by ./ibm6000
+# skip locale data table (strength=50) handled by ./digital
+!:strength -2
+>0 use ttcomp
+0 string \0\4
+# skip many Maple help database *.hdb with version tag handled by ./maple
+>1028 string !version
+# skip veclib maple.hdb by looking for Maple keyword
+>>4 search/1091 Maple\040
+#>4 search/34090 Maple\040
+>>4 default x
+# skip DOS 2.0-3.2 backed up sequence 4 with many nils like LOTUS5.RAR handled by ./msdos
+# skip xBASE Compound Index file *.CDX with many nils
+>>>0x54 quad !0
+>>>>0 use ttcomp
+0 string \1\4
+# TODO:
+# skip Commodore PET BASIC 4.0 program *.prg
+# variant ASCII, 1K dictionary (strength=48=50-2). With strength=49 wrong order! WHY?
+# skip shared library (strength=50) handled by ./ibm6000
+!:strength -2
+>0 use ttcomp
+# display information of TTComp archive
+0 name ttcomp
+# (version 5.25) labeled the entry as "TTComp archive data"
+>0 ubyte x TTComp archive data
+!:mime application/x-compress-ttcomp
+# PBACKSCR.PI1
+!:ext $xe/$ts/pi1/__d
+# compression type: 0~binary compression 1~ASCII compression
+>0 ubyte 0 \b, binary
+>0 ubyte 1 \b, ASCII
+# size of the dictionary: 4~1024 bytes 5~2048 bytes 6~4096 bytes
+>1 ubyte 4 \b, 1K
+>1 ubyte 5 \b, 2K
+>1 ubyte 6 \b, 4K
+>1 ubyte x dictionary
+# https://mark0.net/forum/index.php?topic=848
+# last 3 bytes probably have only 8 possible bit sequences
+# xxxxxxxx 0000000x 11111111 ____FFh
+# xxxxxxxx 10000000 01111111 __807Fh
+# 0xxxxxxx 11000000 00111111 __C03Fh
+# 00xxxxxx 11100000 00011111 __E01Fh
+# 000xxxxx 11110000 00001111 __F00Fh
+# 0000xxxx 11111000 00000111 __F807h
+# 00000xxx 11111100 00000011 __FC03h
+# 000000xx 11111110 00000001 __FE01h
+# but for quickgif.__d 0A7DD4h
+#>-3 ubyte x \b, last 3 bytes 0x%2.2x
+#>-2 ubeshort x \b%4.4x
# From: Joerg Jenderek
# URL: https://wiki.68kmla.org/DiskCopy_4.2_format_specification
# reference: http://nulib.com/library/FTN.e00005.htm
0x52 ubeshort 0x0100
# test for disk image size equal or above 400k
>0x40 ubelong >409599
# test also for disk image size equal or below 1440k to skip
-------------- next part --------------
A non-text attachment was scrubbed...
Name: TTComp-trid-v.txt.gz
Type: application/x-gzip
Size: 1258 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20210625/aadc8443/attachment-0001.bin>