[File] [PATCH] of Magdir/spectrum: Windows cache *.db misidentified as Spectrum .TAP

Jörg Jenderek joerg.jen.der.ek at gmx.net
Thu May 4 20:58:08 UTC 2023


Hello,
some days ago I handled some database files. The suffix db is often
used for such file names. Some samples are misidentified as
"Spectrum .TAP data".

When running the file command version 5.44 on real Spectrum tape
examples and on the misidentified db samples, I get output like:

1943 (-).TAP:
	Spectrum .TAP data "  1943    " - BASIC program
Cauldron II (S).cdt:
	Spectrum .TZX data version 1.10
Count Duckula (E).cdt:
	Spectrum .TZX data version 1.10
EXAMPLES.TAP:
	Spectrum .TAP data "screen    " - memory block (screen)
TFCOPY2.TAP:
	Spectrum .TAP data "TF COPY II" - BASIC program
Tape-FileCopy(MartinMoracek)(SuperII).tap:
	Spectrum .TAP data "\023\001TF" - BASIC program
Treachery.tzx:
	Spectrum .TZX data version 1.13
fmt-801-signature-id-1166.tap:
	data
{85CEE8D6-0F90-4492-B484-98E38862B28D}.2.ver0x0000000000000004.db:
	Spectrum .TAP data ")\335\242\" - BASIC program
{DDF571F2-BE98-426D-8288-1A9A39C3FDA2}.2.ver0x0000000000000004.db:
	Spectrum .TAP data ")\335\242\" - BASIC program

With the --extension option only ??? is displayed. Furthermore, with
the -i option only the generic application/octet-stream is shown for
these samples.

For comparison I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/). The TAP samples
are described as "TAP (ZX Spectrum)" without a MIME type, via PUID
fmt/801. The other tape samples are described as "TZX Format" without
a MIME type, via PUID fmt/1000. The TZX suffix is considered valid,
whereas the CDT suffix is considered bad (EXTENSION_MISMATCH true).
The DB samples are described as "Thumbs DB file" via PUID fmt/682,
based on the extension only (see the appended droid-tape.csv.gz).

For comparison I also ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). Here the TAP samples
are described as "ZX Spectrum Tape image" by tap-zx.trid.xml; the
DROID sample fmt-801-signature-id-1166.tap is described the same way.
The other tape samples are described as "ZX Spectrum Tape image" with
MIME type application/x-spectrum-tzx by tzx.trid.xml. The DB samples
are described here as "Unknown!" (see the appended
output/trid-v-tape.txt.gz).

TrID lists the file name extension used and, with the -v option,
often a URL pointing to information about the file format. With the
help of other tools I found a page about the newer TZX tape format on
the Archive Team file formats wiki. This is now expressed inside
Magdir/spectrum by additional comment lines like:
# URL:		http://fileformats.archiveteam.org/wiki/TZX
# Reference:	https://worldofspectrum.net/TZXformat.html
#		http://mark0.net/download/triddefs_xml.7z
#		defs/t/tzx.trid.xml

The description is done inside Magdir/spectrum by lines like:
 0      string          ZXTape!\x1a     Spectrum .TZX data
 >8     byte            x               version %d
 >9     byte            x               \b.%d
Instead of the generic application/octet-stream MIME type, I show the
type used by TrID. The standard file name suffix is TZX; the CDT
suffix is used for Amstrad tapes, which have the same format. This
information is expressed by adapted lines, which now look like:
 0      string          ZXTape!\x1a     Spectrum .TZX data
 !:mime	application/x-spectrum-tzx
 !:ext	tzx/cdt
 >8     byte            x               version %d
 >9     byte            x               \b.%d
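For readers less familiar with magic(5) syntax, the three TZX lines
can be paraphrased in Python. This is only a sketch, assuming the
header layout from the TZX reference above (8-byte signature, major
version byte at offset 8, minor version byte at offset 9); the
function name describe_tzx_header is hypothetical and not part of file.

```python
from typing import Optional

TZX_SIGNATURE = b"ZXTape!\x1a"  # "0 string ZXTape!\x1a": 8 bytes at offset 0


def describe_tzx_header(data: bytes) -> Optional[str]:
    """Sketch of the first three TZX magic lines (signature, major, minor)."""
    if len(data) < 10 or not data.startswith(TZX_SIGNATURE):
        return None
    major, minor = data[8], data[9]  # ">8 byte" and ">9 byte"
    # "\b.%d" glues the minor number to the major one without a space
    return f"Spectrum .TZX data version {major}.{minor}"
```

For example, bytes 1 and 13 at offsets 8 and 9 give
"Spectrum .TZX data version 1.13", as for Treachery.tzx above.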

When you are inspecting hundreds of such tape samples, you are happy
to get additional information to distinguish them. Right after the
10-byte header the ID of the first block is stored. So I show this
value and, for a few IDs, also a human-readable name (like pause,
text etc.) plus some additional information. For the pause ID (0x20)
this is the duration in milliseconds; for the text ID (0x30) it is a
Pascal string. Following the documentation, this additional
information is shown by additional lines like:
 >10	ubyte		x		\b; ID %#x
 >10	ubyte		=0x20		(pause)
 >>11	uleshort	x		%u ms
 >10	ubyte		=0x30		(text)
 >>11	pstring		x		"%s"
This information can be verified with fuse-emulator-utils via a
command line like:
	tzxlist EXAMPLES.TAP
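A rough Python paraphrase of the first-block lines, again only a
sketch under the layout described above (block ID at offset 10, then
either a little-endian 16-bit pause at offset 11 or a length-prefixed
text at offset 11). The helper describe_first_block is hypothetical;
unlike file, it does not escape non-printable characters.

```python
import struct
from typing import Optional


def describe_first_block(data: bytes) -> Optional[str]:
    """Paraphrase of the ">10" lines for the pause (0x20) and text (0x30) IDs."""
    if len(data) < 11:
        return None
    block_id = data[10]  # ">10 ubyte x \b; ID %#x"
    text = f"; ID {block_id:#x}"
    if block_id == 0x20 and len(data) >= 13:
        (pause_ms,) = struct.unpack_from("<H", data, 11)  # ">>11 uleshort"
        text += f" (pause) {pause_ms} ms"
    elif block_id == 0x30 and len(data) >= 12:
        length = data[11]  # ">>11 pstring": 1-byte length prefix
        desc = data[12:12 + length].decode("latin-1")
        text += f' (text) "{desc}"'
    return text
```

Other IDs (for example 0x11) only get the "; ID ..." part in this
sketch.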

With the help of other tools I found a page about the older TAP (ZX
Spectrum) tape format on the Archive Team file formats wiki. This is
now expressed inside Magdir/spectrum by additional comment lines like:
# URL:		http://fileformats.archiveteam.org/wiki/TAP_(ZX_Spectrum)
# Reference:	http://web.archive.org/web/20110711141601/http://www.zxmodules.de/fileformats/tapformat.html
#		http://mark0.net/download/triddefs_xml.7z
#		defs/t/tap-zx.trid.xml

The description starts inside Magdir/spectrum with lines like:
 0       string          \023\000\000
 >4      string          >\0
 >>4     string          <\177          Spectrum .TAP data "%-10.10s"
The first test looks for the starting 3-byte "magic", like the other
tools do. The other test lines sanity-check that the name is
printable. This must be done carefully, because names are not always
"nice" ASCII like "TF COPY II", "screen    ", "  1943    " or
"\023\001TF" in the Martin Moracek example. So I could not use
stricter checks here, but the third test line skips the DROID sample
fmt-801-signature-id-1166.tap, whose name
\253\253\253\253\253\253\253\253\253\253 is invalid. These tests are
used only by the file command, so I looked at what the other tools
use: they check the value of the byte at offset 23. I do not
understand why this works or whether it is always true, but both
other tools use this method and in the end I found no other way. So I
apply this method too, and thus the Windows cache db samples (found
inside c:\ProgramData\Microsoft\Windows\Caches) are skipped. Analogous
to the newer TZX format, I chose a similar user-defined MIME type. So
the start now looks like:
 0       string          \023\000\000
 >4      string          >\0
 >>23	ubyte		=0xFF
 >>>4     string          <\177     Spectrum .TAP data "%-10.10s"
 !:mime	application/x-spectrum-tap
 !:ext	tap
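Why the test at offset 23 works is, as far as I can tell from the TAP
layout in the references above, a consequence of the block structure:
every TAP block is a 2-byte little-endian length followed by that many
bytes. The 19-byte standard header block occupies offsets 2..20 (flag
0x00 at 2, type at 3, name at 4..13, data length at 14..15, two
parameters at 16..19, checksum at 20), so the next block's length sits
at 21..22 and its flag byte at 23. The ROM saves the data block that
follows a header with flag 0xFF. The Python sketch below mirrors the
patched tests under that assumption; make_tap_header_block and
tap_magic_matches are hypothetical helpers, and, as far as I
understand file's string comparison, only the first name byte is
compared by the ">\0" and "<\177" tests.

```python
import struct


def make_tap_header_block(type_byte: int, name: bytes, length: int,
                          param1: int, param2: int) -> bytes:
    """Hypothetical helper: build a standard header block with its TAP length prefix."""
    # flag 0x00 (header), type, 10-char name padded with spaces, 3 little-endian words
    body = (bytes([0x00, type_byte]) + name.ljust(10)[:10]
            + struct.pack("<HHH", length, param1, param2))
    checksum = 0
    for b in body:
        checksum ^= b  # XOR of flag byte and all header bytes
    return struct.pack("<H", len(body) + 1) + body + bytes([checksum])


def tap_magic_matches(data: bytes) -> bool:
    """Mirror of the patched TAP tests (sketch, not file's own code)."""
    # "0 string \023\000\000": block length 19 (little-endian) and flag 0x00
    if len(data) < 24 or data[:3] != b"\x13\x00\x00":
        return False
    # ">4 string >\0" and ">>>4 string <\177": first name byte in 1..0x7E
    if not 0 < data[4] < 0x7F:
        return False
    # ">>23 ubyte =0xFF": offsets 21-22 are the next block's length, 23 its flag
    return data[23] == 0xFF
```

For example, a "screen" header followed by a 0xFF data block passes,
while the same header followed by a block with flag 0x00, or a header
whose name starts with \253 as in the DROID sample, does not.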

There already were lines doing a sub-classification depending on the
data type byte. They looked like:
 >>>3    byte            0               - BASIC program
 >>>3    byte            1               - number array
 >>>3    byte            2               - character array
 >>>3    byte            3               - memory block
 >>>>14  belong          0x001B0040      (screen)
For a memory block with a SCREEN$ header, the length of the following
data is 1B00h=6912 and the start address is 4000h=16384; that was
shown by the last line. At the end I also show the length of the data
following the header and, in a commented-out line, the checksum byte
(simply all bytes, including the flag byte, XORed). This is done by
additional lines like:
 >>>>14		uleshort	x	\b, data length %u
 #>>>>20	ubyte		x	\b, checksum %#x
With this information I tried to inspect the next block, but I got
values that did not look reasonable to me. So I could not use these
facts as an additional test, and in the end I used the method of the
other tools. The misidentified db samples were described as a BASIC
program, so I added lines to show more information for this case: the
autostart line (according to the documentation, values 0 to 9999 are
valid and the value 32768 means "no auto-loading") and the length of
the BASIC program. So that branch now looks like:
 >>>>3	byte		0	- BASIC program
 >>>>>16	uleshort	x	\b, autostart line %u
 >>>>>18	uleshort	x	\b, program length %u
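The header fields used by these lines (type byte at offset 3, data
length at 14, parameter 1 at 16, parameter 2 at 18, checksum at 20)
can be decoded in a few lines of Python. This is a sketch assuming the
layout from the TAP references above, not file's implementation;
describe_tap_header, header_checksum_ok and autostart_plausible are
hypothetical names, and autostart_plausible is only an idea for a
stricter test based on the 0..9999/32768 rule, not something the patch
does.

```python
import struct
from functools import reduce
from operator import xor
from typing import Optional

TYPES = {0: "BASIC program", 1: "number array",
         2: "character array", 3: "memory block"}


def describe_tap_header(block: bytes) -> Optional[str]:
    """Type-specific text plus data length; block starts at file offset 0."""
    kind = TYPES.get(block[3])  # ">>>>3 byte 0..3"
    if kind is None:
        return None
    data_length, param1, param2 = struct.unpack_from("<HHH", block, 14)
    text = "- " + kind
    if block[3] == 0:
        # BASIC: param1 = autostart line, param2 = program length
        text += f", autostart line {param1}, program length {param2}"
    elif block[3] == 3:
        # "belong 0x001B0040": length 1B00h=6912, start address 4000h=16384
        if struct.unpack_from(">I", block, 14)[0] == 0x001B0040:
            text += " (screen)"
        if param2 != 32768:  # ">>>>>18 uleshort !32768"
            text += f", unused {param2}"
    return text + f", data length {data_length}"  # ">>>>14 uleshort"


def header_checksum_ok(block: bytes) -> bool:
    """Flag byte (offset 2) XOR header bytes 3..19 must equal the byte at 20."""
    return reduce(xor, block[2:20], 0) == block[20]


def autostart_plausible(line: int) -> bool:
    """Hypothetical stricter test, NOT in the patch: 0..9999 or 32768."""
    return 0 <= line <= 9999 or line == 32768
```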

After applying the above-mentioned modifications with the patch
file-5.44-spectrum-tape.diff, the misidentification vanishes and I get
a more detailed output like:
1943 (-).TAP:
	Spectrum .TAP data "  1943    " - BASIC program
	, autostart line 1, program length 335
	, data length 335
Cauldron II (S).cdt:
	Spectrum .TZX data version 1.10
	; ID 0x32 (archive info)
	, 0x9e bytes with 7 (type) text parts
	(0) CAULDRON II (1) PALACE SOFTWARE / ERBE
	(3) 1986 (4) SPANISH! (5) GAME
	(8) ORIGINAL TAPE SPANISH VERSION
	(-1) D.L. M-21936-1986. TZXed by johnny farragut
	(deepfb2002 at yahoo.es)
Count Duckula (E).cdt:
	Spectrum .TZX data version 1.10
	; ID 0x11 (turbo)
	, 4096 pilot pulses with 2337 tstates
	, 1575 and 1103 sync tstates
	, 1195 zero tstates, 2388 one tstates
	, use 1 bit, 15 ms pause, 264 data bytes
EXAMPLES.TAP:
	Spectrum .TAP data "screen    " - memory block (screen)
	, data length 6912
TFCOPY2.TAP:
	Spectrum .TAP data "TF COPY II" - BASIC program
	, autostart line 10, program length 1505
	, data length 1505
Tape-FileCopy(MartinMoracek)(SuperII).tap:
	Spectrum .TAP data "\023\001TF" - BASIC program
	, autostart line 1, program length 4319
	, data length 4401
Treachery.tzx:
	Spectrum .TZX data version 1.13
	; ID 0x30 (text)
	"Created by Spectaculator"
fmt-801-signature-id-1166.tap:
	data
{85CEE8D6-0F90-4492-B484-98E38862B28D}.2.ver0x0000000000000004.db:
	data
{DDF571F2-BE98-426D-8288-1A9A39C3FDA2}.2.ver0x0000000000000004.db:
	data
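The archive-info output for "Cauldron II (S).cdt" comes from the 0x32
branch of the diff, which unrolls a loop over at most seven text
parts (each part: a type byte, then a length-prefixed string). As a
sketch of the same walk, assuming the block layout given in the diff
comments, a Python loop might look like the following;
archive_info_parts is a hypothetical name, and it stops at the first
type outside -1..8 rather than reproducing file's exact partial output.

```python
import struct
from typing import List, Tuple

MAX_PARTS = 7  # the diff spells out at most 7 nested text parts


def archive_info_parts(data: bytes) -> Tuple[int, int, List[Tuple[int, str]]]:
    """Walk the text parts of an archive-info (0x32) first block.

    data is the whole TZX file; returns (block length, declared part count,
    list of (type, text)) with at most MAX_PARTS entries.
    """
    if len(data) < 14 or data[10] != 0x32:
        raise ValueError("first block is not an archive-info block")
    (block_len,) = struct.unpack_from("<H", data, 11)  # ">>11 uleshort"
    count = data[13]                                   # ">>13 ubyte"
    parts: List[Tuple[int, str]] = []
    pos = 14
    for _ in range(min(count, MAX_PARTS)):
        if pos + 2 > len(data):
            break
        (text_type,) = struct.unpack_from("b", data, pos)  # signed: 0xFF -> -1
        if not -2 < text_type < 9:  # "byte <9" and "byte >-2"
            break
        length = data[pos + 1]      # pstring length byte
        text = data[pos + 2:pos + 2 + length].decode("latin-1")
        parts.append((text_type, text))
        pos += 2 + length           # like "&0": continue after the string
    return block_len, count, parts
```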

I hope my diff file can be applied in a future version of the file
utility.

One thing remains to be done: classifying the mysterious Windows
cache db samples.

With best wishes
Jörg Jenderek
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-tape.csv.gz
Type: application/x-gzip
Size: 808 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230504/b50dd535/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-tape.txt.gz
Type: application/x-gzip
Size: 628 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230504/b50dd535/attachment-0001.bin>
-------------- next part --------------
--- file-5.44/magic/Magdir/spectrum.old	2021-05-12 19:30:24.000000000 +0200
+++ file-5.44/magic/Magdir/spectrum	2023-05-04 22:32:39.797949800 +0200
@@ -24,11 +24,38 @@
 #  -Adam Buchbinder <adam.buchbinder at gmail.com>
+# Update:	Joerg Jenderek 2023 May
+# URL:		http://fileformats.archiveteam.org/wiki/TAP_(ZX_Spectrum)
+# Reference:	http://web.archive.org/web/20110711141601/http://www.zxmodules.de/fileformats/tapformat.html
+#		http://mark0.net/download/triddefs_xml.7z/defs/t/tap-zx.trid.xml
+# Note:		called "ZX Spectrum Tape image" by TrID and "TAP (ZX Spectrum)" by DROID via PUID fmt/801
+#		verified by fuse-emulator-utils `tzxlist EXAMPLES.TAP`
 #
+# headers length 19=023 and flag byte 0 indicating a standard ROM loading header 
 0       string          \023\000\000
 >4      string          >\0
->>4     string          <\177           Spectrum .TAP data "%-10.10s"
->>>3    byte            0               - BASIC program
->>>3    byte            1               - number array
->>>3    byte            2               - character array
->>>3    byte            3               - memory block
->>>>14  belong          0x001B0040      (screen)
+# skip {85CEE8D6-0F90-4492-B484-98E38862B28D}.2.ver0x0000000000000004.db {DDF571F2-BE98-426D-8288-1A9A39C3FDA2}.2.ver0x0000000000000001.db
+# inside c:\ProgramData\Microsoft\Windows\Caches according to TrID and DROID
+>>23	ubyte		=0xFF
+# skip DROID fmt-801-signature-id-1166.tap with invalid name \253\253\253\253\253\253\253\253\253\253
+# which looks like: "TF COPY II" "screen    " "\023\001TF" "  1943    "
+>>>4     string          <\177           Spectrum .TAP data "%-10.10s"
+#!:mime	application/octet-stream
+!:mime	application/x-spectrum-tap
+!:ext	tap
+>>>>3	byte		0	- BASIC program
+# autostart line; 0..9999 are valid; 32768 means "no auto-loading"
+>>>>>16	uleshort	x	\b, autostart line %u
+# program length; length of BASIC program
+>>>>>18	uleshort	x	\b, program length %u
+>>>>3	byte		1	- number array
+>>>>3	byte		2	- character array
+>>>>3	byte		3	- memory block
+# length of the following data 1B00h=6912 and start address 4000h=16384 in case of a SCREEN$ header
+>>>>>14 belong          0x001B0040      (screen)
+# unused 32768=8000h 
+>>>>>18	uleshort	!32768	\b, unused %u
+# zxlength; length of the following data after the header
+>>>>14	uleshort	x	\b, data length %u
+#>>14	uleshort	x	\b, data length %#x
+# checksum byte; simply all bytes (including flag byte) XORed 
+#>>>>20	ubyte		x	\b, checksum %#x
 
@@ -36,5 +63,82 @@
 # TZX tape images
+# Update:	Joerg Jenderek 2023 May
+# URL:		http://fileformats.archiveteam.org/wiki/TZX
+# Reference:	https://worldofspectrum.net/TZXformat.html
+#		http://mark0.net/download/triddefs_xml.7z/defs/t/tzx.trid.xml
+# Note:		called "ZX Spectrum Tape image" by TrID and "TZX Format" by DROID via PUID fmt/1000
 0      string          ZXTape!\x1a     Spectrum .TZX data
+#!:mime	application/octet-stream
+!:mime	application/x-spectrum-tzx
+# CDT is used for Amstrad tapes
+!:ext	tzx/cdt
 >8     byte            x               version %d
 >9     byte            x               \b.%d
+# ID of first block
+>10	ubyte		x		\b; ID %#x
+# turbo speed data block
+>10	ubyte		=0x11		(turbo)
+# length of PILOT tone (number of pulses) 
+>>21	uleshort	x		\b, %u pilot pulses
+# length of PILOT pulse
+>>11	uleshort	x		with %u tstates
+# length of SYNC first pulse
+>>13	uleshort	x		\b, %u and
+# length of SYNC second pulse
+>>15	uleshort	x		%u sync tstates
+# length of ZERO bit pulse
+>>17	uleshort	x		\b, %u zero tstates
+# length of ONE bit pulse
+>>19	uleshort	x		\b, %u one tstates
+# used bits in the last byte
+>>23	ubyte		x		\b, use %u bit
+# plural s
+>>23	ubyte		>1		\bs
+# pause after this block in milliseconds
+>>24	uleshort	x		\b, %u ms pause
+# BYTE[3]; length of data that follow
+>>26	ulelong&0x00FFffFF x		\b, %u data bytes 
+>10	ubyte		=0x20		(pause)
+# pause duration in milliseconds
+>>11	uleshort	x		%u ms
+# text description
+>10	ubyte		=0x30		(text)
+# length of the text description
+#>>11	ubyte		x		L=%u
+>>11	pstring		x		"%s"
+# archive text description in ASCII format
+>10	ubyte		=0x32		(archive info)
+# length of archive text
+>>11	uleshort	x		\b, %#x bytes
+# number of text strings
+>>13	ubyte		x		with %u (type) text parts
+# text type identification byte: 0~title 1~publisher 2~author 3~year 4~language 5~type 6~price 7~protection 8~origin ff~comment
+>>14	byte		<9		(%d)
+>>>14	byte		>-2
+# length of text string
+#>>>>15	ubyte		x		L=%u
+>>>>15	pstring		x		%s
+# 2nd possible text description
+>>>>>&0	byte		<9		(%d)
+>>>>>>&-1	byte	>-2
+>>>>>>>&0	pstring	x		%s
+# 3rd possible text description
+>>>>>>>>&0	byte	<9		(%d)
+>>>>>>>>>&-1	byte	>-2
+>>>>>>>>>>&0	pstring	x		%s
+# 4th possible text description
+>>>>>>>>>>>&0	byte	<9		(%d)
+>>>>>>>>>>>>&-1	byte	>-2
+>>>>>>>>>>>>>&0	pstring	x		%s
+# 5th possible text description
+>>>>>>>>>>>>>>&0	byte	<9	(%d)
+>>>>>>>>>>>>>>>&-1	byte	>-2
+>>>>>>>>>>>>>>>>&0	pstring	x	%s
+# 6th possible text description
+>>>>>>>>>>>>>>>>>&0	byte	<9	(%d)
+>>>>>>>>>>>>>>>>>>&-1	byte	>-2
+>>>>>>>>>>>>>>>>>>>&0	pstring	x	%s
+# 7th possible text description
+>>>>>>>>>>>>>>>>>>>>&0	byte	<9	(%d)
+>>>>>>>>>>>>>>>>>>>>>&-1 byte	>-2
+>>>>>>>>>>>>>>>>>>>>>>&0 pstring x	%s
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.44-spectrum-tape.diff.sig
Type: application/octet-stream
Size: 2133 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230504/b50dd535/attachment.obj>

