[File] [PATCH] of Magdir/spectrum Windows cache *.db misidentified as Spectrum .TAP

Christos Zoulas christos at zoulas.com
Mon May 8 01:33:49 UTC 2023


Committed, thanks!

christos

> On May 4, 2023, at 4:58 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hello,
> Some days ago I handled some database files. The suffix db is often
> used for such file names. Some samples are misidentified as
> "Spectrum .TAP data".
> 
> When running the file command version 5.44 on real Spectrum tape
> examples and the misidentified db samples, I get output like:
> 
> 1943 (-).TAP:
> 	Spectrum .TAP data "  1943    " - BASIC program
> Cauldron II (S).cdt:
> 	Spectrum .TZX data version 1.10
> Count Duckula (E).cdt:
> 	Spectrum .TZX data version 1.10
> EXAMPLES.TAP:
> 	Spectrum .TAP data "screen    " - memory block (screen)
> TFCOPY2.TAP:
> 	Spectrum .TAP data "TF COPY II" - BASIC program
> Tape-FileCopy(MartinMoracek)(SuperII).tap:
> 	Spectrum .TAP data "\023\001TF" - BASIC program
> Treachery.tzx:
> 	Spectrum .TZX data version 1.13
> fmt-801-signature-id-1166.tap:
> 	data
> {85CEE8D6-0F90-4492-B484-98E38862B28D}.2.ver0x0000000000000004.db:
> 	Spectrum .TAP data ")\335\242\" - BASIC program
> {DDF571F2-BE98-426D-8288-1A9A39C3FDA2}.2.ver0x0000000000000004.db:
> 	Spectrum .TAP data ")\335\242\" - BASIC program
> 
> With the --extension option only ??? is displayed. Furthermore, with
> the -i option only the generic application/octet-stream is shown for
> these samples.
> 
> For comparison I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). The TAP samples
> are described as "TAP (ZX Spectrum)" without a MIME type by PUID
> fmt/801. The other tape samples are described as "TZX Format"
> without a MIME type by PUID fmt/1000. The TZX suffix is considered
> valid, whereas the CDT suffix is considered bad (EXTENSION_MISMATCH
> true). The DB samples are described as "Thumbs DB file" by PUID
> fmt/682 via extension (see the appended droid-tape.csv.gz).
> 
> For comparison I also ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). Here the TAP samples
> are described as "ZX Spectrum Tape image" by tap-zx.trid.xml. The
> DROID sample fmt-801-signature-id-1166.tap is also described that
> way. The other tape samples are described as "ZX Spectrum Tape
> image" with MIME type application/x-spectrum-tzx by tzx.trid.xml.
> The DB samples are described here as "Unknown!" (see the appended
> output/trid-v-tape.txt.gz).
> 
> TrID lists the file name extension used and, with the -v option,
> often the URL of related file format information. With the help of
> other tools I found a page about the newer TZX tape format on the
> Archive Team file formats web site. This is now expressed inside
> Magdir/spectrum by additional comment lines like:
> # URL:		http://fileformats.archiveteam.org/wiki/TZX
> # Reference:	https://worldofspectrum.net/TZXformat.html
> #		http://mark0.net/download/triddefs_xml.7z
> #		defs/t/tzx.trid.xml
> 
> The description is done inside Magdir/spectrum by lines like:
> 0      string          ZXTape!\x1a     Spectrum .TZX data
>> 8     byte            x               version %d
>> 9     byte            x               \b.%d
> Instead of the generic application/octet-stream MIME type I show
> the type used by TrID. The standard file name suffix is TZX. The CDT
> suffix is used for Amstrad tapes, which have the same format. This
> information is shown by the adapted lines, which now look like:
> 0      string          ZXTape!\x1a     Spectrum .TZX data
> !:mime	application/x-spectrum-tzx
> !:ext	tzx/cdt
>> 8     byte            x               version %d
>> 9     byte            x               \b.%d
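A minimal Python sketch of what the TZX entry above tests, assuming only the offsets given in the magic lines (8-byte signature at 0, major and minor version bytes at 8 and 9). The names describe_tzx_header, TZX_MIME and TZX_EXTENSIONS are hypothetical; this is not code from file itself. Note that magic's byte type is signed while Python byte indexing is unsigned, which makes no difference for real version numbers.

```python
from typing import Optional

# Values taken from the patched Magdir/spectrum entry quoted above.
TZX_SIGNATURE = b"ZXTape!\x1a"
TZX_MIME = "application/x-spectrum-tzx"
TZX_EXTENSIONS = ("tzx", "cdt")


def describe_tzx_header(data: bytes) -> Optional[str]:
    """Rough Python equivalent of the TZX magic test.

    Offset 0: signature "ZXTape!" followed by 0x1A.
    Offset 8: major version byte, offset 9: minor version byte.
    Returns None when the signature does not match.
    """
    if len(data) < 10 or not data.startswith(TZX_SIGNATURE):
        return None
    major, minor = data[8], data[9]
    return f"Spectrum .TZX data version {major}.{minor}"
```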
> 
> When you are inspecting hundreds of such tape samples, you are happy
> to get additional information that distinguishes them. After the
> header, the ID of the first block is stored. So I show this value
> and, for a few cases, also a human-readable form (like pause, text,
> etc.). For some IDs additional information is shown: for the pause
> ID (0x20) this is the duration in milliseconds, for the text ID
> (0x30) this is a Pascal string. Following the documentation, this
> additional information is shown by lines like:
>> 10	ubyte		x		\b; ID %#x
>> 10	ubyte		=0x20		(pause)
>>> 11	uleshort	x		%u ms
>> 10	ubyte		=0x30		(text)
>>> 11	pstring		x		"%s"
> That information can be verified with fuse-emulator-utils by a
> command line like:
> 	tzxlist EXAMPLES.TAP
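The first-block lines can be sketched in Python as follows, assuming the data already passed the signature test so that the block ID sits at offset 10. Only the pause (0x20) and text (0x30) cases from the magic lines are handled; describe_first_block is a hypothetical name, and the output format merely imitates the magic descriptions.

```python
import struct


def describe_first_block(data: bytes) -> str:
    """Rough Python equivalent of the first-block lines of the patch.

    Assumes the 10-byte TZX header is present, so the ID of the first
    block is at offset 10 and its payload starts at offset 11.
    """
    block_id = data[10]
    out = f"; ID {block_id:#x}"          # like magic's "\b; ID %#x"
    if block_id == 0x20:                 # pause block
        # uleshort: little-endian unsigned 16-bit duration in milliseconds
        (duration_ms,) = struct.unpack_from("<H", data, 11)
        out += f" (pause) {duration_ms} ms"
    elif block_id == 0x30:               # text description block
        # pstring: one length byte followed by that many characters
        length = data[11]
        text = data[12:12 + length].decode("latin-1")
        out += f' (text) "{text}"'
    return out
```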
> 
> With the help of other tools I found a page about the older TAP (ZX
> Spectrum) tape format on the Archive Team file formats web site.
> This is now expressed inside Magdir/spectrum by additional comment
> lines like:
> # URL:		http://fileformats.archiveteam.org/wiki/
> #		TAP_(ZX_Spectrum)
> # Reference:	http://web.archive.org/web/20110711141601/
> #		http://www.zxmodules.de/fileformats/tapformat.html
> #		http://mark0.net/download/triddefs_xml.7z
> #		defs/t/tap-zx.trid.xml
> 
> The description starts inside Magdir/spectrum with lines like:
> 0       string          \023\000\000
>> 4      string          >\0
>>> 4     string          <\177          Spectrum .TAP data "%-10.10s"
> The first test looks for the starting 3-byte "magic", like the other
> tools do. The other test lines sanity-check that the name string is
> printable. This must be done carefully, because names are not always
> "nice" ASCII like "TF COPY II", "screen    ", "  1943    " or
> "\023\001TF" in the Martin Moracek example. So I could not use
> stricter checks here, but the third test line skips the DROID sample
> fmt-801-signature-id-1166.tap with its invalid name
> \253\253\253\253\253\253\253\253\253\253. These tests are only used
> by the file command tool, so I looked at what the other tools use.
> They check the value of the byte at offset 23. I do not understand
> why this works and whether it is always true, but both other tools
> use this method, and in the end I found no other way. So I also
> apply this method, and the Windows Caches db samples (found inside
> c:\ProgramData\Microsoft\Windows\Caches) are skipped. Analogous to
> the newer TZX format, I chose a similar user-defined MIME type. So
> the start now looks like:
> 0       string          \023\000\000
>> 4      string          >\0
>>> 23	ubyte		=0xFF
>>>> 4     string          <\177     Spectrum .TAP data "%-10.10s"
> !:mime	application/x-spectrum-tap
> !:ext	tap
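The offset-23 check has a plausible explanation in the usual TAP block layout: each block is preceded by a 2-byte little-endian length, so \023\000 is the length 19 of a standard header block and the third \000 is its flag byte (0x00 for a header). The header block then occupies offsets 2 to 20, the next block's length sits at offsets 21-22, and offset 23 is the flag byte of that next block, which is 0xFF for the data block that normally follows a header. A minimal Python sketch of the patched test under that reading follows. is_spectrum_tap and make_tap_start are hypothetical names, and the two magic string tests are approximated as a range check on the first name byte, which may not match file's string comparison exactly.

```python
import struct

# Standard TAP header block, offsets from the start of the file:
#   0-1   block length, little-endian (19 for a header block)
#   2     flag byte, 0x00 for a header block
#   3     data type byte
#   4-13  10-byte file name
#   14-15 data length, 16-17 parameter 1, 18-19 parameter 2
#   20    checksum of the header block
#   21-22 length of the next block
#   23    flag byte of the next block, 0xFF for a data block


def is_spectrum_tap(data: bytes) -> bool:
    """Approximation of the patched TAP magic test (not file's own code)."""
    if len(data) < 24 or data[0:3] != b"\x13\x00\x00":
        return False
    # Approximates 'string >\0' and 'string <\177' at offset 4:
    # first name byte must be non-NUL and below 0x7F.
    if not 0 < data[4] < 0x7F:
        return False
    # The new test: the following block must be a data block.
    return data[23] == 0xFF


def make_tap_start(name: bytes, data_type: int, next_flag: int) -> bytes:
    """Hypothetical helper building the first 24 bytes of a TAP file."""
    body = bytes([0x00, data_type]) + name.ljust(10)[:10]
    body += struct.pack("<HHH", 0, 0, 0)       # length, param 1, param 2
    checksum = 0
    for b in body:                             # XOR of flag and header bytes
        checksum ^= b
    header_block = struct.pack("<H", 19) + body + bytes([checksum])
    return header_block + struct.pack("<H", 2) + bytes([next_flag])
```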
> 
> There already exist lines that do a sub-classification depending on
> the data type byte. They looked like:
>>>> 3    byte            0               - BASIC program
>>>> 3    byte            1               - number array
>>>> 3    byte            2               - character array
>>>> 3    byte            3               - memory block
>>>>> 14  belong          0x001B0040      (screen)
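A Python sketch of this sub-classification, assuming the file offsets used in the magic lines (type byte at 3, header fields from offset 14 on). classify_tap_header and TAP_TYPES are hypothetical names; the comment shows why the big-endian long 0x001B0040 read by magic corresponds to two little-endian header words.

```python
import struct

# Data type byte at offset 3, as in the magic lines above.
TAP_TYPES = {0: "BASIC program", 1: "number array",
             2: "character array", 3: "memory block"}


def classify_tap_header(data: bytes) -> str:
    """Return the sub-classification text for a TAP header (sketch only)."""
    kind = TAP_TYPES.get(data[3])
    if kind is None:
        return ""
    out = f"- {kind}"
    if data[3] == 3:
        # magic reads offset 14 as the big-endian long 0x001B0040; the same
        # four bytes are the little-endian words 0x1B00 (data length) and
        # 0x4000 (start address) of a SCREEN$ header.
        data_length, start_address = struct.unpack_from("<HH", data, 14)
        if (data_length, start_address) == (0x1B00, 0x4000):
            out += " (screen)"
    return out
```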
> For a memory block with a SCREEN$ header, the length of the
> following data is 1B00h=6912 and the start address is 4000h=16384.
> This is what the last line shows. Finally I also show the length of
> the data following the header, and the checksum byte (simply all
> bytes, including the flag byte, XORed). This is done by additional
> lines like:
>>>>> 14		uleshort	x	\b, data length %u
> #>>>>20	ubyte		x	\b, checksum %#x
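The data length and checksum can be sketched in Python like this, assuming the same offsets (data length as a little-endian word at 14) and the XOR rule described above. Because the checksum byte is the XOR of all preceding bytes of a block, XORing the whole block including the checksum byte gives 0 for an intact block; block_checksum_ok relies on that property. All function names are hypothetical.

```python
import struct
from functools import reduce
from operator import xor


def data_length(tap: bytes) -> int:
    """Length of the following data block (uleshort at offset 14)."""
    return struct.unpack_from("<H", tap, 14)[0]


def tap_checksum(flag_and_payload: bytes) -> int:
    """XOR of all bytes of a block, flag byte included, checksum excluded."""
    return reduce(xor, flag_and_payload, 0)


def block_checksum_ok(block: bytes) -> bool:
    """block: flag byte + payload + checksum byte, without the length word."""
    # XOR including the stored checksum is 0 when the checksum matches.
    return tap_checksum(block) == 0
```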
> With this information I tried to inspect the next block, but I got
> values that did not look reasonable to me. So I could not use these
> facts as an additional test, and in the end I used the method of the
> other tools. The misidentified db samples are described as BASIC
> programs, so I added lines to show more information for this case.
> First I show the autostart line. According to the documentation,
> values 0 to 9999 are valid and the value 32768 means "no
> auto-loading". I also show the length of the BASIC program. So that
> branch now becomes:
>>>>> 3	byte		0	- BASIC program
>>>>>> 16	uleshort	x	\b, autostart line %u
>>>>>> 18	uleshort	x	\b, program length %u
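A Python sketch of the new BASIC branch, assuming parameter 1 (autostart line) at offset 16 and parameter 2 (program length) at offset 18 as little-endian words, as in the magic lines. autostart_status is a hypothetical extra that only interprets the documented values (0 to 9999 valid, 32768 for no auto-loading); the patch itself just prints the number.

```python
import struct


def describe_basic_header(tap: bytes) -> str:
    """Mimic the two new lines: autostart line (parameter 1, offset 16)
    and program length (parameter 2, offset 18), little-endian words."""
    autostart, program_length = struct.unpack_from("<HH", tap, 16)
    return f", autostart line {autostart}, program length {program_length}"


def autostart_status(line: int) -> str:
    """Hypothetical helper interpreting parameter 1 per the documented values."""
    if 0 <= line <= 9999:
        return "valid line number"
    if line == 32768:
        return "no auto-loading"
    return "outside the documented range"
```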
> 
> After applying the modifications mentioned above with the patch
> file-5.44-spectrum-tape.diff, the misidentification vanishes and I
> get more detailed output like:
> 1943 (-).TAP:
> 	Spectrum .TAP data "  1943    " - BASIC program
> 	, autostart line 1, program length 335
> 	, data length 335
> Cauldron II (S).cdt:
> 	Spectrum .TZX data version 1.10
> 	; ID 0x32 (archive info)
> 	, 0x9e bytes with 7 (type) text parts
> 	(0) CAULDRON II (1) PALACE SOFTWARE / ERBE
> 	(3) 1986 (4) SPANISH! (5) GAME
> 	(8) ORIGINAL TAPE SPANISH VERSION
> 	(-1) D.L. M-21936-1986. TZXed by johnny farragut
> 	(deepfb2002 at yahoo.es)
> Count Duckula (E).cdt:
> 	Spectrum .TZX data version 1.10
> 	; ID 0x11 (turbo)
> 	, 4096 pilot pulses with 2337 tstates
> 	, 1575 and 1103 sync tstates
> 	, 1195 zero tstates, 2388 one tstates
> 	, use 1 bit, 15 ms pause, 264 data bytes
> EXAMPLES.TAP:
> 	Spectrum .TAP data "screen    " - memory block (screen)
> 	, data length 6912
> TFCOPY2.TAP:
> 	Spectrum .TAP data "TF COPY II" - BASIC program
> 	, autostart line 10, program length 1505
> 	, data length 1505
> Tape-FileCopy(MartinMoracek)(SuperII).tap:
> 	Spectrum .TAP data "\023\001TF" - BASIC program
> 	, autostart line 1, program length 4319
> 	, data length 4401
> Treachery.tzx:
> 	Spectrum .TZX data version 1.13
> 	; ID 0x30 (text)
> 	"Created by Spectaculator"
> fmt-801-signature-id-1166.tap:
> 	data
> {85CEE8D6-0F90-4492-B484-98E38862B28D}.2.ver0x0000000000000004.db:
> 	data
> {DDF571F2-BE98-426D-8288-1A9A39C3FDA2}.2.ver0x0000000000000004.db:
> 	data
> 
> I hope my diff file can be applied in a future version of the
> file utility.
> 
> There is still something to do: classify the mysterious Windows
> cache db samples.
> 
> With best wishes
> Jörg Jenderek
> --
> Jörg Jenderek
> 
> 
> 
> 
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
> 
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCZFQcYAAKCRCv8rHJQhrU
> 1n2dAKCrjzA/LU168uuQ65E4wc+toXKJUACffASl0tXWsjz4qKAQEnQ7P37OJp8=
> =JuRv
> -----END PGP SIGNATURE-----
> <droid-tape.csv.gz><trid-v-tape.txt.gz><file-5_44-spectrum-tape_diff.DEFANGED-4><file-5_44-spectrum-tape_diff_sig.DEFANGED-5>--
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 235 bytes
Desc: Message signed with OpenPGP
URL: <https://mailman.astron.com/pipermail/file/attachments/20230507/ed1851cc/attachment.asc>

