[File] [PATCH] of Magdir/spectrum Windows cache *.db misidentified as Spectrum .TAP
Christos Zoulas
christos at zoulas.com
Mon May 8 01:33:49 UTC 2023
Committed, thanks!
christos
> On May 4, 2023, at 4:58 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello,
> some days ago I handled some databases. The suffix db is often used
> for such file names. Some samples are misidentified as "Spectrum .TAP
> data".
>
> When running file command version 5.44 on real Spectrum tape
> examples and on the misidentified db samples, I get output like:
>
> 1943 (-).TAP:
> Spectrum .TAP data " 1943 " - BASIC program
> Cauldron II (S).cdt:
> Spectrum .TZX data version 1.10
> Count Duckula (E).cdt:
> Spectrum .TZX data version 1.10
> EXAMPLES.TAP:
> Spectrum .TAP data "screen " - memory block (screen)
> TFCOPY2.TAP:
> Spectrum .TAP data "TF COPY II" - BASIC program
> Tape-FileCopy(MartinMoracek)(SuperII).tap:
> Spectrum .TAP data "\023\001TF" - BASIC program
> Treachery.tzx:
> Spectrum .TZX data version 1.13
> fmt-801-signature-id-1166.tap:
> data
> {85CEE8D6-0F90-4492-B484-98E38862B28D}.2.ver0x0000000000000004.db:
> Spectrum .TAP data ")\335\242\" - BASIC program
> {DDF571F2-BE98-426D-8288-1A9A39C3FDA2}.2.ver0x0000000000000004.db:
> Spectrum .TAP data ")\335\242\" - BASIC program
>
> With the --extension option only ??? is displayed. Furthermore, with
> the -i option only the generic application/octet-stream is shown for
> the samples.
>
> For comparison I also ran the file format identification
> utility DROID (see https://sourceforge.net/projects/droid/).
> The TAP samples are described as "TAP (ZX Spectrum)" without mime type
> by PUID fmt/801. The other tape samples are described as "TZX Format"
> without mime type by PUID fmt/1000. The TZX suffix is considered
> valid, whereas the CDT suffix is considered bad (EXTENSION_MISMATCH
> true). The DB samples are described as "Thumbs DB file" by PUID
> fmt/682 via extension (see appended droid-tape.csv.gz).
>
> For comparison I also ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). The TAP samples are
> described here as "ZX Spectrum Tape image" by tap-zx.trid.xml.
> The DROID sample fmt-801-signature-id-1166.tap is also described in
> that way. The other tape samples are described as "ZX Spectrum Tape
> image" with mime type application/x-spectrum-tzx by tzx.trid.xml. The
> DB samples are described here as "Unknown!" (see appended
> output/trid-v-tape.txt.gz).
>
> TrID lists the file name extensions used and, with the -v option,
> often the related URL pointing to file format information. With the
> help of other tools I found a page about the newer TZX tape format on
> the File Formats Archive Team web site. This is now expressed inside
> Magdir/spectrum by additional comment lines like:
> # URL: http://fileformats.archiveteam.org/wiki/TZX
> # Reference: https://worldofspectrum.net/TZXformat.html
> # http://mark0.net/download/triddefs_xml.7z
> # defs/t/tzx.trid.xml
>
> The description is done inside Magdir/spectrum by lines like:
> 0 string ZXTape!\x1a Spectrum .TZX data
>> 8 byte x version %d
>> 9 byte x \b.%d
> Instead of the generic application/octet-stream mime type I show the
> type used by TrID. The standard file name suffix is TZX. The CDT
> suffix is used for Amstrad tapes, which have the same format. This
> information is shown by the adapted lines, which now look like:
> 0 string ZXTape!\x1a Spectrum .TZX data
> !:mime application/x-spectrum-tzx
> !:ext tzx/cdt
>> 8 byte x version %d
>> 9 byte x \b.%d
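
A minimal Python sketch of what these magic lines test, for readers who
want to reproduce the check outside of file(1). The function name
describe_tzx_header is hypothetical; the offsets and the output wording
are taken from the magic lines quoted above.

```python
# Signature from the magic line: "ZXTape!" followed by byte 0x1A (8 bytes).
TZX_SIGNATURE = b"ZXTape!\x1a"


def describe_tzx_header(data: bytes):
    """Return a file(1)-like description, or None if the signature is absent.

    Hypothetical helper: offset 8 holds the major version and offset 9 the
    minor version, mirroring the two "byte x" lines of the magic entry.
    """
    if len(data) < 10 or not data.startswith(TZX_SIGNATURE):
        return None
    major, minor = data[8], data[9]
    return f"Spectrum .TZX data version {major}.{minor}"
```

The mime type and the tzx/cdt suffixes are metadata in the magic file and
have no counterpart in this sketch.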
>
> When you are inspecting hundreds of such tape samples you are happy
> to get additional information to distinguish the samples.
> After the signature and version bytes, the ID of the first block is
> stored. So I show this value, and for a few cases also in
> human-readable form (like pause, text etc.). Depending on the ID,
> further information is shown. For the pause ID (0x20) this is the
> duration in milliseconds. For the text ID (0x30) this is a Pascal
> string. So according to the documentation this additional information
> is shown by additional lines like:
>> 10 ubyte x \b; ID %#x
>> 10 ubyte =0x20 (pause)
>>> 11 uleshort x %u ms
>> 10 ubyte =0x30 (text)
>>> 11 pstring x "%s"
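
The same first-block logic can be sketched in Python. This is only an
illustration under the assumptions stated in the quoted text (block ID at
offset 10, pause duration as a little-endian 16-bit value at offset 11,
text as a Pascal string with a one-byte length at offset 11); the
function name describe_first_tzx_block is hypothetical.

```python
import struct


def describe_first_tzx_block(data: bytes) -> str:
    """Describe the first block ID of a TZX image (illustrative sketch)."""
    if len(data) < 11:
        return ""
    block_id = data[10]
    text = f"; ID {block_id:#x}"
    if block_id == 0x20 and len(data) >= 13:
        # Pause block: 16-bit little-endian duration in milliseconds.
        (pause_ms,) = struct.unpack_from("<H", data, 11)
        text += f" (pause) {pause_ms} ms"
    elif block_id == 0x30 and len(data) >= 12:
        # Text block: Pascal string, one length byte followed by the text.
        length = data[11]
        message = data[12:12 + length].decode("ascii", errors="replace")
        text += f' (text) "{message}"'
    return text
```

Other block IDs are reported by number only, as in the first magic line.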
> That information can be verified with fuse-emulator-utils via a
> command line like:
> tzxlist EXAMPLES.TAP
>
> With the help of other tools I found a page about the older TAP (ZX
> Spectrum) tape format on the File Formats Archive Team web site. This
> is now expressed inside Magdir/spectrum by additional comment lines
> like:
> # URL: http://fileformats.archiveteam.org/wiki/
> # TAP_(ZX_Spectrum)
> # Reference: http://web.archive.org/web/20110711141601/
> # http://www.zxmodules.de/fileformats/tapformat.html
> # http://mark0.net/download/triddefs_xml.7z
> # defs/t/tap-zx.trid.xml
>
> The description starts inside Magdir/spectrum with lines like:
> 0 string \023\000\000
>> 4 string >\0
>>> 4 string <\177 Spectrum .TAP data "%-10.10s"
> The first test looks for the 3 starting "magic" bytes, like the other
> tools. The other test lines sanity-check the name string to see
> whether it is printable. This must be done carefully, because names
> are not always "nice" ASCII like "TF COPY II", "screen ", " 1943 "
> or "\023\001TF" in the Martin Moracek example. So I could not use
> stricter checks here, but the third test line skips the DROID sample
> fmt-801-signature-id-1166.tap with the invalid name
> \253\253\253\253\253\253\253\253\253\253. These tests
> are only used by the file command. So I looked at what the other
> tools are using. They check the value of the byte at offset 23. I do
> not understand why this works or whether it is always true, but both
> other tools use this method and in the end I found no other way.
> So I also apply this method, and so the Windows Caches db samples
> (found inside c:\ProgramData\Microsoft\Windows\Caches) are skipped.
> Analogous to the newer TZX format I chose a similar user-defined mime
> type. So the start now looks like:
> 0 string \023\000\000
>> 4 string >\0
>>> 23 ubyte =0xFF
>>>> 4 string <\177 Spectrum .TAP data "%-10.10s"
> !:mime application/x-spectrum-tap
> !:ext tap
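
One plausible reading of the offset-23 test, assuming the TAP layout of
the referenced tapformat page (every block is preceded by a 2-byte
little-endian length and starts with a flag byte): the header block is
19 bytes long and occupies offsets 2 to 20, offsets 21 and 22 hold the
length of the next block, and offset 23 is that block's flag byte, which
is conventionally 0xFF for a data block. The Python sketch below mirrors
the four tests. The function name is hypothetical, and the two string
comparisons are approximated by comparing only the first name byte.

```python
def looks_like_spectrum_tap(data: bytes) -> bool:
    """Approximate the four magic tests for a .TAP header (sketch only)."""
    if len(data) < 24:
        return False
    # Bytes 0-1: block length 19 (0x13 0x00); byte 2: flag 0x00 = header.
    if data[0:3] != b"\x13\x00\x00":
        return False
    # Rough stand-in for "string >\0" and "string <\177" at offset 4:
    # the first name byte must be non-zero and below 0x7F.
    if not (0 < data[4] < 0x7F):
        return False
    # Byte 23: assumed to be the flag byte of the following data block.
    return data[23] == 0xFF
```

A file with a custom flag byte in its second block would be rejected by
this test, which may be why it is not guaranteed to hold in every case.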
>
> There are existing lines that do sub-classification depending on the
> data type byte. They looked like:
>>>> 3 byte 0 - BASIC program
>>>> 3 byte 1 - number array
>>>> 3 byte 2 - character array
>>>> 3 byte 3 - memory block
>>>>> 14 belong 0x001B0040 (screen)
> For a memory block with a SCREEN$ header, the length of the
> following data is 1B00h=6912 and the start address is 4000h=16384.
> That was shown by the last line. At the end I also show the length of
> the following data after the header, and the checksum byte (simply
> all bytes including the flag byte XORed). This is done by additional
> lines like:
>>>>> 14 uleshort x \b, data length %u
> #>>>>20 ubyte x \b, checksum %#x
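
The checksum rule mentioned above (all bytes including the flag byte
XORed) can be sketched in Python as follows. The function names are
hypothetical, and it is an assumption of this sketch that the header
checksum at offset 20 covers exactly offsets 2 to 19, that is the flag
byte plus the 17 header bytes.

```python
from functools import reduce
from operator import xor


def tap_block_checksum(block: bytes) -> int:
    """XOR of all bytes of a block body (flag byte first, checksum excluded)."""
    return reduce(xor, block, 0)


def header_checksum_ok(data: bytes) -> bool:
    """Compare the stored checksum at offset 20 with the computed value."""
    if len(data) < 21:
        return False
    return tap_block_checksum(data[2:20]) == data[20]
```

file(1) magic has no way to XOR a run of bytes, so a check like this can
only be done in an external tool and not in Magdir/spectrum itself.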
> With this information I tried to inspect the next block, but I got
> values that did not seem reasonable to me. So I could not use these
> facts as an additional test, and in the end I use the method of the
> other tools. The misidentified db samples are described as BASIC
> program. So I add lines to show more information for this case: I
> show the autostart line. According to the documentation, values 0 to
> 9999 are valid and the value 32768 means "no auto-loading". I also
> show the length of the BASIC program. So that branch now becomes:
>>>>> 3 byte 0 - BASIC program
>>>>>> 16 uleshort x \b, autostart line %u
>>>>>> 18 uleshort x \b, program length %u
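
The header fields used by these lines can be decoded in Python roughly as
below. This is a sketch assuming the offsets from the magic lines (type
byte at 3, name at 4 to 13, then three little-endian 16-bit values at 14,
16 and 18); describe_tap_header and TYPE_NAMES are hypothetical names. It
also shows why the big-endian constant 0x001B0040 at offset 14 means
length 1B00h and start address 4000h when read as two little-endian
values.

```python
import struct

# Data type byte at offset 3, as listed in the magic lines.
TYPE_NAMES = {
    0: "BASIC program",
    1: "number array",
    2: "character array",
    3: "memory block",
}


def describe_tap_header(data: bytes) -> str:
    """Describe a .TAP header block in the style of the patched magic."""
    if len(data) < 20:
        return ""
    type_byte = data[3]
    name = data[4:14].decode("ascii", errors="replace")
    # Offsets 14, 16, 18: data length and two type-dependent parameters.
    data_length, param1, param2 = struct.unpack_from("<HHH", data, 14)
    text = f'Spectrum .TAP data "{name}" - '
    text += TYPE_NAMES.get(type_byte, "unknown type")
    if type_byte == 0:
        # BASIC program: param1 = autostart line, param2 = program length.
        text += f", autostart line {param1}, program length {param2}"
    elif type_byte == 3 and data_length == 0x1B00 and param1 == 0x4000:
        # SCREEN$: 6912 bytes loaded at address 16384.
        text += " (screen)"
    return text + f", data length {data_length}"
```

An autostart value of 32768 would be printed as a plain number here, just
as in the magic lines, although it means "no auto-loading".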
>
> After applying the above-mentioned modifications with the patch
> file-5.44-spectrum-tape.diff the misidentification vanishes and I
> get more detailed output like:
> 1943 (-).TAP:
> Spectrum .TAP data " 1943 " - BASIC program
> , autostart line 1, program length 335
> , data length 335
> Cauldron II (S).cdt:
> Spectrum .TZX data version 1.10
> ; ID 0x32 (archive info)
> , 0x9e bytes with 7 (type) text parts
> (0) CAULDRON II (1) PALACE SOFTWARE / ERBE
> (3) 1986 (4) SPANISH! (5) GAME
> (8) ORIGINAL TAPE SPANISH VERSION
> (-1) D.L. M-21936-1986. TZXed by johnny farragut
> (deepfb2002 at yahoo.es)
> Count Duckula (E).cdt:
> Spectrum .TZX data version 1.10
> ; ID 0x11 (turbo)
> , 4096 pilot pulses with 2337 tstates
> , 1575 and 1103 sync tstates
> , 1195 zero tstates, 2388 one tstates
> , use 1 bit, 15 ms pause, 264 data bytes
> EXAMPLES.TAP:
> Spectrum .TAP data "screen " - memory block (screen)
> , data length 6912
> TFCOPY2.TAP:
> Spectrum .TAP data "TF COPY II" - BASIC program
> , autostart line 10, program length 1505
> , data length 1505
> Tape-FileCopy(MartinMoracek)(SuperII).tap:
> Spectrum .TAP data "\023\001TF" - BASIC program
> , autostart line 1, program length 4319
> , data length 4401
> Treachery.tzx:
> Spectrum .TZX data version 1.13
> ; ID 0x30 (text)
> "Created by Spectaculator"
> fmt-801-signature-id-1166.tap:
> data
> {85CEE8D6-0F90-4492-B484-98E38862B28D}.2.ver0x0000000000000004.db:
> data
> {DDF571F2-BE98-426D-8288-1A9A39C3FDA2}.2.ver0x0000000000000004.db:
> data
>
> I hope my diff file can be applied in a future version of the
> file utility.
>
> One thing remains to be done: classify the mysterious Windows cache
> db samples.
>
> With best wishes
> Jörg Jenderek
>
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCZFQcYAAKCRCv8rHJQhrU
> 1n2dAKCrjzA/LU168uuQ65E4wc+toXKJUACffASl0tXWsjz4qKAQEnQ7P37OJp8=
> =JuRv
> -----END PGP SIGNATURE-----
> <droid-tape.csv.gz><trid-v-tape.txt.gz><file-5_44-spectrum-tape_diff.DEFANGED-4><file-5_44-spectrum-tape_diff_sig.DEFANGED-5>--
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file