[File] [PATCH] Magdir/database dBase III DBT misidentifies some Atari DEGAS bitmaps, SQLite Write-Ahead Log shared memory
Christos Zoulas
christos at zoulas.com
Thu Jan 12 00:14:15 UTC 2023
Committed, thanks!
christos
> On Jan 8, 2023, at 5:40 AM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
>
> Hello,
>
> A few days ago I sent a patch for SQLite Write-Ahead Log shared
> memory files.
>
> When running the file command version 5.44 with the -k option on such
> examples, and on more misidentified samples, I get output like:
>
> Deep_Strike.aas: dBase III DBT, version number 0
> , next free block index 8192
> , 1st item "\374\374"
> ELEPHANT.PC3: Atari DEGAS Elite
> compressed bitmap 640 x 400 x 2
> , color palette
> 0000 0777 0000 0000 0000 ...
> dBase III DBT, version number 0
> , next free block index 640
> , 1st item "\351\377\003\376"
> ST.PC2: Atari DEGAS Elite
> compressed bitmap 640 x 200 x 4
> , color palette
> 0000 0777 0000 0000 0000 ...
> dBase III DBT, version number 0
> , next free block index 384
> , 1st item "\341\377\261"
> StateRepository-Deployment.srd-shm: dBase III DBT
> , next free block index 3007000
> , block length 6144
> dbase-memo.dbt: dBase III DBT
> , next free block index 2
> , 1st item "1st memo \032"
> dbase3dbt0_1.dbt: dBase III DBT, version number 0
> , next free block index 2
> , 1st item "1st memo. test umlaut
> with cp 1252:
> \344=ae, \366=oe, \374=ue,
> \337=ss,\200=euro,
> \304=Ae, \326=Oe,
> \334=Ue\032\032"
> dbase_83.dbt: dBase III DBT, version number 0
> , next free block index 79
> , 1st item "Our Original
> assortment...a little taste
> dragon's_lair_ii.aas: dBase III DBT, version number 0
> , next free block index 8192
> , 1st item "\314\303\003\003
> fsadress.dbt: dBase III DBT, version number 0
> , next free block index 5
> , 1st item "This is a note for
> Karl M\374ller. "
> gcry_cast5.mod: dBase III DBT
> , next free block index 4
> , 1st item "\001\010"
> keylayouts.mod: dBase III DBT
> , next free block index 24
> , 1st item "rintf"
> nativedisk.mod: dBase III DBT
> , next free block index 10
> , 1st item
> "rub_file_get_device_name"
> part_sun.mod: dBase III DBT, version number 0
> , next free block index 100
> , 1st item "LICENSE=GPLv3+"
> pcidump.mod: dBase III DBT, version number 0
> , next free block index 1
> , 1st item "\203x\034"
> plan9.mod: dBase III DBT
> , next free block index 4
> , 1st item "b_strcmp"
> test.dbt: dBase III DBT
> , next free block index 16
> , 1st item "WHAT IS XBASE"
> virtual-boy-wario-land.vb: dBase III DBT, version number 0
> , next free block index 61440
> , 1st item " \307\356\377\004"
>
> For comparison, I ran the file format identification utility TrID
> (see https://mark0.net/soft-trid-e.html). It correctly identifies
> the Atari DEGAS bitmaps (*.PC2, *.PC3) only as such (see the
> attached trid-v-dbt.txt.gz).
>
> For comparison, I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). It identifies
> all DBT samples as "dBASE Text Memo" (PUID x-fmt/311), based on the
> file name suffix.
>
> Luckily, in Magdir/database the display part is handled by the
> subroutine dbase3-memo-print, so only additional tests are needed
> before calling that routine.
>
> In one DBT branch the output does not contain the phrase "version
> number"; this branch handles samples with the standard version
> number 3. After many RAR archives are skipped by a test for valid
> block sizes, and version number 3 is checked, the subroutine is
> called. At the moment this looks like:
>>>>>> 20 ubelong&0xFF01209B 0x00000000
>>>>>>> 16 ubyte 3
>>>>>>>> 0 use dbase3-memo-print
> For a real DBT sample like dbase3dbt0_1.dbt, the first item should be
> a printable ASCII-like string such as "1st memo. test umlaut". The
> subroutine displays it with a line like:
>> 512 string >\0 \b, 1st item "%s"
> In other branches I already skip "bad" samples by checking the first
> item field, so I do the same here. Samples with an invalid "low"
> first item, like "\0\0\0\0" in StateRepository-Deployment.srd-shm
> and "\001\010\0\0" in gcry_cast5.mod, are skipped by the additional
> test line:
>>>>>>>> 512 ubyte >040
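For readers less used to magic(5) syntax, the version 3 checks above can be approximated in Python. This is a hypothetical sketch (the function name `v3_first_item_ok` is invented), not how file(1) evaluates its rules. It assumes the whole file is in memory and reads the octal constant `040` as byte value 32 (space).

```python
import struct


def v3_first_item_ok(data: bytes) -> bool:
    """Hypothetical mirror of the quoted version 3 DBT checks."""
    if len(data) < 513:
        return False  # too short to hold the first memo block
    # "20 ubelong&0xFF01209B 0x00000000": big-endian 32-bit value at offset
    # 20; none of the bits selected by the mask may be set (block size test).
    (field,) = struct.unpack_from(">I", data, 20)
    if field & 0xFF01209B:
        return False
    # "16 ubyte 3": standard dBase III version number.
    if data[16] != 3:
        return False
    # "512 ubyte >040": first memo byte must be above octal 040 (space).
    return data[512] > 0o40


header = bytes(16) + b"\x03" + bytes(495)  # 512-byte header, version 3
print(v3_first_item_ok(header + b"1st memo\x1a\x1a"))  # True
print(v3_first_item_ok(header + b"\x00\x00\x00\x00"))  # False, like the .srd-shm
print(v3_first_item_ok(header + b"\x01\x08\x00\x00"))  # False, like gcry_cast5.mod
```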
>
> Unfortunately, this test also passes for samples like keylayouts.mod,
> nativedisk.mod and plan9.mod, so more tests are needed. Many of the
> DBT examples end with a byte sequence displayed as \032\032. So, for
> debugging, I enabled lines like these at the end of the subroutine:
>> 513 search/0x225 \032 FOUND_TERMINATOR
>>> &0 ubyte 032 2xCTRL_Z
>>> &0 ubyte 0 1xCTRL_Z
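The debugging rules can be mimicked the same way. In this hypothetical sketch the name `terminator_labels` is invented. The sketch assumes that `search/0x225` scans 0x225 bytes from offset 513 and that `&0` addresses the byte right after the matched Control-Z (octal 032, 0x1A).

```python
from typing import List


def terminator_labels(data: bytes, start: int = 513, window: int = 0x225) -> List[str]:
    """Hypothetical mirror of the FOUND_TERMINATOR debugging rules."""
    # "513 search/0x225 \032": first Control-Z in the search window.
    pos = data.find(b"\x1a", start, start + window)
    if pos < 0:
        return []
    labels = ["FOUND_TERMINATOR"]
    # "&0" is relative to the end of the match: the byte after the Control-Z.
    if pos + 1 < len(data):
        nxt = data[pos + 1]
        if nxt == 0x1A:
            labels.append("2xCTRL_Z")  # usual double terminator
        elif nxt == 0:
            labels.append("1xCTRL_Z")  # single Control-Z followed by a nil byte
    return labels


print(terminator_labels(bytes(512) + b"1st memo\x1a\x1a"))
# ['FOUND_TERMINATOR', '2xCTRL_Z']
print(terminator_labels(bytes(512) + b"This is a note\x1a\x00"))
# ['FOUND_TERMINATOR', '1xCTRL_Z']
```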
> This shows what is written in the documentation: the item field is
> normally terminated by two Control-Z characters, but some variants
> (FoxPro, Fox?? like fsadress.dbt) use only one Control-Z character.
> In the example I inspected, the next character was a nil byte. The
> documentation says nothing about the size of a memo field, but it is
> reasonable to assume it holds only some hundreds of characters; no
> human writes a comment or note thousands of characters long. I use
> these facts in the concerned branch, so its second test now becomes:
>>>>>>>> 513 search/3308 \032
> This test skips the GRUB module keylayouts.mod.
> Unfortunately, the work is not done yet, because there are also DBT
> samples like dbase-memo.dbt where the second terminating character
> is a nil byte. At this point the same is true for the old GRUB
> module nativedisk.mod, whose first ASCII-like phrase is grub_mod_init
> at offset 429 (=1ADh). So I skip the GRUB module explicitly by
> checking for that specific word. The last test part in that branch
> now becomes:
>>>>>>>>>> &0 ubyte 0
>>>>>>>>>>> 0x1ad string !grub_mod_init
>>>>>>>>>>>> 0 use dbase3-memo-print
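A rough Python equivalent of the two new version 3 tests might look like this. The first is the bounded Control-Z search; the second is the nil-byte branch with the GRUB exclusion. This is a hypothetical sketch (`v3_nil_branch_matches` is an invented name) that assumes the earlier checks already passed and models only the lines quoted here. The email does not show how this branch treats a second Control-Z, so the sketch simply does not accept that case.

```python
def v3_nil_branch_matches(data: bytes) -> bool:
    """Hypothetical mirror of the quoted version 3 nil-byte branch."""
    # "513 search/3308 \032": a Control-Z must appear within 3308 bytes,
    # since memo items are assumed to be short (this skips keylayouts.mod).
    pos = data.find(b"\x1a", 513, 513 + 3308)
    if pos < 0 or pos + 1 >= len(data):
        return False
    # "&0 ubyte 0": the byte after the Control-Z is a nil byte.
    if data[pos + 1] != 0:
        return False
    # "0x1ad string !grub_mod_init": exclude old GRUB modules such as
    # nativedisk.mod, whose first ASCII-like phrase sits at offset 429.
    return not data.startswith(b"grub_mod_init", 0x1AD)


memo = bytes(512) + b"1st memo \x1a\x00"  # shaped like dbase-memo.dbt
grub = bytes(0x1AD) + b"grub_mod_init" + bytes(78) + b"\x1a\x00"  # made-up module
print(v3_nil_branch_matches(memo))  # True
print(v3_nil_branch_matches(grub))  # False
```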
>
> In another branch, for version number 0, a few DOS executables
> (CPQ0TD.DRV, E30ODI.COM, IBM0MONO.DRV) are skipped because their
> first item is "too low". Then a few Microsoft Event Trace Logs
> (DlTel-Merge.etl, boot_BASE+CSWITCH_1.etl, UpdateUx.006.etl) are
> skipped because of an invalid "high" first item, and then the
> subroutine is called. At the moment this looks like:
>>>>>>>>>>>> 512 ubyte >037
>>>>>>>>>>>>> 512 ubyte <0377
>>>>>>>>>>>>>> 0 use dbase3-memo-print
>
> Unfortunately, at this point these tests also pass for some Commodore
> 64 Art Studio images (Deep_Strike.aas, dragon's_lair_ii.aas), some
> Atari DEGAS Elite bitmaps (ELEPHANT.PC3, ST.PC2), some probably old
> GRUB modules (part_sun.mod, pcidump.mod) and
> virtual-boy-wario-land.vb. I cannot make the "invalid high" test
> more restrictive, because the German umlaut ue is encoded as octal
> 374, which is "high".
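The range test of the version 0 branch can be sketched in Python under the same assumptions: a hypothetical function name, and octal constants read as in magic(5). The sketch shows why the bounds cannot be tightened much. The cp1252 umlaut ue, byte 0xFC (octal 374), must still pass.

```python
def v0_first_item_in_range(data: bytes) -> bool:
    """Hypothetical mirror of the version 0 first-item range checks."""
    if len(data) < 513:
        return False
    first = data[512]
    # "512 ubyte >037" and "512 ubyte <0377": octal bounds, i.e. 31 < b < 255.
    return 0o37 < first < 0o377


for b in (0x1F, 0x20, 0xFC, 0xFF):
    print(hex(b), v0_first_item_in_range(bytes(512) + bytes([b])))
# 0x1f False
# 0x20 True
# 0xfc True
# 0xff False
```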
>
> So, as in the other branch, I look for the Control-Z character that
> terminates the memo field. This is done by a line like:
>>>>>>>>>>>>>> 513 search/523 \032
> Now most of the misidentified samples vanish, but I found one
> exception: the test also passes for the old GRUB module pcidump.mod.
> So I looked at the second terminating character. For most real DBT
> samples it is again Ctrl-Z (as in dbase3dbt0_1.dbt and
> dbase_83.dbt), whereas for the GRUB module it is a nil byte. The
> real samples are matched by lines like:
>>>>>>>>>>>>>>> &0 ubyte 032
>>>>>>>>>>>>>>>> 0 use dbase3-memo-print
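The bounded search and the double Control-Z check of the version 0 branch can be written as another hypothetical Python sketch. The function name is invented, and `search/523` is assumed to scan 523 bytes from offset 513.

```python
def v0_double_ctrl_z(data: bytes) -> bool:
    """Hypothetical mirror of the version 0 Control-Z checks quoted above."""
    # "513 search/523 \032": memo terminator within 523 bytes of offset 513.
    pos = data.find(b"\x1a", 513, 513 + 523)
    if pos < 0 or pos + 1 >= len(data):
        return False
    # "&0 ubyte 032": a second Control-Z follows, as in most real DBT files.
    return data[pos + 1] == 0x1A


print(v0_double_ctrl_z(bytes(512) + b"Our Original\x1a\x1a"))  # True
print(v0_double_ctrl_z(bytes(512) + b"\x83x\x1c\x1a\x00"))  # False: nil byte follows
```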
> Although it is unusual, a nil byte as the second terminating
> character can also occur in real DBT samples (like fsadress.dbt and
> umlaut-dbf-cmd.dbt). In the GRUB module, the first specific
> ASCII-like phrase, at offset 780 (=30Ch), is:
> pcidump\0Show raw dump of the PCI configuration space
> Unfortunately, I cannot skip this module by testing that this phrase
> is absent, because then short DBT samples would be missed. So I
> search for part of this sentence, and if that search fails, the DBT
> samples are caught by a default clause. The last additional test
> before calling the subroutine for samples like fsadress.dbt and
> umlaut-dbf-cmd.dbt now becomes:
>>>>>>>>>>>>>>> &0 ubyte 0
>>>>>>>>>>>>>>>> 514 search/0x11E pcidump\0Show
>>>>>>>>>>>>>>>> 514 default x
>>>>>>>>>>>>>>>>> 0 use dbase3-memo-print
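Finally, here is the nil-byte branch with the pcidump exclusion and the `default` fallback as a hypothetical Python sketch. The helper `search` is invented. It assumes that `search/N` lets a match start at any of the N positions after the offset, whereas Python's `bytes.find` on its own requires the whole match to fit inside the slice.

```python
def search(data: bytes, needle: bytes, start: int, window: int) -> int:
    """Find needle starting anywhere in start .. start+window-1 (assumed semantics)."""
    return data.find(needle, start, start + window + len(needle) - 1)


def v0_nil_branch_matches(data: bytes) -> bool:
    """Hypothetical mirror of the version 0 nil-byte branch quoted above."""
    # "513 search/523 \032" followed by "&0 ubyte 0".
    pos = search(data, b"\x1a", 513, 523)
    if pos < 0 or pos + 1 >= len(data) or data[pos + 1] != 0:
        return False
    # "514 search/0x11E pcidump\0Show": the old GRUB pcidump.mod phrase.
    if search(data, b"pcidump\x00Show", 514, 0x11E) >= 0:
        return False
    # "514 default x": reached only when no sibling test at this level matched.
    return True


note = bytes(512) + b"This is a note\x1a\x00" + bytes(400)  # made-up DBT
module = bytes(515) + b"\x1a" + bytes(264) + b"pcidump\x00Show" + bytes(100)
print(v0_nil_branch_matches(note))    # True
print(v0_nil_branch_matches(module))  # False: phrase at offset 780
```

As the email explains, a search plus a `default` fallback keeps short DBT files matching. A missing phrase sends them to the fallback instead of failing a test at a fixed offset.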
>
> After applying the modifications above with the patch file
> file-5.44-database-dbt.diff, none of the affected samples is
> misidentified as DBT any more, and real DBT samples are still
> recognized. The output now looks like:
>
> Deep_Strike.aas: data
> ELEPHANT.PC3: Atari DEGAS Elite
> compressed bitmap 640 x 400 x 2
> , color palette
> 0000 0777 0000 0000 0000 ...
> ST.PC2: Atari DEGAS Elite
> compressed bitmap 640 x 200 x 4
> , color palette
> 0000 0777 0000 0000 0000 ...
> StateRepository-Deployment.srd-shm: data
> dbase-memo.dbt: dBase III DBT
> , next free block index 2
> , 1st item "1st memo \032"
> dbase3dbt0_1.dbt: dBase III DBT, version number 0
> , next free block index 2
> , 1st item "1st memo. test umlaut
> with cp 1252:
> \344=ae, \366=oe, \374=ue,
> \337=ss,\200=euro,
> \304=Ae, \326=Oe,
> \334=Ue\032\032"
> dbase_83.dbt: dBase III DBT, version number 0
> , next free block index 79
> , 1st item "Our Original
> assortment...a little taste
> dragon's_lair_ii.aas: data
> fsadress.dbt: dBase III DBT, version number 0
> , next free block index 5
> , 1st item "This is a note for
> Karl M\374ller. "
> gcry_cast5.mod: data
> keylayouts.mod: data
> nativedisk.mod: data
> part_sun.mod: data
> pcidump.mod: data
> plan9.mod: data
> test.dbt: dBase III DBT
> , next free block index 16
> , 1st item "WHAT IS XBASE"
> virtual-boy-wario-land.vb: data
>
> I hope my diff can be applied in a future version of the file
> utility.
>
>
> With best wishes,
> Jörg Jenderek
> <trid-v-dbt.txt.gz><file-5_44-database-dbt_diff><file-5_44-database-dbt_diff_sig>