[File] [PATCH] Magdir/database dBase III DBT misidentifies some Atari DEGAS bitmaps, SQLite Write-Ahead Log shared memory

Christos Zoulas christos at zoulas.com
Thu Jan 12 00:14:15 UTC 2023


Committed, thanks!

christos

> On Jan 8, 2023, at 5:40 AM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> 
> A few days ago I sent a patch for SQLite Write-Ahead Log shared memory
> files.
> 
> When running the file command (version 5.44) with the -k option on
> such files and on further misidentified samples, I get output like:
> 
> Deep_Strike.aas:                    dBase III DBT, version number 0
> 				    , next free block index 8192
> 				    , 1st item "\374\374"
> ELEPHANT.PC3:                       Atari DEGAS Elite
> 				    compressed bitmap 640 x 400 x 2
> 				    , color palette
> 				    0000 0777 0000 0000 0000 ...
> 				    dBase III DBT, version number 0
> 				    , next free block index 640
> 				    , 1st item "\351\377\003\376"
> ST.PC2:                             Atari DEGAS Elite
> 				    compressed bitmap 640 x 200 x 4
> 				    , color palette
> 				    0000 0777 0000 0000 0000 ...
> 				    dBase III DBT, version number 0
> 				    , next free block index 384
> 				    , 1st item "\341\377\261"
> StateRepository-Deployment.srd-shm: dBase III DBT
> 				    , next free block index 3007000
> 				    , block length 6144
> dbase-memo.dbt:                     dBase III DBT
> 				    , next free block index 2
> 				    , 1st item "1st memo \032"
> dbase3dbt0_1.dbt:                   dBase III DBT, version number 0
> 				    , next free block index 2
> 				    1st item "1st memo. test umlaut
> 				    with cp 1252:
> 				    \344=ae, \366=oe, \374=ue,
> 				    \337=ss,\200=euro,
> 				    \304=Ae, \326=Oe,
> 				    \334=Ue\032\032"
> dbase_83.dbt:                       dBase III DBT, version number 0
> 				    , next free block index 79
> 				    , 1st item "Our Original
> 				    assortment...a little taste
> dragon's_lair_ii.aas:               dBase III DBT, version number 0
> 				    , next free block index 8192
> 				    , 1st item "\314\303\003\003
> fsadress.dbt:                       dBase III DBT, version number 0
> 				    , next free block index 5
> 				    , 1st item "This is a note for
> 				    Karl M\374ller. "
> gcry_cast5.mod:                     dBase III DBT
> 				    , next free block index 4
> 				    , 1st item "\001\010"
> keylayouts.mod:                     dBase III DBT
> 				    , next free block index 24
> 				    , 1st item "rintf"
> nativedisk.mod:                     dBase III DBT
> 				    , next free block index 10
> 				    , 1st item
> 				    "rub_file_get_device_name"
> part_sun.mod:                       dBase III DBT, version number 0
> 				    , next free block index 100
> 				    , 1st item "LICENSE=GPLv3+"
> pcidump.mod:                        dBase III DBT, version number 0
> 				    , next free block index 1
> 				    , 1st item "\203x\034"
> plan9.mod:                          dBase III DBT
> 				    , next free block index 4
> 				    , 1st item "b_strcmp"
> test.dbt:                           dBase III DBT
> 				    , next free block index 16
> 				    , 1st item "WHAT IS XBASE"
> virtual-boy-wario-land.vb:          dBase III DBT, version number 0
> 				    , next free block index 61440
> 				    , 1st item " \307\356\377\004"
> 
> For comparison, I ran the file format identification utility TrID
> (see https://mark0.net/soft-trid-e.html). Here it correctly
> identifies the Atari DEGAS bitmaps (*.PC2, *.PC3) only as such (see
> the attached trid-v-dbt.txt.gz).
> 
> For comparison, I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). It identifies
> all DBT samples as "dBASE Text Memo" (PUID x-fmt/311), based on the
> file name suffix.
> 
> Fortunately, in Magdir/database the display part is handled by the
> subroutine dbase3-memo-print, so only additional tests need to be
> added before that routine is called.
> 
> For one DBT branch the output does not contain the phrase "version
> number"; this branch handles samples with the standard version
> number 3. After many RAR archives are skipped by a test for valid
> block sizes, and the version number is checked for 3, the subroutine
> is called. Currently this looks like:
>>>>>> 20	ubelong&0xFF01209B	0x00000000
>>>>>>> 16	ubyte		3
>>>>>>>> 0	use		dbase3-memo-print
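The two header tests above can be read as follows. This is a minimal Python sketch, not libmagic code; it assumes the usual magic(5) meaning of `ubelong&MASK` (read an unsigned big-endian 32-bit value, AND it with the mask, compare) and of `ubyte`. The helper name `dbt_v3_header_ok` is hypothetical.

```python
import struct

# Mask taken from the "20 ubelong&0xFF01209B 0x00000000" line.
BLOCK_SIZE_MASK = 0xFF01209B


def dbt_v3_header_ok(data: bytes) -> bool:
    """Hypothetical helper mirroring the two header tests of the version-3 branch."""
    if len(data) < 24:
        return False  # too short to hold the tested header fields
    # Offset 20: unsigned big-endian 32-bit value; every masked bit must be 0.
    (value,) = struct.unpack_from(">I", data, 20)
    if value & BLOCK_SIZE_MASK != 0:
        return False
    # Offset 16: the version byte must be 3.
    return data[16] == 3
```

A value that sets any masked bit (for example bit 16) fails the first test regardless of the version byte.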
> For a real DBT sample like dbase3dbt0_1.dbt, the first item should be
> something like a printable ASCII string, e.g. "1st memo. test umlaut".
> The subroutine prints it with a line like:
>> 512	string			>\0		\b, 1st item "%s"
> In other branches I already skip "bad" samples by checking the first
> item field, so I do the same here: samples with an invalid "low" 1st
> item field, like "\0\0\0\0" in StateRepository-Deployment.srd-shm and
> "\001\010\0\0" in gcry_cast5.mod, are skipped by the additional test
> line
>>>>>>>> 512	ubyte		>040
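What `>040` accepts can be checked with a small Python sketch. The helper name is hypothetical; `040` is octal (decimal 32, the space character), and offset 512 is where the subroutine's `512 string` line reads the 1st item. The sketch also shows why this test alone is not enough: an item starting with "rintf", as in keylayouts.mod, still passes.

```python
def first_item_not_low(data: bytes) -> bool:
    """Hypothetical helper for '512 ubyte >040'."""
    # The first byte of the 1st item must be above octal 040 (space).
    return len(data) > 512 and data[512] > 0o40


pad = bytes(512)  # stand-in for the 512-byte header block
print(first_item_not_low(pad + b"1st memo"))      # True
print(first_item_not_low(pad + b"\0\0\0\0"))      # False (srd-shm sample)
print(first_item_not_low(pad + b"\001\010\0\0"))  # False (gcry_cast5.mod)
print(first_item_not_low(pad + b"rintf"))         # True (keylayouts.mod)
```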
> 
> Unfortunately, this test is also passed by samples like
> keylayouts.mod, nativedisk.mod and plan9.mod, so more tests are
> needed. Looking at DBT examples, many of them end with a byte
> sequence displayed as \032\032. So, for debugging, I enabled some
> lines like these at the end of the subroutine:
>> 513	search/0x225		\032		FOUND_TERMINATOR
>>> &0	ubyte			032		2xCTRL_Z
>>> &0	ubyte			0		1xCTRL_Z
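The debugging lines can be mimicked in Python to classify the byte that follows the first Control-Z. This sketch assumes libmagic's behaviour that `search/N` tries N start positions beginning at the given offset, and that `&0` afterwards points just past the matched string; the function name and return labels are illustrative only.

```python
CTRL_Z = 0x1A  # octal 032


def classify_terminator(data: bytes, start: int = 513, search_range: int = 0x225) -> str:
    """Hypothetical re-creation of the FOUND_TERMINATOR debug lines."""
    # A one-byte pattern fits anywhere in data[start:start + search_range].
    pos = data.find(bytes([CTRL_Z]), start, start + search_range)
    if pos < 0:
        return "no terminator"
    nxt = data[pos + 1:pos + 2]  # "&0 ubyte": byte right after the match
    if nxt == bytes([CTRL_Z]):
        return "2xCTRL_Z"
    if nxt == b"\0":
        return "1xCTRL_Z"
    return "FOUND_TERMINATOR"


memo = bytes(512) + b"1st memo. test umlaut\x1a\x1a"
print(classify_terminator(memo))  # 2xCTRL_Z
```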
> This confirms what the documentation says: the item field is
> normally terminated by two Control-Z characters, but some variants
> (FoxPro, Fox?? like fsadress.dbt) use only one Control-Z character.
> In the example I inspected, the next character was a nil byte. The
> documentation says nothing about the size of a memo field, but it is
> reasonable to assume that it is only a few hundred characters; hardly
> anyone writes a comment or note thousands of characters long. I use
> these facts in the branch concerned, so its second test part now
> becomes:
>>>>>>>> 513 search/3308	\032
> This test skips the GRUB module keylayouts.mod.
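Assuming libmagic's `search/N` tries N start positions beginning at the given offset, the `search/3308` test requires a Control-Z somewhere in the 3308 bytes starting at offset 513. A minimal Python sketch with a hypothetical helper name:

```python
def has_memo_terminator(data: bytes, start: int = 513, search_range: int = 3308) -> bool:
    """Hypothetical helper for '513 search/3308 \\032'."""
    # Memo text is assumed to be short, so the terminator must appear early.
    return data.find(b"\x1a", start, start + search_range) >= 0


print(has_memo_terminator(bytes(512) + b"1st memo \x1a"))        # True
print(has_memo_terminator(bytes(512) + b"rintf" + bytes(4000)))  # False
```

A terminator just beyond the window (at offset 513 + 3308) is not found, which is what limits the memo length.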
> Unfortunately, the work is not done at this point, because there are
> also DBT samples like dbase-memo.dbt where the second terminating
> character is a nil byte. At this point, that is also true for the old
> GRUB module nativedisk.mod. There, the first ASCII-like phrase is
> grub_mod_init at offset 429 (=1ADh), so I skip this GRUB module
> explicitly by checking for that specific word. The last test part in
> that branch now becomes:
>>>>>>>>>> &0 ubyte		0
>>>>>>>>>>> 0x1ad string		!grub_mod_init
>>>>>>>>>>>> 0	use		dbase3-memo-print
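The nil-byte path of the version-3 branch, including the explicit GRUB exclusion, can be sketched in Python. It assumes `search/N` tries N start positions from the offset and that `&0` points just past the match; `string !grub_mod_init` is read as "the bytes at 0x1ad do not start with grub_mod_init". The helper name is hypothetical, and only this nil-byte path is modelled.

```python
def v3_nul_path_is_dbt(data: bytes) -> bool:
    """Hypothetical model of the last test part of the version-3 branch."""
    pos = data.find(b"\x1a", 513, 513 + 3308)  # 513 search/3308 \032
    if pos < 0:
        return False
    if data[pos + 1:pos + 2] != b"\0":  # &0 ubyte 0
        return False
    # 0x1ad string !grub_mod_init
    return not data.startswith(b"grub_mod_init", 0x1AD)


memo = bytes(512) + b"1st memo \x1a\0"
grub = bytearray(600)
grub[0x1AD:0x1AD + 13] = b"grub_mod_init"
grub[520:522] = b"\x1a\0"
print(v3_nul_path_is_dbt(memo), v3_nul_path_is_dbt(bytes(grub)))  # True False
```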
> 
> In another branch, concerning version number 0, a few DOS executables
> (CPQ0TD.DRV, E30ODI.COM, IBM0MONO.DRV) are skipped because the first
> item is "too low". Then a few Microsoft Event Trace Logs
> (DlTel-Merge.etl, boot_BASE+CSWITCH_1.etl, UpdateUx.006.etl) are
> skipped because of an invalid "high" 1st item, and then the
> subroutine is called. Currently this looks like:
>>>>>>>>>>>> 512	ubyte		>037
>>>>>>>>>>>>> 512 ubyte		<0377
>>>>>>>>>>>>>> 0	use		dbase3-memo-print
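In Python terms, the two range tests accept a first byte strictly between octal 037 and 0377, i.e. 0x20 through 0xFE. That keeps cp1252 text such as ü (octal 374) while rejecting control bytes and 0xFF. A sketch with a hypothetical helper name:

```python
def first_item_in_range(data: bytes) -> bool:
    """Hypothetical helper for '512 ubyte >037' followed by '512 ubyte <0377'."""
    return len(data) > 512 and 0o37 < data[512] < 0o377


pad = bytes(512)
for first in (b"\x1f", b" ", "ü".encode("cp1252"), b"\xff"):
    print(first, first_item_in_range(pad + first))
```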
> 
> Unfortunately, at this position these tests are also passed by some
> Commodore 64 Art Studio files (Deep_Strike.aas, dragon's_lair_ii.aas),
> some Atari DEGAS Elite bitmaps (ELEPHANT.PC3, ST.PC2), some probably
> old GRUB modules (part_sun.mod, pcidump.mod) and
> virtual-boy-wario-land.vb. I cannot make the "invalid high" test more
> restrictive, because the German umlaut ue is encoded as the "high"
> octal value 374.
> 
> So, as in the other branch, I look for the Control-Z character that
> terminates the memo field. This is done by a line like:
>>>>>>>>>>>>>> 513 search/523	\032
> Now most misidentified samples vanish, but I found one exception: the
> old GRUB module pcidump.mod also passes this test. So I looked at the
> second terminating character. For most real DBT samples this is again
> Ctrl-Z (as in dbase3dbt0_1.dbt and dbase_83.dbt), whereas for the
> GRUB module it is a nil byte. The former are matched by lines like:
>>>>>>>>>>>>>>> &0 ubyte		032
>>>>>>>>>>>>>>>> 0 use		dbase3-memo-print
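Combining the 523-byte Control-Z search with the check on the following byte, the version-0 path for the usual double Control-Z ending might look like this in Python. It assumes `search/N` tries N start positions from the offset and that `&0` points just past the match; the helper name is hypothetical.

```python
def v0_double_ctrl_z(data: bytes) -> bool:
    """Hypothetical model: '513 search/523 \\032' then '&0 ubyte 032'."""
    pos = data.find(b"\x1a", 513, 513 + 523)
    # The byte right after the first Control-Z must be a second Control-Z.
    return pos >= 0 and data[pos + 1:pos + 2] == b"\x1a"


print(v0_double_ctrl_z(bytes(512) + b"Our Original\x1a\x1a"))  # True
print(v0_double_ctrl_z(bytes(512) + b"\203x\034\x1a\0"))       # False
```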
> It is unlikely, but a nil byte as the second terminating character
> can also occur in real DBT samples (like fsadress.dbt and
> umlaut-dbf-cmd.dbt). In the GRUB module, the first specific
> ASCII-like phrase, at offset 780 (=30Ch), is:
> 	pcidump\0Show raw dump of the PCI configuration space
> Unfortunately, I cannot skip this module by testing that this phrase
> is absent at that offset, because then short DBT samples would be
> missed. So I search for part of this sentence, and if that search
> fails, DBT samples are matched by the default clause. The last
> additional test before calling the subroutine for samples like
> fsadress.dbt and umlaut-dbf-cmd.dbt now becomes:
>>>>>>>>>>>>>>> &0 ubyte		0
>>>>>>>>>>>>>>>> 514 search/0x11E	pcidump\0Show
>>>>>>>>>>>>>>>> 514 default	x
>>>>>>>>>>>>>>>>> 0 use		dbase3-memo-print
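The search-plus-default construct reads as "call the subroutine only if the search fails". A Python sketch of this nil-byte path, assuming that `search/N` tries N start positions from the given offset (so the 12-byte phrase may extend past the window) and that `&0` points just past the Control-Z; all names are hypothetical.

```python
PCIDUMP = b"pcidump\0Show"  # phrase searched for in the GRUB module


def v0_nul_path_is_dbt(data: bytes) -> bool:
    """Hypothetical model of the nil-byte path of the version-0 branch."""
    pos = data.find(b"\x1a", 513, 513 + 523)  # 513 search/523 \032
    if pos < 0 or data[pos + 1:pos + 2] != b"\0":  # &0 ubyte 0
        return False
    # 514 search/0x11E pcidump\0Show: a hit means "GRUB module, print nothing".
    found = any(data.startswith(PCIDUMP, i) for i in range(514, 514 + 0x11E))
    # 514 default x: reached only when the search above did not match.
    return not found


note = bytes(512) + b"This is a note\x1a\0"
grub = bytearray(1000)
grub[520:522] = b"\x1a\0"
grub[0x30C:0x30C + len(PCIDUMP)] = PCIDUMP
print(v0_nul_path_is_dbt(note), v0_nul_path_is_dbt(bytes(grub)))  # True False
```

Searching for the phrase instead of testing `780 string !...` means a short file simply fails the search and falls through to the default clause.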
> 
> After applying the modifications above with the patch file
> file-5.44-database-dbt.diff, none of the affected samples is
> misidentified as DBT any more, and real DBT samples are still
> recognized. The output now looks like:
> 
> Deep_Strike.aas:                    data
> ELEPHANT.PC3:                       Atari DEGAS Elite
> 				    compressed bitmap 640 x 400 x 2
> 				    , color palette
> 				    0000 0777 0000 0000 0000 ...
> ST.PC2:                             Atari DEGAS Elite
> 				    compressed bitmap 640 x 200 x 4
> 				    , color palette
> 				    0000 0777 0000 0000 0000 ...
> StateRepository-Deployment.srd-shm: data
> dbase-memo.dbt:                     dBase III DBT
> 				    , next free block index 2
> 				    , 1st item "1st memo \032"
> dbase3dbt0_1.dbt:                   dBase III DBT, version number 0
> 				    , next free block index 2
> 				    , 1st item "1st memo. test umlaut
> 				    with cp 1252:
> 				    \344=ae, \366=oe, \374=ue,
> 				    \337=ss,\200=euro,
> 				    \304=Ae, \326=Oe,
> 				    \334=Ue\032\032"
> dbase_83.dbt:                       dBase III DBT, version number 0
> 				    , next free block index 79
> 				    , 1st item "Our Original
> 				    assortment...a little taste
> dragon's_lair_ii.aas:               data
> fsadress.dbt:                       dBase III DBT, version number 0
> 				    , next free block index 5
> 				    , 1st item "This is a note for
> 				    Karl M\374ller. "
> gcry_cast5.mod:                     data
> keylayouts.mod:                     data
> nativedisk.mod:                     data
> part_sun.mod:                       data
> pcidump.mod:                        data
> plan9.mod:                          data
> test.dbt:                           dBase III DBT
> 				    , next free block index 16
> 				    , 1st item "WHAT IS XBASE"
> virtual-boy-wario-land.vb:          data
> 
> I hope my diff file can be applied in a future version of the file
> utility.
> 
> 
> With best wishes,
> Jörg Jenderek
> [attachments: trid-v-dbt.txt.gz, file-5_44-database-dbt_diff,
> file-5_44-database-dbt_diff_sig]
> --
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file


