[File] [PATCH] Magdir/database dBase III DBT misidentifies some Atari DEGAS bitmaps, SQLite Write-Ahead Log shared memory
Jörg Jenderek
joerg.jen.der.ek at gmx.net
Sun Jan 8 10:40:14 UTC 2023
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello,
Some days ago I sent a patch for SQLite Write-Ahead Log shared memory
files.
When running the file command version 5.44 with the -k option on such
examples and on further misidentified samples, I get output like:
Deep_Strike.aas: dBase III DBT, version number 0
, next free block index 8192
, 1st item "\374\374"
ELEPHANT.PC3: Atari DEGAS Elite
compressed bitmap 640 x 400 x 2
, color palette
0000 0777 0000 0000 0000 ...
dBase III DBT, version number 0
, next free block index 640
, 1st item "\351\377\003\376"
ST.PC2: Atari DEGAS Elite
compressed bitmap 640 x 200 x 4
, color palette
0000 0777 0000 0000 0000 ...
dBase III DBT, version number 0
, next free block index 384
, 1st item "\341\377\261"
StateRepository-Deployment.srd-shm: dBase III DBT
, next free block index 3007000
, block length 6144
dbase-memo.dbt: dBase III DBT
, next free block index 2
, 1st item "1st memo \032"
dbase3dbt0_1.dbt: dBase III DBT, version number 0
, next free block index 2
, 1st item "1st memo. test umlaut
with cp 1252:
\344=ae, \366=oe, \374=ue,
\337=ss,\200=euro,
\304=Ae, \326=Oe,
\334=Ue\032\032"
dbase_83.dbt: dBase III DBT, version number 0
, next free block index 79
, 1st item "Our Original
assortment...a little taste
dragon's_lair_ii.aas: dBase III DBT, version number 0
, next free block index 8192
, 1st item "\314\303\003\003
fsadress.dbt: dBase III DBT, version number 0
, next free block index 5
, 1st item "This is a note for
Karl M\374ller. "
gcry_cast5.mod: dBase III DBT
, next free block index 4
, 1st item "\001\010"
keylayouts.mod: dBase III DBT
, next free block index 24
, 1st item "rintf"
nativedisk.mod: dBase III DBT
, next free block index 10
, 1st item
"rub_file_get_device_name"
part_sun.mod: dBase III DBT, version number 0
, next free block index 100
, 1st item "LICENSE=GPLv3+"
pcidump.mod: dBase III DBT, version number 0
, next free block index 1
, 1st item "\203x\034"
plan9.mod: dBase III DBT
, next free block index 4
, 1st item "b_strcmp"
test.dbt: dBase III DBT
, next free block index 16
, 1st item "WHAT IS XBASE"
virtual-boy-wario-land.vb: dBase III DBT, version number 0
, next free block index 61440
, 1st item " \307\356\377\004"
For comparison I ran the file format identification utility TrID
(see https://mark0.net/soft-trid-e.html). Here it correctly
identifies only the Atari DEGAS bitmaps (*.PC2 *.PC3; see the
appended trid-v-dbt.txt.gz).
For comparison I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/). It identifies
all DBT samples as "dBASE Text Memo" by PUID x-fmt/311, based on the
file name suffix.
Luckily, in Magdir/database the display part is done by the
subroutine dbase3-memo-print, so only additional tests must be added
before calling that routine.
For one DBT branch the output does not contain the phrase "version
number". That is the branch for samples with the standard version
number 3. After many RAR archives have been skipped by a test for
valid block sizes, the version number is checked for 3 and then the
subroutine is called. At the moment this looks like:
>>>>>20 ubelong&0xFF01209B 0x00000000
>>>>>>16 ubyte 3
>>>>>>>0 use dbase3-memo-print
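As a rough illustration only (Python, not part of the patch), the two
tests quoted above can be sketched as follows. The function name is
hypothetical, and the tests at the outer levels of the real magic entry
are not reproduced here.

```python
import struct


def passes_version3_header_tests(data: bytes) -> bool:
    """Mirror only the two header tests quoted above."""
    if len(data) < 24:
        return False
    # ubelong: unsigned 32-bit big-endian value read at offset 20
    (value,) = struct.unpack_from(">I", data, 20)
    if value & 0xFF01209B != 0:
        return False
    # ubyte at offset 16 must be the standard version number 3
    return data[16] == 3


# Synthetic header with version 3 and block length 6144 (little-endian
# 16-bit value at offset 20, as printed for the srd-shm sample).
header = bytearray(24)
header[16] = 3
header[20:22] = (6144).to_bytes(2, "little")
print(passes_version3_header_tests(bytes(header)))  # True

header[20] = 0x52  # any bit set in the byte at offset 20 fails the mask
print(passes_version3_header_tests(bytes(header)))  # False
```

This shows why a sample like StateRepository-Deployment.srd-shm still
reaches this branch: a block length of 6144 passes the mask test.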
For a real DBT sample like dbase3dbt0_1.dbt the first item must be
a printable ASCII string, something like "1st memo. test umlaut".
It is displayed in the subroutine by a line like:
>512 string >\0 \b, 1st item "%s"
In other branches I already skip "bad" samples by checking the first
item field, so I do the same here. I skip samples with an invalid
"low" 1st item field, like "\0\0\0\0" in
StateRepository-Deployment.srd-shm and "\001\010\0\0" in
gcry_cast5.mod, by an additional test line:
>>>>>>>512 ubyte >040
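A small Python sketch of this byte test, for illustration only. The
helper name and the synthetic buffers are assumptions; the item strings
are the ones shown in the output above. Note that 040 is octal in magic
syntax, that is the space character.

```python
def first_item_not_too_low(data: bytes) -> bool:
    """True if the byte at offset 512 is greater than octal 040 (space)."""
    return len(data) > 512 and data[512] > 0o40


def sample(first_item: bytes) -> bytes:
    """Build a synthetic buffer: 512 zero bytes followed by the item."""
    return bytes(512) + first_item


print(first_item_not_too_low(sample(b"\0\0\0\0")))        # False
print(first_item_not_too_low(sample(b"\001\010\0\0")))    # False
print(first_item_not_too_low(sample(b"1st memo \x1a")))   # True
print(first_item_not_too_low(sample(b"rintf")))           # True
```

The last call shows the limit of the test: an item that merely starts
with a printable character still passes.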
Unfortunately this test is also true for samples like keylayouts.mod,
nativedisk.mod and plan9.mod.
So I must look for more tests. Looking at the DBT examples, we see
that in many of them a byte sequence like \032\032 is displayed at
the end. So for debugging I activated some lines at the end of the
subroutine like:
>513 search/0x225 \032 FOUND_TERMINATOR
>>&0 ubyte 032 2xCTRL_Z
>>&0 ubyte 0 1xCTRL_Z
Then we see what is written in the documentation: the item field
is normally terminated by 2 Control-Z characters, but in some
variants (FoxPro, Fox?? like fsadress.dbt) only one Control-Z
character is used. In the examples I inspected, the next character
was then a nil byte. Nothing is written about the size of the memo
field, but we can reasonably assume that it is only some hundred
characters; no human will write a comment or note with thousands of
characters. So I use these facts in the concerned branch. The second
test part in this branch now becomes:
>>>>>>>513 search/3308 \032
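The windowed search can be approximated in Python as below. This is a
sketch under the assumption that search/N tries N start positions
beginning at the given offset, as described in magic(5); the function
name and the synthetic buffers are hypothetical.

```python
CTRL_Z = b"\x1a"  # octal 032


def find_terminator(data: bytes, offset: int = 513, count: int = 3308) -> int:
    """Return the position of the first Ctrl-Z among `count` start
    positions beginning at `offset`, or -1 if there is none."""
    return data.find(CTRL_Z, offset, offset + count)


# Synthetic memo block: item text at offset 512, ended by Ctrl-Z + NUL.
memo = bytes(512) + b"1st memo \x1a\x00"
print(find_terminator(memo))  # 521

# Synthetic buffer with printable text but no Ctrl-Z in the window.
no_terminator = bytes(512) + b"rintf" + bytes(4000)
print(find_terminator(no_terminator))  # -1
```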
With this test the GRUB module keylayouts.mod is skipped.
Unfortunately the work is not done at this point, because there also
exist DBT samples like dbase-memo.dbt where the second terminating
character is a nil byte. This is also true for the old GRUB module
nativedisk.mod. There the first ASCII-like phrase is grub_mod_init at
offset 429 (=1ADh). So I skip the GRUB module explicitly by checking
for that specific word. The last test part in that branch now
becomes:
>>>>>>>>>&0 ubyte 0
>>>>>>>>>>0x1ad string !grub_mod_init
>>>>>>>>>>>0 use dbase3-memo-print
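Putting the tests of this branch together, the decision can be sketched
in Python as follows. This is only an illustration: it also includes
the two-Ctrl-Z case that appears in the attached diff, the function
names are hypothetical, and the buffers are synthetic stand-ins rather
than the real sample files.

```python
def magic_search(data: bytes, pattern: bytes, offset: int, count: int) -> int:
    """Return the offset just past the first match whose start lies in
    [offset, offset + count), or -1 (assumed search/N semantics)."""
    pos = data.find(pattern, offset, offset + count - 1 + len(pattern))
    return -1 if pos < 0 else pos + len(pattern)


def looks_like_version3_memo(data: bytes) -> bool:
    if len(data) <= 512 or data[512] <= 0o40:
        return False                      # invalid "low" 1st item
    after = magic_search(data, b"\x1a", 513, 3308)
    if after < 0 or after >= len(data):
        return False                      # no terminating Ctrl-Z
    second = data[after]                  # byte right after the Ctrl-Z
    if second == 0x1A:
        return True                       # two Ctrl-Z terminators
    if second == 0:
        # Ctrl-Z + NUL: accept unless the GRUB marker sits at 0x1ad
        return data[0x1AD:0x1AD + 13] != b"grub_mod_init"
    return False


dbt = bytes(512) + b"1st memo \x1a\x00"
grub = bytearray(512) + b"rub_file_get_device_name\x1a\x00"
grub[0x1AD:0x1AD + 13] = b"grub_mod_init"

print(looks_like_version3_memo(dbt))          # True
print(looks_like_version3_memo(bytes(grub)))  # False
```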
In another branch, concerning version number 0, a few DOS executables
(CPQ0TD.DRV E30ODI.COM IBM0MONO.DRV) are skipped because the first
item is "too low". Then a few Microsoft Event Trace Logs
(DlTel-Merge.etl boot_BASE+CSWITCH_1.etl UpdateUx.006.etl) are
skipped because of an invalid "high" 1st item, and then the
subroutine is called. At the moment this looks like:
>>>>>>>>>>>512 ubyte >037
>>>>>>>>>>>>512 ubyte <0377
>>>>>>>>>>>>>0 use dbase3-memo-print
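For illustration, the two byte tests above amount to a simple range
check, sketched here in Python with a hypothetical helper name and
synthetic buffers. The item "\374\374" is the one printed for
Deep_Strike.aas in the output above.

```python
def first_item_in_range(data: bytes) -> bool:
    """True if the byte at offset 512 is > octal 037 and < octal 0377."""
    return len(data) > 512 and 0o37 < data[512] < 0o377


print(first_item_in_range(bytes(512) + b"This is a note"))  # True
print(first_item_in_range(bytes(512) + b"\x1f"))            # False
print(first_item_in_range(bytes(512) + b"\xff"))            # False
print(first_item_in_range(bytes(512) + b"\xfc\xfc"))        # True
```

The last call shows that an item starting with octal 374 is inside the
accepted range, so the range check alone cannot reject such a sample.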
Unfortunately at this position the tests are also true for some
Commodore 64 Art Studio images (Deep_Strike.aas
dragon's_lair_ii.aas), some Atari DEGAS Elite bitmaps (ELEPHANT.PC3
ST.PC2), some probably old GRUB modules (part_sun.mod pcidump.mod)
and virtual-boy-wario-land.vb.
Unfortunately I can not be more restrictive in the test for an
"invalid high" item, because the German umlaut ue is encoded as
octal 374, which is "high".
So, as in the other branch, I look for the Control-Z character that
terminates the memo field. This is done by a line like:
>>>>>>>>>>>>>513 search/523 \032
Now most misidentified samples have vanished, but I found one
exception: the test is also true for the old GRUB module
pcidump.mod. So I looked at the second terminating character. For
most real DBT samples this is Ctrl-Z again (like in dbase3dbt0_1.dbt
dbase_83.dbt), whereas for the GRUB module it is a nil byte. So the
real samples are matched by lines like:
>>>>>>>>>>>>>>&0 ubyte 032
>>>>>>>>>>>>>>>0 use dbase3-memo-print
It is unlikely, but a nil byte as second terminating character can
also occur in real DBT samples (like fsadress.dbt
umlaut-dbf-cmd.dbt). In the GRUB module the first specific
ASCII-like phrase, at offset 780 (=30Ch), is:
pcidump\0Show raw dump of the PCI configuration space
Unfortunately I can not skip this module by testing for a string
unequal to this phrase, because then short DBT samples are missed.
So I search for a part of this sentence, and if the search fails the
DBT samples are caught by a default clause. The last additional test
before calling the subroutine for samples (like fsadress.dbt
umlaut-dbf-cmd.dbt) now becomes:
>>>>>>>>>>>>>>&0 ubyte 0
>>>>>>>>>>>>>>>514 search/0x11E pcidump\0Show
>>>>>>>>>>>>>>>514 default x
>>>>>>>>>>>>>>>>0 use dbase3-memo-print
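The whole version 0 branch, including the default clause, can be
sketched in Python like this. It is an illustration, not the patch
itself: the function names are hypothetical, the buffers are synthetic,
and search/N is assumed to try N start positions from the given offset.

```python
def magic_search(data: bytes, pattern: bytes, offset: int, count: int) -> int:
    """Return the offset just past the first match whose start lies in
    [offset, offset + count), or -1 (assumed search/N semantics)."""
    pos = data.find(pattern, offset, offset + count - 1 + len(pattern))
    return -1 if pos < 0 else pos + len(pattern)


def looks_like_version0_memo(data: bytes) -> bool:
    if len(data) <= 512 or not (0o37 < data[512] < 0o377):
        return False                      # 1st item too "low" or "high"
    after = magic_search(data, b"\x1a", 513, 523)
    if after < 0 or after >= len(data):
        return False                      # no terminating Ctrl-Z nearby
    second = data[after]
    if second == 0x1A:
        return True                       # two Ctrl-Z terminators
    if second == 0:
        # The GRUB phrase matches and prints nothing; only when it is
        # absent does the default clause call the print subroutine.
        return magic_search(data, b"pcidump\x00Show", 514, 0x11E) < 0
    return False


fox = bytes(512) + b"This is a note for Karl M\xfcller. \x1a\x00"
grub = bytearray(1024)
grub[512:515] = b"\x83x\x1c"
grub[600] = 0x1A                          # Ctrl-Z followed by a NUL byte
grub[780:792] = b"pcidump\x00Show"

print(looks_like_version0_memo(fox))          # True
print(looks_like_version0_memo(bytes(grub)))  # False
```

Searching for the phrase and falling back to a default clause keeps
short memo files working, since a negated string test at a fixed offset
would fail on files that end before that offset.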
After applying the above-mentioned modifications with the patch
file-5.44-database-dbt.diff, none of the concerned samples is
misidentified as DBT any more, and real DBT samples are still
recognized. The output now looks like:
Deep_Strike.aas: data
ELEPHANT.PC3: Atari DEGAS Elite
compressed bitmap 640 x 400 x 2
, color palette
0000 0777 0000 0000 0000 ...
ST.PC2: Atari DEGAS Elite
compressed bitmap 640 x 200 x 4
, color palette
0000 0777 0000 0000 0000 ...
StateRepository-Deployment.srd-shm: data
dbase-memo.dbt: dBase III DBT
, next free block index 2
, 1st item "1st memo \032"
dbase3dbt0_1.dbt: dBase III DBT, version number 0
, next free block index 2
, 1st item "1st memo. test umlaut
with cp 1252:
\344=ae, \366=oe, \374=ue,
\337=ss,\200=euro,
\304=Ae, \326=Oe,
\334=Ue\032\032"
dbase_83.dbt: dBase III DBT, version number 0
, next free block index 79
, 1st item "Our Original
assortment...a little taste
dragon's_lair_ii.aas: data
fsadress.dbt: dBase III DBT, version number 0
, next free block index 5
, 1st item "This is a note for
Karl M\374ller. "
gcry_cast5.mod: data
keylayouts.mod: data
nativedisk.mod: data
part_sun.mod: data
pcidump.mod: data
plan9.mod: data
test.dbt: dBase III DBT
, next free block index 16
, 1st item "WHAT IS XBASE"
virtual-boy-wario-land.vb: data
I hope my diff file can be applied in a future version of the file
utility.
With best wishes,
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY7qdjgAKCRCv8rHJQhrU
1qG7AJ9zATXBUFAFYQ84kUdwQWBInNmCOACfYQj0i6n8pUpqSfqkB6qZT5d4tCw=
=hc/n
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-dbt.txt.gz
Type: application/x-gzip
Size: 1622 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230108/7581371a/attachment.bin>
-------------- next part --------------
--- file-5.44/magic/Magdir/database.old 2022-09-28 17:46:04.000000000 +0200
+++ file-5.44/magic/Magdir/database 2023-01-08 10:57:57.054153100 +0100
@@ -389,4 +389,18 @@
>>>>>>16 ubyte 3
-# dBASE III DBT
->>>>>>>0 use dbase3-memo-print
+# skip with invalid "low" 1st item "\0\0\0\0" StateRepository-Deployment.srd-shm "\001\010\0\0" gcry_cast5.mod
+>>>>>>>512 ubyte >040
+# skip with valid 1st item "rintf" keylayouts.mod
+# by looking for valid terminating character Ctrl-Z like in test.dbt
+>>>>>>>>513 search/3308 \032
+# skip GRUB plan9.mod with invalid second terminating character 007
+# by checking second terminating character Ctrl-Z like in test.dbt
+>>>>>>>>>&0 ubyte 032
+# dBASE III DBT with two Ctr-Z terminating characters
+>>>>>>>>>>0 use dbase3-memo-print
+# second terminating character \0 like in dbase-memo.dbt or GRUB nativedisk.mod
+>>>>>>>>>&0 ubyte 0
+# skip GRUB nativedisk.mod with grub_mod_init\0grub_mod_fini\0grub_fs_autoload_hook\0
+>>>>>>>>>>0x1ad string !grub_mod_init
+# like dbase-memo.dbt
+>>>>>>>>>>>0 use dbase3-memo-print
# dBASE III DBT without version, dBASE IV DBT , FoxPro FPT , or many ZIP , DBF garbage
@@ -414,4 +428,19 @@
>>>>>>>>>>>>512 ubyte <0377
-# unusual dBASE III DBT like adressen.dbt biblio.dbt fsadress.dbt
->>>>>>>>>>>>>0 use dbase3-memo-print
+# skip some Commodore 64 Art Studio (Deep_Strike.aas dragon's_lair_ii.aas), some Atari DEGAS Elite bitmap (ELEPHANT.PC3 ST.PC2)
+# some probably old GRUB modules (part_sun.mod) and virtual-boy-wario-land.vb.
+# by looking for valid terminating character Ctrl-Z
+>>>>>>>>>>>>>513 search/523 \032
+# Atari DEGAS bitmap ST.PC2 with 0370 as second terminating character
+#>>>>>>>>>>>>>>&0 ubyte x 2ND_CHAR_IS=%o
+# dBASE III DBT with two Ctr-Z terminating characters like dbase3dbt0_1.dbt dbase_83.dbt
+>>>>>>>>>>>>>>&0 ubyte 032
+>>>>>>>>>>>>>>>0 use dbase3-memo-print
+# second terminating character \0 like in pcidump.mod or fsadress.dbt umlaut-dbf-cmd.dbt
+>>>>>>>>>>>>>>&0 ubyte 0
+# look for old GRUB module pcidump.mod with specific content "pcidump\0Show raw dump of the PCI configuration space"
+>>>>>>>>>>>>>>>514 search/0x11E pcidump\0Show
+# dBASE III DBT with Ctr-Z + \0 terminating characters like fsadress.dbt
+>>>>>>>>>>>>>>>514 default x
+# unusual dBASE III DBT like fsadress.dbt umlaut-dbf-cmd.dbt
+>>>>>>>>>>>>>>>>0 use dbase3-memo-print
# dBASE III DBT like angest.dbt, or garbage PCX DBF
@@ -462,3 +491,3 @@
>20 uleshort !0 \b, block length %u
-# dBase III memo field terminated by \032\032
+# dBase III memo field terminated often by \032\032
# like: "WHAT IS XBASE" test.dbt "Borges, Malte" biblio.dbt "First memo\032\032" T2.DBT
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.44-database-dbt.diff.sig
Type: application/octet-stream
Size: 1267 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230108/7581371a/attachment.obj>