[File] [PATCH] of Magdir/database for dBase III DBT, version number 0
Jörg Jenderek
joerg.jen.der.ek at gmx.net
Mon Mar 23 23:12:53 UTC 2020
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello,
some months ago I sent patches to handle dBase database files. In the
meantime I gathered some examples that are also misidentified as
"dBase III DBT, version number 0". When running file command version
5.38 on such misidentified samples with the -e cdf and -m Magdir/database
options, I get output like:
AI070GEP.EPS:
dBase III DBT, version number 0, next free block index
458766
gluon-ffhat-0.9.4.8-tp-link-tl-wr1043n-nd-v1-sysupgrade.bin:
dBase III DBT, version number 0, next free block index
1, 1st item "\037 \010"
gluon-ffhat-1.0-tp-link-tl-wr1043n-nd-v1-sysupgrade.bin:
dBase III DBT, version number 0, next free block index
1, 1st item "\037 \010"
gluon-ffhat-1.0-tp-link-tl-wr1043n-nd-v2-sysupgrade.bin:
dBase III DBT, version number 0, next free block index
1, 1st item "m"
planmaker-pmd-2010.pmd:
dBase III DBT, version number 0, next free block index
3759263696, 1st item " \010\020"
planmaker-pmd-2012.pmd:
dBase III DBT, version number 0, next free block index
3759263696, 1st item " \010\020"
planmaker-pmv.pmv:
dBase III DBT, version number 0, next free block index
3759263696, 1st item " \010\020"
planmaker-xls-5.0-7.0.xls:
dBase III DBT, version number 0, next free block index
3759263696, 1st item " \010\010"
planmaker-xls-97-2003.xls:
dBase III DBT, version number 0, next free block index
3759263696, 1st item " \010\020"
planmaker-xlt.xlt:
dBase III DBT, version number 0, next free block index
3759263696, 1st item " \010\020"
Sammlung.wsb:
dBase III DBT, version number 0, next free block index
3759263696, 1st item " \004"
sm-presentation-pot.pot:
dBase III DBT, version number 0, next free block index
3759263696
sm-presentation-pps.pps:
dBase III DBT, version number 0, next free block index
3759263696
sm-presentation-ppt-2000-2003.ppt:
dBase III DBT, version number 0, next free block index
3759263696
sm-presentation-ppt-97.ppt:
dBase III DBT, version number 0, next free block index
3759263696
sm-presentation-prd.prd:
dBase III DBT, version number 0, next free block index
3759263696
sm-presentation-prv.prv:
dBase III DBT, version number 0, next free block index
759263696
softmaker-doc-6.0-95.doc:
dBase III DBT, version number 0, next free block index
3759263696, 1st item " h"
softmaker-doc-97-2003.doc:
dBase III DBT, version number 0, next free block index
3759263696, 1st item " \001\001U@ \004"
softmaker-dot-6.0-95.dot:
dBase III DBT, version number 0, next free block index
3759263696, 1st item " h"
softmaker-dot-97-2003.dot:
dBase III DBT, version number 0, next free block index
3759263696, 1st item " \001\001U@ \004"
WinStore.App.exe:
dBase III DBT, version number 0, next free block index
23117
WORD1XW.DOC:
dBase III DBT, version number 0, next free block index
2205083, 1st item "\200\001"
Unfortunately, DBT files have no really good characteristic magic byte
sequence. Luckily, the display part for such dBase files is
encapsulated in the subroutine dbase3-memo-print inside
Magdir/database, so only the magic lines for the test conditions must
be changed or added.
For the misidentified samples, exorbitantly high values are shown for
the next free block index, whereas in real-world examples only "low"
values occur.
The dBase documentation mentions that the upper limit for a dBase
database is 2 GiB. The memo file uses a block size of 512 bytes. That
means valid index values are below 2 GiB / 512, which is hexadecimal
0x400000. The values are stored as 4-byte integers in little-endian
order. Nothing is said explicitly about the sign of the values, but
negative values make no sense for a real block index. So type ulelong
must be used instead of lelong in lines with comparisons; otherwise
high values like decimal 3759263696 are treated as negative values
and pass a "less than" test.
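
The arithmetic behind this limit and the effect of the sign can be
checked with a short Python sketch. It is only an illustration and not
part of the patch; the names BLOCK_SIZE, MAX_FILE_SIZE and as_signed_32
are chosen here for the example.

```python
import struct

BLOCK_SIZE = 512               # memo block size in bytes, as stated above
MAX_FILE_SIZE = 2 * 1024 ** 3  # 2 GiB upper limit from the dBase documentation

# Number of blocks that fit into the largest file; valid indexes are below it.
index_limit = MAX_FILE_SIZE // BLOCK_SIZE


def as_signed_32(value):
    """Reinterpret an unsigned 32-bit value as a signed one (lelong view)."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


bogus = 3759263696  # index reported for many of the misidentified samples

print(hex(index_limit))                # limit used in the new test lines
print(as_signed_32(bogus))             # the same 4 bytes read as lelong
print(as_signed_32(bogus) < 2205083)   # old signed comparison accepts it
print(bogus < index_limit)             # new unsigned comparison rejects it
```

Read as a signed value, the bogus index is negative and therefore
smaller than any positive limit, which is why the old lelong test let
these samples through.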
Furthermore, in real-world examples the first memo item is longer
(more than 2 characters) printable ASCII text. That means the byte
value of each character is equal to or higher than hexadecimal 0x20 or
octal 040, which is the value of the space character.
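
A small Python sketch of this threshold follows. It only mimics the
comparison "ubyte >037" used in the test lines; the helper name
passes_ubyte_test is made up for this example. Because the byte is
compared as an unsigned value, characters above 0x7F, such as cp1252
umlauts, also pass the comparison as written.

```python
THRESHOLD = 0o37  # octal 037, the value used in the "ubyte >037" test lines


def passes_ubyte_test(byte_value):
    """Mimic 'ubyte >037': compare the byte as an unsigned value 0..255."""
    return byte_value > THRESHOLD


umlaut = "\u00e4".encode("cp1252")[0]  # a-umlaut is a single byte in cp1252

print(THRESHOLD == 0x1F)             # octal 037 is hexadecimal 0x1F
print(passes_ubyte_test(ord(" ")))   # space (0x20) is the lowest accepted byte
print(passes_ubyte_test(0o10))       # control character \010 is rejected
print(passes_ubyte_test(umlaut))     # high byte is accepted as unsigned
```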
So one test branch looks like:
>>>>>>>>>0 lelong <2205083
>>>>>>>>>>0 use dbase3-memo-print
To skip samples like WORD1XW.DOC with an improbably high free block
index and samples like WinStore.App.exe with an unprintable second
character in the first memo item, this now becomes:
>>>>>>>>>0 ulelong <0x400000
>>>>>>>>>>513 ubyte >037
>>>>>>>>>>>0 use dbase3-memo-print
Another test branch looks like:
>>>>>>>>>>0 lelong <458766
>>>>>>>>>>>0 use dbase3-memo-print
To skip bad samples with an improbably high free block index or a
non-printable first or second character in the memo field, the test
lines are changed. This now becomes:
>>>>>>>>>>0 ulelong <0x400000
>>>>>>>>>>>512 ubyte >037
>>>>>>>>>>>>513 ubyte >037
>>>>>>>>>>>>>0 use dbase3-memo-print
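
The changed test lines can be modelled outside of file(1) with a rough
Python sketch. It covers only the leaf tests shown above, not the
parent conditions of the surrounding magic entries. The function names
passes_new_tests and make_sample, the parameter check_first_char and
the synthetic sample data are assumptions made for this illustration.

```python
import struct

INDEX_LIMIT = 0x400000  # 2 GiB file limit divided by 512-byte blocks
MIN_PRINTABLE = 0o37    # tested bytes must be greater than octal 037


def passes_new_tests(data, check_first_char=True):
    """Apply only the changed leaf tests to raw file content.

    check_first_char=False models the first branch (offset 513 only),
    check_first_char=True models the second branch (offsets 512 and 513).
    """
    if len(data) < 514:
        return False  # assumption of this sketch: too short means no match
    (next_free,) = struct.unpack_from("<I", data, 0)  # ulelong at offset 0
    if not next_free < INDEX_LIMIT:
        return False
    if check_first_char and not data[512] > MIN_PRINTABLE:
        return False
    return data[513] > MIN_PRINTABLE


def make_sample(next_free, memo):
    """Build a synthetic file: 512-byte header block, then the first memo."""
    return struct.pack("<I", next_free).ljust(512, b"\0") + memo


good = make_sample(3, b"1st memo text\x1a\x1a")
high_index = make_sample(3759263696, b" \x08\x10")
bad_second = make_sample(1, b"m\x00")  # synthetic: printable 1st, NUL 2nd

print(passes_new_tests(good))
print(passes_new_tests(high_index))
print(passes_new_tests(bad_second))
```

The first sample is accepted, while the other two are rejected by the
index limit and by the second-character test respectively.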
After applying the above modifications with the patch
file-5.38-database-dbt.diff, the above examples are no longer
misidentified as "dBase III DBT, version number 0", and for real DBT
files I still get correct output like:
dbase3dbt0.dbt: dBase III DBT, version number 0,
next free block index 3,
1st item "1st memo text\032\032"
dbase3dbt0_1.dbt: dBase III DBT, version number 0,
next free block index 2,
1st item "1st memo. test umlaut with cp 1252:
ä=ae, ö=oe, ü=ue, ß=ss,\200=euro, Ä=Ae, Ö=Oe, Ü=Ue\032\032
dbase3dbt0_4.dbt: dBase III DBT, version number 0,
next free block index 2,
1st item "first memo\032\032"
fsadress.dbt: dBase III DBT, version number 0,
next free block index 5,
1st item "This is a note for Karl Müller. "
I hope that the test lines for such DBT files are now sufficient and
that my diff file can be applied in a future version of the file utility.
With best wishes
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCXnlCVQAKCRCv8rHJQhrU
1uKzAKCxWEReTbX3HsEfiwSXlIvYYybFwQCgvnpQesjsCDPkqhhrBpUEqG0mgY8=
=dAHI
-----END PGP SIGNATURE-----
-------------- next part --------------
--- file-5.38/magic/Magdir/database.old 2019-06-19 16:18:55 +0000
+++ file-5.38/magic/Magdir/database 2020-03-23 22:29:23 +0000
@@ -360,14 +360,20 @@
# dBASE III DBT , garbage
# skip WORD1XW.DOC with improbably high free block index
->>>>>>>>>0 lelong <2205083
+>>>>>>>>>0 ulelong <0x400000
+# skip WinStore.App.exe by looking for printable 2nd character of 1st memo item
+>>>>>>>>>>513 ubyte >037
# unusual dBASE III DBT like adressen.dbt
->>>>>>>>>>0 use dbase3-memo-print
+>>>>>>>>>>>0 use dbase3-memo-print
# dBASE III DBT like angest.dbt, or garbage PCX DBF
>>>>>>>>8 ubelong !0
# skip PCX and some DBF by test for for reserved NULL bytes
>>>>>>>>>510 ubeshort 0
-# skip AI070GEP.EPS with improbably high free block index
->>>>>>>>>>0 lelong <458766
->>>>>>>>>>>0 use dbase3-memo-print
+# skip bad samples with improbably high free block index above 2 GiB file limit
+>>>>>>>>>>0 ulelong <0x400000
+# skip AI070GEP.EPS by printable 1st character of 1st memo item
+>>>>>>>>>>>512 ubyte >037
+# skip gluon-ffhat-1.0-tp-link-tl-wr1043n-nd-v2-sysupgrade.bin by printable 2nd character
+>>>>>>>>>>>>513 ubyte >037
+>>>>>>>>>>>>>0 use dbase3-memo-print
# dBASE IV DBT with positive block size
>>>>>>>20 uleshort >0
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.38-database-dbt.diff.sig
Type: application/octet-stream
Size: 95 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20200324/229c5080/attachment.obj>