[File] [PATCH] Magdir/database dBase III DBT misidentifies some Microsoft Event Trace Logs *.ETL
Jörg Jenderek
joerg.jen.der.ek at gmx.net
Thu Sep 22 15:58:18 UTC 2022
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello,
Some days ago I ran the cleaning tool czkawka, found at
https://qarmin.github.io/czkawka/. One menu item concerns bad
extensions. After running the tool I looked in the saved file list
results_bad_extensions.txt for examples of bad extensions.
One listed extension is ETL.
These files are Microsoft Event Trace Logs. When I run the file
command, version 5.43, on some ETL examples and, for comparison, on
genuine DBT samples, I get output like:
DlTel-Merge.etl: dBase III DBT, version number 0,
next free block index 65536, 1st item
"\377\377\377\377\377\377\377\377\377
UpdateUx.006.etl: dBase III DBT, version number 0,
next free block index 4096, 1st item
"\377\377\377\377\377\377\377\377\377
WBEngine.3.etl: dBase III DBT, version number 0,
next free block index 10240, 1st item
"9600.18730.amd64fre.winblue_ltsb.170613-0600"
Wifi.etl: dBase III DBT, version number 0,
next free block index 81920, 1st item
"10586.494.amd64fre.th2_release_sec.160630-1736"
adressen.dbt: dBase III DBT, version number 0,
next free block index 3, 1st item
"2NDemail:user12 at localhost.local\032\032"
angest.dbt: dBase III DBT, version number 0,
next free block index 9, 1st item
"Sport: Bogenschie\341en"
biblio.dbt: dBase III DBT, version number 0,
next free block index 1194, 1st item
"Borges, Malte; Schumacher, J\303\266rg;
Redeker, Torsten\032\032"
dbase3dbt0_1.dbt: dBase III DBT, version number 0,
next free block index 2, 1st item
"1st memo. test umlaut with cp 1252:
\344=ae, \366=oe, \374=ue, \337=ss,\200=euro,
\304=Ae, \326=Oe, \334=Ue\032\032"
fsadress.dbt: dBase III DBT, version number 0,
next free block index 5, 1st item
"This is a note for Karl M\374ller. "
For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). It does not
recognise DBT samples, but it recognises many ETL samples (see the
appended trid-v-etl_dbt.txt.gz).
For comparison I also ran the file format identification
utility DROID (see https://sourceforge.net/projects/droid/). It
identifies all DBT samples as "dBASE Text Memo", PUID x-fmt/311,
based on the file name suffix.
Luckily, in Magdir/database the display part is done by the subroutine
dbase3-memo-print, so only an additional test must be added before
calling that routine.
One DBT branch handles samples like adressen.dbt, biblio.dbt and
fsadress.dbt. At the moment it looks like:
>>>>>>>>>>513 ubyte >037
>>>>>>>>>>>512 ubyte >037
>>>>>>>>>>>>0 use dbase3-memo-print
For real DBT samples the first item must be something like a printable
ASCII string. The last test line skips some DOS executables (like
CPQ0TD.DRV E30ODI.COM IBM0MONO.DRV) by checking that the first
character of the 1st item is not "too low" to be printable. The test
line before it skips bad samples (like WinStore.App.exe) by checking
that the second character of the first item is not "too low".
For a few (14/758) Microsoft Event Trace Logs (like
boot_BASE+CSWITCH_1.etl DlTel-Merge.etl UpdateUx.006.etl) this field,
interpreted as a memo item, gives an invalid "high" 1st item starting
with the byte sequence \377\377. So these bad ETL samples are now
skipped by an additional test that the first character of the first
item is "low enough". This part now becomes:
>>>>>>>>>>513 ubyte >037
>>>>>>>>>>>512 ubyte >037
>>>>>>>>>>>>512 ubyte <0377
>>>>>>>>>>>>>0 use dbase3-memo-print
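As a rough illustration, the three ubyte tests above can be mirrored in Python. This is only a sketch: the function name passes_first_branch is made up for this example, the buffers in the usage note are synthetic, and the earlier tests of the branch (such as the free block index check) are deliberately left out.

```python
def passes_first_branch(data: bytes) -> bool:
    """Mirror only the three ubyte tests in front of dbase3-memo-print."""
    if len(data) < 514:
        return False  # offsets 512 and 513 must both exist
    first, second = data[512], data[513]
    # 037 (octal) is 31: bytes at or below it are control characters.
    # 0377 (octal) is 255: the byte that starts the bad ETL "items".
    return second > 0o37 and first > 0o37 and first < 0o377
```

With a synthetic 512-byte header, passes_first_branch(bytes(512) + b"2NDemail") is accepted, while passes_first_branch(bytes(512) + b"\xff\xff\xff") is rejected by the new "<0377" test.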
There is another branch, handling samples like angest.dbt. It looks
like:
>>>>>>>>>>>512 ubyte >037
>>>>>>>>>>>>512 ubyte <0200
>>>>>>>>>>>>>513 ubyte >037
>>>>>>>>>>>>>>0 use dbase3-memo-print
There, a test for a "too high" first character was already in use.
This step skips some Microsoft Visual C, OMF libraries (like: BZ2.LIB
WATTCPWL.LIB ZLIB.LIB). Unfortunately, in this branch a
few (8/758) Microsoft Event Trace Logs (like WBEngine.3.etl
Wifi.etl) are still misidentified, and their first item
names are strange but valid. These names look like:
"9600.20369.amd64fre.winblue_ltsb_escrow.220427-1727"
"9600.19846.amd64fre.winblue_ltsb_escrow.200923-1735"
"10586.494.amd64fre.th2_release_sec.160630-1736"
So I must look for another test. Looking at the DBT examples, we see
that in many of them a byte sequence like \032\032 is displayed at
the end. So for debugging purposes I add some lines like these at the
end of the subroutine:
>513 search/0x225 \032 FOUND_TERMINATOR
>>&0 ubyte 032 2xCTRL_Z
>>&0 ubyte 0 1xCTRL_Z
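What these debugging lines report can be approximated in Python as below. It is a hedged sketch, not the way file itself works: it assumes that search/0x225 tries 0x225 starting positions from offset 513 and that the relative offset &0 refers to the byte directly after the match. The function name terminator_kind and the return values "none" and "other" are invented for this illustration; "2xCTRL_Z" and "1xCTRL_Z" are the labels from the debug lines.

```python
CTRL_Z = b"\x1a"  # written as \032 (octal) in the magic file


def terminator_kind(data: bytes, start: int = 513, span: int = 0x225) -> str:
    """Classify what follows the first Ctrl-Z found in the search window."""
    pos = data.find(CTRL_Z, start, start + span)
    if pos < 0:
        return "none"  # no Ctrl-Z inside the window at all
    following = data[pos + 1:pos + 2]  # empty if the buffer ends here
    if following == CTRL_Z:
        return "2xCTRL_Z"
    if following == b"\x00":
        return "1xCTRL_Z"
    return "other"
```

On a synthetic buffer such as bytes(512) + b"First memo\x1a\x1a" this gives "2xCTRL_Z"; a note followed by one Ctrl-Z and a nil byte gives "1xCTRL_Z".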
Then we see what is written in the documentation: the item field
is normally terminated by 2 Control-Z characters, but in some
variants (FoxPro, Fox?? like fsadress.dbt) only one Control-Z
character is used. In the example I inspected, the next character was
a nil byte. Nothing is written about the size of the memo field, but
it is reasonable to assume that it is only a few hundred characters,
since hardly anyone will write a comment or note with thousands of
characters. So I use these facts in the concerned branch, which now
becomes:
>>>>>>>>>>>>>513 ubyte >037
>>>>>>>>>>>>>>513 search/0x11E \032
>>>>>>>>>>>>>>>&0 ubyte 032
>>>>>>>>>>>>>>>>0 use dbase3-memo-print
>>>>>>>>>>>>>>>&0 ubyte 0
>>>>>>>>>>>>>>>>0 use dbase3-memo-print
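The combined logic of this branch, including the two character tests shown earlier for it, might be sketched in Python as follows. The function name looks_like_dbase3_memo is hypothetical, the preceding header tests of the branch are omitted, and the sketch assumes that search/0x11E tries 0x11E starting positions from offset 513 and that &0 refers to the byte right after the match.

```python
CTRL_Z = b"\x1a"  # \032 in octal


def looks_like_dbase3_memo(data: bytes) -> bool:
    """Approximate the character and terminator tests of the second branch."""
    if len(data) < 514:
        return False
    first, second = data[512], data[513]
    # 1st character printable and below 0200 (128), 2nd character printable
    if not (0o37 < first < 0o200 and second > 0o37):
        return False
    # a Ctrl-Z must show up within the first few hundred bytes of the item
    pos = data.find(CTRL_Z, 513, 513 + 0x11E)
    if pos < 0:
        return False
    # accept Ctrl-Z followed by a second Ctrl-Z or by a nil byte
    return data[pos + 1:pos + 2] in (CTRL_Z, b"\x00")
```

A synthetic buffer whose item is a build string like "10586.494.amd64fre.th2_release_sec.160630-1736" with no Ctrl-Z nearby is rejected, while an item ending in \032\032 is accepted.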
After applying the above modifications with the patch
file-5.43-database-etl.diff, none of the concerned ETL samples is
misidentified as DBT any more, and real DBT samples are still
recognized. The output now looks like:
DlTel-Merge.etl: data
UpdateUx.006.etl: data
WBEngine.3.etl: data
Wifi.etl: data
adressen.dbt: dBase III DBT, version number 0,
next free block index 3, 1st item
"2NDemail:user12 at localhost.local\032\032"
angest.dbt: dBase III DBT, version number 0,
next free block index 9, 1st item
"Sport: Bogenschie\341en"
biblio.dbt: dBase III DBT, version number 0,
next free block index 1194, 1st item
"Borges, Malte; Schumacher, J\303\266rg;
Redeker, Torsten\032\032"
dbase3dbt0_1.dbt: dBase III DBT, version number 0,
next free block index 2, 1st item
"1st memo. test umlaut with cp 1252:
\344=ae, \366=oe, \374=ue, \337=ss,\200=euro,
\304=Ae, \326=Oe, \334=Ue\032\032"
fsadress.dbt: dBase III DBT, version number 0,
next free block index 5, 1st item
"This is a note for Karl M\374ller. "
I hope my diff file can be applied in a future version of the file
utility.
Unfortunately, more ETL samples are misidentified, and a description
of the ETL format itself is still missing. I will try to do this in a
future session.
With best wishes,
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYyyGGQAKCRCv8rHJQhrU
1mdiAJ9xkBS5Lj8E8hgFDFouX+zkr/4E5gCZATGzFeku3ON6qA+AfoOpXTsthOo=
=nsOc
-----END PGP SIGNATURE-----
-------------- next part --------------
--- file-5.43/magic/Magdir/database.old 2022-08-16 13:15:19.000000000 +0200
+++ file-5.43/magic/Magdir/database 2022-09-22 17:35:29.503078500 +0200
@@ -407,27 +407,41 @@
# skip WORD1XW.DOC with improbably high free block index
>>>>>>>>>0 ulelong <0x400000
# skip WinStore.App.exe by looking for printable 2nd character of 1st memo item
>>>>>>>>>>513 ubyte >037
# skip DOS executables CPQ0TD.DRV E30ODI.COM IBM0MONO.DRV by looking for printable 1st character of 1st memo item
>>>>>>>>>>>512 ubyte >037
-# unusual dBASE III DBT like adressen.dbt
->>>>>>>>>>>>0 use dbase3-memo-print
+# skip few (14/758) Microsoft Event Trace Logs (boot_BASE+CSWITCH_1.etl DlTel-Merge.etl UpdateUx.006.etl) with invalid "high" 1st item \377\377
+>>>>>>>>>>>>512 ubyte <0377
+# unusual dBASE III DBT like adressen.dbt biblio.dbt fsadress.dbt
+>>>>>>>>>>>>>0 use dbase3-memo-print
# dBASE III DBT like angest.dbt, or garbage PCX DBF
>>>>>>>>8 ubelong !0
# skip PCX and some DBF by test for for reserved NULL bytes
>>>>>>>>>510 ubeshort 0
# skip bad symples with improbably high free block index above 2 GiB file limit
>>>>>>>>>>0 ulelong <0x400000
# skip AI070GEP.EPS by printable 1st character of 1st memo item
>>>>>>>>>>>512 ubyte >037
# skip some Microsoft Visual C, OMF library like: BZ2.LIB WATTCPWL.LIB ZLIB.LIB
>>>>>>>>>>>>512 ubyte <0200
# skip gluon-ffhat-1.0-tp-link-tl-wr1043n-nd-v2-sysupgrade.bin by printable 2nd character
>>>>>>>>>>>>>513 ubyte >037
->>>>>>>>>>>>>>0 use dbase3-memo-print
+# skip few (8/758) Microsoft Event Trace Logs (WBEngine.3.etl Wifi.etl) with valid 1st item like
+# "9600.20369.amd64fre.winblue_ltsb_escrow.220427-1727"
+# "9600.19846.amd64fre.winblue_ltsb_escrow.200923-1735"
+# "10586.494.amd64fre.th2_release_sec.160630-1736"
+# by looking for valid terminating character Ctrl-Z
+>>>>>>>>>>>>>>513 search/0x11E \032
+# followed by second character Ctrl-Z implies typical DBT
+>>>>>>>>>>>>>>>&0 ubyte 032
+# examples like: angest.dbt
+>>>>>>>>>>>>>>>>0 use dbase3-memo-print
+>>>>>>>>>>>>>>>&0 ubyte 0
+# no example found here with terminating sequence CTRL-Z + \0
+>>>>>>>>>>>>>>>>0 use dbase3-memo-print
# dBASE IV DBT with positive block size
>>>>>>>20 uleshort >0
# dBASE IV DBT with valid block length like 512, 1024
# multiple of 2 in between 16 and 16 K ,implies upper and lower bits are zero
# skip also 3600h 3E00h size
>>>>>>>>20 uleshort&0xE00f 0
@@ -448,12 +462,17 @@
>20 uleshort !0 \b, block length %u
# dBase III memo field terminated by \032\032
# like: "WHAT IS XBASE" test.dbt "Borges, Malte" biblio.dbt "First memo\032\032" T2.DBT
>512 string >\0 \b, 1st item "%s"
# For DEBUGGING
#>512 ubelong x \b, 1ST item %#8.8x
+#>513 search/0x225 \032 FOUND_TERMINATOR
+#>>&0 ubyte 032 2xCTRL_Z
+# fsadress.dbt has 1 Ctrl-Z terminator followed by nil byte
+#>>&0 ubyte 0 1xCTRL_Z
+
# https://www.clicketyclick.dk/databases/xbase/format/dbt.html
# Print the information of dBase IV DBT memo file
0 name dbase4-memo-print
>0 lelong x dBase IV DBT
!:mime application/x-dbt
!:ext dbt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.43-database-etl.diff.sig
Type: application/octet-stream
Size: 1597 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220922/85dcd7ed/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-etl_dbt.txt.gz
Type: application/x-gzip
Size: 1395 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220922/85dcd7ed/attachment.bin>