[File] [PATCH] Magdir/database dBase III DBT misidentifies some Microsoft Event Trace Logs *.ETL

Jörg Jenderek joerg.jen.der.ek at gmx.net
Thu Sep 22 15:58:18 UTC 2022


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

A few days ago I ran the cleaning tool czkawka, found at
https://qarmin.github.io/czkawka/. One of its menu items reports files
with bad extensions. After running the tool I looked through the saved
file list results_bad_extensions.txt for examples of bad extensions.
One of the listed extensions is ETL.

These files are Microsoft Event Trace Logs. When I run the file
command, version 5.43, on some other ETL examples and, for comparison,
on genuine DBT samples, I get output like:

DlTel-Merge.etl:  dBase III DBT, version number 0,
		  next free block index 65536, 1st item
		  "\377\377\377\377\377\377\377\377\377
UpdateUx.006.etl: dBase III DBT, version number 0,
		  next free block index 4096, 1st item
		  "\377\377\377\377\377\377\377\377\377
WBEngine.3.etl:   dBase III DBT, version number 0,
		  next free block index 10240, 1st item
		  "9600.18730.amd64fre.winblue_ltsb.170613-0600"
Wifi.etl:         dBase III DBT, version number 0,
		  next free block index 81920, 1st item
		  "10586.494.amd64fre.th2_release_sec.160630-1736"
adressen.dbt:     dBase III DBT, version number 0,
		  next free block index 3, 1st item
		  "2NDemail:user12 at localhost.local\032\032"
angest.dbt:       dBase III DBT, version number 0,
		  next free block index 9, 1st item
		  "Sport: Bogenschie\341en"
biblio.dbt:       dBase III DBT, version number 0,
		  next free block index 1194, 1st item
		  "Borges, Malte; Schumacher, J\303\266rg;
		  Redeker, Torsten\032\032"
dbase3dbt0_1.dbt: dBase III DBT, version number 0,
		  next free block index 2, 1st item
		  "1st memo. test umlaut with cp 1252:
		  \344=ae, \366=oe, \374=ue, \337=ss,\200=euro,
		  \304=Ae, \326=Oe, \334=Ue\032\032"
fsadress.dbt:     dBase III DBT, version number 0,
		  next free block index 5, 1st item
		  "This is a note for Karl M\374ller. "

For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). It does not
recognize the DBT samples, but it recognizes many ETL samples (see
the appended trid-v-etl_dbt.txt.gz).

For comparison I also ran the file format identification
utility DROID (see https://sourceforge.net/projects/droid/). It
identifies all DBT samples as "dBASE Text Memo" (PUID x-fmt/311),
based on the file name suffix.

Luckily, in Magdir/database the display part is done by the subroutine
dbase3-memo-print. So only additional tests are needed before
that routine is called.

One DBT branch handles samples like adressen.dbt, biblio.dbt and
fsadress.dbt. At the moment it looks like:
 >>>>>>>>>>513	ubyte		>037
 >>>>>>>>>>>512	ubyte		>037
 >>>>>>>>>>>>0	use		dbase3-memo-print
For real DBT samples the first item must be something like a printable
ASCII string. The last test line skips some DOS executables (like
CPQ0TD.DRV E30ODI.COM IBM0MONO.DRV) by checking that the first
character of the 1st item is not "too low" to be printable. The test
before it skips a bad sample (like WinStore.App.exe) by checking that
the second character of the first item is not "too low" either.
For a few (14/758) Microsoft Event Trace Logs (like
boot_BASE+CSWITCH_1.etl DlTel-Merge.etl UpdateUx.006.etl) this field,
when interpreted, gives an invalid "high" 1st item starting with the
byte sequence \377\377. So these bad ETL samples are now skipped by an
additional test that the first character of the first item is "low
enough". This part now becomes:
 >>>>>>>>>>513		ubyte		>037
 >>>>>>>>>>>512		ubyte		>037
 >>>>>>>>>>>>512	ubyte		<0377
 >>>>>>>>>>>>>0	use		dbase3-memo-print
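
As a rough illustration outside of magic syntax, the three byte tests
above can be sketched in Python. The function name and the buffers are
made up for this example and are not real samples; the earlier header
tests of this branch (reserved bytes, free block index) are left out.

```python
def first_item_looks_plausible(data: bytes) -> bool:
    """Mimic the three ubyte tests at offsets 512 and 513 (sketch only)."""
    if len(data) < 514:
        return False
    first, second = data[512], data[513]
    # >037: not a control character; <0377: not the \377 bytes seen in ETL
    return second > 0o37 and first > 0o37 and first < 0o377


# Synthetic buffers: 512 zero bytes followed by a made-up first item.
dbt_like = bytes(512) + b"2NDemail:user12\x1a\x1a"
etl_like = bytes(512) + b"\xff" * 16

print(first_item_looks_plausible(dbt_like))  # True
print(first_item_looks_plausible(etl_like))  # False
```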

There is another branch, handling samples like angest.dbt. It looks
like:
 >>>>>>>>>>>512		ubyte		>037
 >>>>>>>>>>>>512	ubyte		<0200
 >>>>>>>>>>>>>513 	ubyte		>037
 >>>>>>>>>>>>>>0	use		dbase3-memo-print
There a test for a "too high" first character was already in use. That
step skips some Microsoft Visual C, OMF libraries (like BZ2.LIB
WATTCPWL.LIB ZLIB.LIB). Unfortunately, in this branch a
few (8/758) Microsoft Event Trace Logs (like WBEngine.3.etl
Wifi.etl) are still misidentified. And when we look at the first item
names, they are strange but valid. These names look like:
	"9600.20369.amd64fre.winblue_ltsb_escrow.220427-1727"
	"9600.19846.amd64fre.winblue_ltsb_escrow.200923-1735"
	"10586.494.amd64fre.th2_release_sec.160630-1736"
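
To see why these names get through, the existing byte tests of this
branch can be sketched in Python. This is only an illustration: the
function name is made up, and the buffer is synthetic (512 zero bytes,
one of the names above, then zero padding), not a real ETL file.

```python
def old_angest_branch_tests(data: bytes) -> bool:
    """Mimic the tests >037 and <0200 at 512 and >037 at 513 (sketch only)."""
    if len(data) < 514:
        return False
    first, second = data[512], data[513]
    return 0o37 < first < 0o200 and second > 0o37


# Synthetic buffer; what follows the name in a real ETL file is not modelled.
etl_item = b"10586.494.amd64fre.th2_release_sec.160630-1736"
etl_like = bytes(512) + etl_item + bytes(16)

print(old_angest_branch_tests(etl_like))  # True, so the name is not rejected
```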

So I had to look for another test. Looking at the DBT examples, we see
that in many of them the displayed item ends with the byte sequence
\032\032. So, for debugging, I added some lines like these at the end
of the subroutine:
 >513	search/0x225		\032		FOUND_TERMINATOR
 >>&0	ubyte			032		2xCTRL_Z
 >>&0	ubyte			0		1xCTRL_Z
Then we see what is written in the documentation: the item field
is normally terminated by 2 Control-Z characters, but in some
variants (FoxPro, Fox?? like fsadress.dbt) only one Control-Z
character is used. In the example I inspected, the next character was
a nil byte. Nothing is written about the size of the memo field, but
common sense suggests it is only a few hundred characters, since hardly
anyone writes a comment or note with thousands of characters. So I use
these facts in the branch concerned and search for the terminator only
within a range of 0x11E (286) bytes. It now becomes:
 >>>>>>>>>>>>>513 ubyte		>037
 >>>>>>>>>>>>>>513 search/0x11E	\032
 >>>>>>>>>>>>>>>&0	ubyte	032
 >>>>>>>>>>>>>>>>0	use	dbase3-memo-print
 >>>>>>>>>>>>>>>&0	ubyte	0
 >>>>>>>>>>>>>>>>0	use	dbase3-memo-print
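
The terminator test can be sketched in Python as follows. This is a
minimal illustration, not the code used by file: the function name and
the buffers are made up, and the search window is only approximated as
0x11E start positions beginning at offset 513.

```python
CTRL_Z = 0x1A  # octal 032


def has_memo_terminator(data: bytes, start: int = 513, span: int = 0x11E) -> bool:
    """True if the first Ctrl-Z in the window is followed by Ctrl-Z or NUL."""
    pos = data.find(bytes([CTRL_Z]), start, start + span)
    if pos < 0 or pos + 1 >= len(data):
        return False
    return data[pos + 1] in (CTRL_Z, 0x00)


# Synthetic buffers: 512 zero bytes followed by a made-up first item.
two_ctrl_z = bytes(512) + b"First memo\x1a\x1a"
one_ctrl_z = bytes(512) + b"This is a note\x1a\x00"
etl_like = bytes(512) + b"10586.494.amd64fre.th2_release_sec.160630-1736" + bytes(16)

print(has_memo_terminator(two_ctrl_z))  # True
print(has_memo_terminator(one_ctrl_z))  # True
print(has_memo_terminator(etl_like))    # False
```

A Ctrl-Z that only appears beyond the window, or one followed by any
other byte, is rejected in this sketch as well.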

After applying the above modifications with the patch
file-5.43-database-etl.diff, none of the ETL samples concerned is
misidentified as DBT any more, and the real DBT samples are still
recognized. The output now looks like:
DlTel-Merge.etl:  data
UpdateUx.006.etl: data
WBEngine.3.etl:   data
Wifi.etl:         data
adressen.dbt:     dBase III DBT, version number 0,
		  next free block index 3, 1st item
		  "2NDemail:user12 at localhost.local\032\032"
angest.dbt:       dBase III DBT, version number 0,
		  next free block index 9, 1st item
		  "Sport: Bogenschie\341en"
biblio.dbt:       dBase III DBT, version number 0,
		  next free block index 1194, 1st item
		  "Borges, Malte; Schumacher, J\303\266rg;
		  Redeker, Torsten\032\032"
dbase3dbt0_1.dbt: dBase III DBT, version number 0,
		  next free block index 2, 1st item
		  "1st memo. test umlaut with cp 1252:
		  \344=ae, \366=oe, \374=ue, \337=ss,\200=euro,
		  \304=Ae, \326=Oe, \334=Ue\032\032"
fsadress.dbt:     dBase III DBT, version number 0,
		  next free block index 5, 1st item
		  "This is a note for Karl M\374ller. "

I hope my diff file can be applied in a future version of the file
utility.

Unfortunately, more ETL samples are still misidentified, and a
description of ETL itself is missing. I will try to address this in a
future session.

With best wishes,
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYyyGGQAKCRCv8rHJQhrU
1mdiAJ9xkBS5Lj8E8hgFDFouX+zkr/4E5gCZATGzFeku3ON6qA+AfoOpXTsthOo=
=nsOc
-----END PGP SIGNATURE-----
-------------- next part --------------
--- file-5.43/magic/Magdir/database.old	2022-08-16 13:15:19.000000000 +0200
+++ file-5.43/magic/Magdir/database	2022-09-22 17:35:29.503078500 +0200
@@ -407,27 +407,41 @@
 # skip WORD1XW.DOC with improbably high free block index
 >>>>>>>>>0	ulelong		<0x400000
 # skip WinStore.App.exe by looking for printable 2nd character of 1st memo item
 >>>>>>>>>>513	ubyte		>037
 # skip DOS executables CPQ0TD.DRV E30ODI.COM IBM0MONO.DRV by looking for printable 1st character of 1st memo item
 >>>>>>>>>>>512	ubyte		>037
-# unusual dBASE III DBT like adressen.dbt
->>>>>>>>>>>>0	use		dbase3-memo-print
+# skip few (14/758) Microsoft Event Trace Logs (boot_BASE+CSWITCH_1.etl DlTel-Merge.etl UpdateUx.006.etl) with invalid "high" 1st item \377\377
+>>>>>>>>>>>>512	ubyte		<0377
+# unusual dBASE III DBT like adressen.dbt biblio.dbt fsadress.dbt
+>>>>>>>>>>>>>0	use		dbase3-memo-print
 # dBASE III DBT like angest.dbt, or garbage PCX DBF
 >>>>>>>>8	ubelong		!0
 # skip PCX and some DBF by test for for reserved NULL bytes
 >>>>>>>>>510	ubeshort	0
 # skip bad symples with improbably high free block index above 2 GiB file limit
 >>>>>>>>>>0	ulelong		<0x400000
 # skip AI070GEP.EPS by printable 1st character of 1st memo item
 >>>>>>>>>>>512	ubyte		>037
 # skip some Microsoft Visual C, OMF library like: BZ2.LIB WATTCPWL.LIB ZLIB.LIB
 >>>>>>>>>>>>512	ubyte		<0200
 # skip gluon-ffhat-1.0-tp-link-tl-wr1043n-nd-v2-sysupgrade.bin by printable 2nd character
 >>>>>>>>>>>>>513 ubyte		>037
->>>>>>>>>>>>>>0	use		dbase3-memo-print
+# skip few (8/758) Microsoft Event Trace Logs (WBEngine.3.etl Wifi.etl) with valid 1st item like
+# "9600.20369.amd64fre.winblue_ltsb_escrow.220427-1727"
+# "9600.19846.amd64fre.winblue_ltsb_escrow.200923-1735"
+# "10586.494.amd64fre.th2_release_sec.160630-1736"
+# by looking for valid terminating character Ctrl-Z
+>>>>>>>>>>>>>>513 search/0x11E	\032
+# followed by second character Ctrl-Z implies typical DBT
+>>>>>>>>>>>>>>>&0	ubyte	032
+# examples like: angest.dbt
+>>>>>>>>>>>>>>>>0	use	dbase3-memo-print
+>>>>>>>>>>>>>>>&0	ubyte	0
+# no example found here with terminating sequence CTRL-Z + \0
+>>>>>>>>>>>>>>>>0	use	dbase3-memo-print
 # dBASE IV DBT with positive block size
 >>>>>>>20	uleshort	>0
 # dBASE IV DBT with valid block length like 512, 1024
 # multiple of 2 in between 16 and 16 K ,implies upper and lower bits are zero
 # skip also 3600h 3E00h size
 >>>>>>>>20	uleshort&0xE00f	0
@@ -448,12 +462,17 @@
 >20	uleshort		!0		\b, block length %u
 # dBase III memo field terminated by \032\032
 # like: "WHAT IS XBASE" test.dbt "Borges, Malte" biblio.dbt "First memo\032\032" T2.DBT
 >512	string			>\0		\b, 1st item "%s"
 # For DEBUGGING
 #>512	ubelong			x		\b, 1ST item %#8.8x
+#>513	search/0x225		\032		FOUND_TERMINATOR
+#>>&0	ubyte			032		2xCTRL_Z
+# fsadress.dbt has 1 Ctrl-Z terminator followed by nil byte
+#>>&0	ubyte			0		1xCTRL_Z
+
 # https://www.clicketyclick.dk/databases/xbase/format/dbt.html
 #		Print the information of dBase IV DBT memo file
 0	name				dbase4-memo-print
 >0		lelong		x		dBase IV DBT
 !:mime	application/x-dbt
 !:ext dbt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.43-database-etl.diff.sig
Type: application/octet-stream
Size: 1597 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220922/85dcd7ed/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-etl_dbt.txt.gz
Type: application/x-gzip
Size: 1395 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220922/85dcd7ed/attachment.bin>

