[File] [PATCH] Magdir/database dBase III DBT misidentifies some Microsoft Event Trace Logs *.ETL
    Christos Zoulas 
    christos at zoulas.com
       
    Fri Sep 23 19:55:36 UTC 2022
    
    
  
Committed, thanks!
christos
> On Sep 22, 2022, at 11:58 AM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hello,
> 
> Some days ago I ran the cleaning tool czkawka, found at
> https://qarmin.github.io/czkawka/. One menu item concerns bad
> extensions. After running the tool I looked in the saved file list
> results_bad_extensions.txt for examples of bad extensions.
> One listed extension is ETL.
> 
> These files are Microsoft Event Trace Logs. When running the file
> command version 5.43 on some other ETL examples and on related positive
> DBT samples I get output like:
> 
> DlTel-Merge.etl:  dBase III DBT, version number 0,
> 		  next free block index 65536, 1st item
> 		  "\377\377\377\377\377\377\377\377\377
> UpdateUx.006.etl: dBase III DBT, version number 0,
> 		  next free block index 4096, 1st item
> 		  "\377\377\377\377\377\377\377\377\377
> WBEngine.3.etl:   dBase III DBT, version number 0,
> 		  next free block index 10240, 1st item
> 		  "9600.18730.amd64fre.winblue_ltsb.170613-0600"
> Wifi.etl:         dBase III DBT, version number 0,
> 		  next free block index 81920, 1st item
> 		  "10586.494.amd64fre.th2_release_sec.160630-1736"
> adressen.dbt:     dBase III DBT, version number 0,
> 		  next free block index 3, 1st item
> 		  "2NDemail:user12 at localhost.local\032\032"
> angest.dbt:       dBase III DBT, version number 0,
> 		  next free block index 9, 1st item
> 		  "Sport: Bogenschie\341en"
> biblio.dbt:       dBase III DBT, version number 0,
> 		  next free block index 1194, 1st item
> 		  "Borges, Malte; Schumacher, J\303\266rg;
> 		  Redeker, Torsten\032\032"
> dbase3dbt0_1.dbt: dBase III DBT, version number 0,
> 		  next free block index 2, 1st item
> 		  "1st memo. test umlaut with cp 1252:
> 		  \344=ae, \366=oe, \374=ue, \337=ss,\200=euro,
> 		  \304=Ae, \326=Oe, \334=Ue\032\032"
> fsadress.dbt:     dBase III DBT, version number 0,
> 		  next free block index 5, 1st item
> 		  "This is a note for Karl M\374ller. "
> 
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). It does not
> recognize the DBT samples, but it recognizes many ETL samples (see
> the appended trid-v-etl_dbt.txt.gz).
> 
> For comparison I also ran the file format identification
> utility DROID (see https://sourceforge.net/projects/droid/). It
> identifies all DBT samples as "dBASE Text Memo" by PUID x-fmt/311,
> based on the file name suffix.
> 
> Luckily, in Magdir/database the display part is done by the subroutine
> dbase3-memo-print. So only an additional test must be done before
> calling that routine.
> 
> One DBT branch handles samples like adressen.dbt, biblio.dbt and
> fsadress.dbt. At the moment it looks like:
>>>>>>>>>>> 513	ubyte		>037
>>>>>>>>>>>> 512	ubyte		>037
>>>>>>>>>>>>> 0	use		dbase3-memo-print
> For real DBT samples the first item must be something like a printable
> ASCII string. The last test line skips some DOS executables (like
> CPQ0TD.DRV E30ODI.COM IBM0MONO.DRV) by checking that the first
> character of the 1st item is not "too low". The test before it skips
> a bad sample (like WinStore.App.exe) by checking that the second
> character of the first item is not "too low".
> For a few (14/758) Microsoft Event Trace Logs (like
> boot_BASE+CSWITCH_1.etl DlTel-Merge.etl UpdateUx.006.etl) this field,
> when interpreted, gives an invalid "high" 1st item starting with the
> byte sequence \377\377. So bad ETL samples are now skipped by an
> additional test for a "low enough" first character of the first item.
> This part now becomes:
>>>>>>>>>>> 513		ubyte		>037
>>>>>>>>>>>> 512		ubyte		>037
>>>>>>>>>>>>> 512	ubyte		<0377
>>>>>>>>>>>>>> 0	use		dbase3-memo-print
> 
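The three byte tests of this branch can be sketched in Python. This is only an illustration under stated assumptions: the offsets (512, 513) and the octal limits (037, 0377) come from the magic lines quoted above, while the function names and the synthetic samples are hypothetical, and the earlier tests in the Magdir/database chain are not modeled.

```python
# Sketch of the patched first DBT branch (not the real libmagic code).
# Offsets and octal limits are copied from the quoted magic lines.

def first_branch_accepts(data: bytes) -> bool:
    """Return True if bytes 512 and 513 pass the three ubyte tests."""
    if len(data) < 514:
        return False
    first, second = data[512], data[513]
    # 513 ubyte >037, 512 ubyte >037, 512 ubyte <0377
    return second > 0o37 and first > 0o37 and first < 0o377


def make_sample(first_item: bytes) -> bytes:
    """Hypothetical helper: 512 zero header bytes followed by the first item."""
    return bytes(512) + first_item


dbt_like = make_sample(b"This is a note\x1a\x1a")
etl_like = make_sample(b"\xff\xff\xff\xff")

print(first_branch_accepts(dbt_like))  # True
print(first_branch_accepts(etl_like))  # False
```

The added upper bound is what rejects an item starting with \377; the two lower bounds alone would accept it.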
> There is another branch handling samples like angest.dbt. This looks
> like:
>>>>>>>>>>>> 512		ubyte		>037
>>>>>>>>>>>>> 512	ubyte		<0200
>>>>>>>>>>>>>> 513 	ubyte		>037
>>>>>>>>>>>>>>> 0	use		dbase3-memo-print
> There a test for a "too high" first character was already used. By
> this step some Microsoft Visual C, OMF libraries (like: BZ2.LIB
> WATTCPWL.LIB ZLIB.LIB) are skipped. Unfortunately in this branch
> a few (8/758) Microsoft Event Trace Logs (like WBEngine.3.etl
> Wifi.etl) are still misidentified. And when we look at the first item
> names, these are strange but valid. These names look like:
> 	"9600.20369.amd64fre.winblue_ltsb_escrow.220427-1727"
> 	"9600.19846.amd64fre.winblue_ltsb_escrow.200923-1735"
> 	"10586.494.amd64fre.th2_release_sec.160630-1736"
> 
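A small Python sketch shows why the byte-range tests of this branch cannot reject such names. It assumes only the offsets and limits from the quoted magic lines (512 between 037 and 0200, 513 above 037); the function name and the synthetic 512-byte zero header are hypothetical.

```python
# Sketch of the second DBT branch before the fix (illustrative only).

def second_branch_old_accepts(data: bytes) -> bool:
    """512 ubyte >037, 512 ubyte <0200, 513 ubyte >037."""
    if len(data) < 514:
        return False
    first, second = data[512], data[513]
    return 0o37 < first < 0o200 and second > 0o37


header = bytes(512)  # placeholder header, contents not modeled
etl_item = b"10586.494.amd64fre.th2_release_sec.160630-1736"
high_item = b"\xff\xff\xff\xff"

print(second_branch_old_accepts(header + etl_item))   # True
print(second_branch_old_accepts(header + high_item))  # False
```

The build string is plain printable ASCII, so it passes every range check, and a different kind of test is needed.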
> So I must look for another test. When looking at DBT examples we see
> that in many of them a byte sequence like \032\032 is displayed at
> the end. So for debugging purposes add some lines like these at the
> end of the subroutine:
>> 513	search/0x225		\032		FOUND_TERMINATOR
>>> &0	ubyte			032		2xCTRL_Z
>>> &0	ubyte			0		1xCTRL_Z
> Then we see what is written in the documentation: the item field
> is normally terminated by 2 Control-Z characters. But in some
> variants (FoxPro, Fox?? like fsadress.dbt) only one Control-Z
> character is used. In the example I inspected the next character was
> a nil byte. Nothing is written about the size of the memo field, but
> it is reasonable to assume that it is only some hundred characters
> long; hardly anyone writes a comment or note with thousands of
> characters. So these facts are used in the concerned branch, which
> now becomes:
>>>>>>>>>>>>>> 513 ubyte		>037
>>>>>>>>>>>>>>> 513 search/0x11E	\032
>>>>>>>>>>>>>>>> &0	ubyte	032
>>>>>>>>>>>>>>>>> 0	use	dbase3-memo-print
>>>>>>>>>>>>>>>> &0	ubyte	0
>>>>>>>>>>>>>>>>> 0	use	dbase3-memo-print
> 
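The terminator test can be approximated in Python as follows. This is a sketch under assumptions: the start offset 513 and the range 0x11E are taken from the quoted search line, `&0` is assumed to refer to the byte directly after the matched Control-Z, the exact boundary handling of libmagic's search is not reproduced, and the function name and samples are hypothetical.

```python
# Sketch of the Control-Z terminator check (illustrative only).
from typing import Optional

CTRL_Z = 0x1A          # \032
SEARCH_START = 513     # offset used by the quoted search line
SEARCH_RANGE = 0x11E   # search window length from the quoted line


def terminator_kind(data: bytes) -> Optional[str]:
    """Classify the terminator of the first item, or return None."""
    window_end = min(len(data), SEARCH_START + SEARCH_RANGE)
    pos = data.find(b"\x1a", SEARCH_START, window_end)
    if pos < 0 or pos + 1 >= len(data):
        return None
    following = data[pos + 1]
    if following == CTRL_Z:
        return "2xCTRL_Z"  # usual dBase III terminator
    if following == 0:
        return "1xCTRL_Z"  # single Control-Z followed by a nil byte
    return None


header = bytes(512)
print(terminator_kind(header + b"first memo text\x1a\x1a"))     # 2xCTRL_Z
print(terminator_kind(header + b"another memo text\x1a\x00"))   # 1xCTRL_Z
print(terminator_kind(header + b"9600.18730.amd64fre" + bytes(400)))  # None
```

A file is accepted only when a terminator is found within the window, which is what rejects the ETL build strings.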
> After applying the above modifications with the patch
> file-5.43-database-etl.diff, none of the concerned ETL samples are
> misidentified as DBT any more and real DBT samples are still
> recognized. The output now looks like:
> DlTel-Merge.etl:  data
> UpdateUx.006.etl: data
> WBEngine.3.etl:   data
> Wifi.etl:         data
> adressen.dbt:     dBase III DBT, version number 0,
> 		  next free block index 3, 1st item
> 		  "2NDemail:user12 at localhost.local\032\032"
> angest.dbt:       dBase III DBT, version number 0,
> 		  next free block index 9, 1st item
> 		  "Sport: Bogenschie\341en"
> biblio.dbt:       dBase III DBT, version number 0,
> 		  next free block index 1194, 1st item
> 		  "Borges, Malte; Schumacher, J\303\266rg;
> 		  Redeker, Torsten\032\032"
> dbase3dbt0_1.dbt: dBase III DBT, version number 0,
> 		  next free block index 2, 1st item
> 		  "1st memo. test umlaut with cp 1252:
> 		  \344=ae, \366=oe, \374=ue, \337=ss,\200=euro,
> 		  \304=Ae, \326=Oe, \334=Ue\032\032"
> fsadress.dbt:     dBase III DBT, version number 0,
> 		  next free block index 5, 1st item
> 		  "This is a note for Karl M\374ller. "
> 
> I hope my diff file can be applied in a future version of the file
> utility.
> 
> Unfortunately more ETL samples are misidentified and a description of
> ETL itself is missing. I will try to address this in a future session.
> 
> With best wishes,
> Jörg Jenderek
> - --
> Jörg Jenderek
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
> 
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYyyGGQAKCRCv8rHJQhrU
> 1mdiAJ9xkBS5Lj8E8hgFDFouX+zkr/4E5gCZATGzFeku3ON6qA+AfoOpXTsthOo=
> =nsOc
> -----END PGP SIGNATURE-----