[File] [PATCH] Magdir/geo GeoSwath RDF misidentifies many Microsoft Event Trace Logs *.ETL

Christos Zoulas christos at zoulas.com
Fri Sep 23 13:25:27 UTC 2022


Committed, thanks!

christos

> On Sep 20, 2022, at 9:16 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hello,
> 
> A few days ago I ran the cleaning tool czkawka, found at
> https://qarmin.github.io/czkawka/. One of its menu items concerns bad
> extensions. After running the tool, I looked through the saved file
> list results_bad_extensions.txt for examples of bad extensions.
> One of the listed extensions is ETL.
> 
> These files are Microsoft Event Trace Logs. When I run the file
> command, version 5.43, on such ETL examples and on a related positive
> sample, I get output like:
> 
> 060116342.rdf:                GeoSwath RDF
> AMSITrace.etl:                GeoSwath RDF
> NotificationUxBroker.052.etl: GeoSwath RDF
> WindowsBackup.4.etl:          GeoSwath RDF
> lxcore_kernel.etl:            GeoSwath RDF
> 
> Furthermore, for such RDF samples only the generic
> application/octet-stream MIME type is shown with the -i option. With
> the --extension option, the 3-byte sequence ??? is shown.
> 
> For comparison, I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). It does not
> recognise the RDF sample.
> 
> For comparison, I also ran the file format identification
> utility DROID (see https://sourceforge.net/projects/droid/). Based
> on the file name suffix, it misidentifies the RDF sample as RDF/XML
> (PUID fmt/875).
> 
> Luckily, a URL was mentioned in Magdir/geo. However, it redirects to
> a new site. There I found the RDF sample 060116342.rdf and the file
> format specification as a PDF. This information is now expressed by
> comment lines inside Magdir/geo like:
> # URL:		https://www.mbari.org/
> #		products/research-software/mb-system/
> # Reference:	http://ccom.unh.edu/sites/default/files/
> #		news-and-events/conferences/auv-bootcamp/
> #		GS%2B-6063-BB-GS%2B-Broadcast-Raw-Data-
> #		File-Format-Command-Specification.pdf
> 
> In the current Magdir/geo, detection is done by a line like:
> 4	beshort	0x2002	GeoSwath RDF
> Apparently this 2-byte value is too weak for detection on its own,
> so additional test lines must be added before the file type message
> is displayed.
> According to the specification, all data is written in Intel 80x86
> byte order (LSB to MSB), and this value is the file header size
> (raw_header_size, in bytes), namely 544: the bytes 0x20 0x02 that
> match beshort 0x2002 are 0x0220 = 544 when read little-endian. With
> this knowledge we can inspect the next section, the first ping
> section, which starts with the ping number. This information is now
> shown by a line like:
>>> 544	lelong	x	\b, 1st ping number %d
> For the real RDF example I get a value like 4944, whereas for the ETL
> examples I often get negative values like -1 or -1072627710.
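A small Python sketch of the byte-order reasoning above, run against a synthetic buffer rather than real sample data. It shows that the two bytes matched by the existing `4 beshort 0x2002` test are the little-endian header size 544, and that the first ping number is the signed little-endian 32-bit value at offset 544, as in the `>>544 lelong x` line. The helper name `first_ping_number` is hypothetical and only mirrors the magic rule.

```python
import struct

RAW_HEADER_SIZE = 544  # raw_header_size, per the GeoSwath RDF specification


def first_ping_number(buf: bytes) -> int:
    # Signed little-endian 32-bit value at offset 544, like `>>544 lelong x`.
    return struct.unpack_from("<i", buf, RAW_HEADER_SIZE)[0]


# The existing test `4 beshort 0x2002` reads offset 4 big-endian.
# The same two bytes read little-endian give the header size 544 (0x0220).
le_bytes = struct.pack("<H", RAW_HEADER_SIZE)  # b'\x20\x02'
assert struct.unpack(">H", le_bytes)[0] == 0x2002

# Synthetic buffer (not real sample data): header size at offset 4,
# ping number 4944 directly after the 544-byte header.
buf = bytearray(RAW_HEADER_SIZE + 4)
buf[4:6] = le_bytes
struct.pack_into("<i", buf, RAW_HEADER_SIZE, 4944)
print(first_ping_number(bytes(buf)))  # 4944
```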
> 
> Just before that, eight spare bytes are stored. This information is
> now shown by a line like:
>>> 536	ubequad	!0	\b, spare %#16.16llx
> For the real RDF example I get a zero value here, whereas for the ETL
> samples I get other values like 0x650074006c000000 or
> 0x6c00000000000000. The specification does not say so explicitly,
> but I assume this is probably always true. So this could be used as
> an additional test by a line like:
>>> 536	ulequad	=0	OK_THIS_IS_GeoSwath_RDF
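As a rough Python sketch with synthetic buffers, the spare-bytes test amounts to checking that the eight bytes at offset 536 are all zero. It relies on the assumption stated above that the spare field is always zero in real RDF files; the helper name `spare_is_zero` is hypothetical.

```python
import struct

SPARE_OFFSET = 536  # eight spare bytes just before the first ping at offset 544


def spare_is_zero(buf: bytes) -> bool:
    # Equivalent to `536 ulequad =0`: all eight spare bytes must be zero.
    # Byte order does not matter for a comparison against zero.
    (spare,) = struct.unpack_from("<Q", buf, SPARE_OFFSET)
    return spare == 0


rdf_like = bytes(544)  # synthetic header with zeroed spare bytes
etl_like = bytearray(544)
# One of the values reported for an ETL sample, written big-endian as `ubequad` reads it.
struct.pack_into(">Q", etl_like, SPARE_OFFSET, 0x650074006C000000)

print(spare_is_zero(rdf_like))         # True
print(spare_is_zero(bytes(etl_like)))  # False
```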
> 
> The size of the ping header in bytes is stored in the file header.
> This information is shown by a line like:
>>> 6	leshort	!64	\b, ping header size %d
> For the real RDF sample I get the value 64 here, whereas for all
> affected ETL samples (63 of 753, such as AMSITrace.etl,
> lxcore_kernel.etl, NotificationUxBroker.052.etl and
> WindowsBackup.4.etl) I get the value 0. Common sense suggests that
> only "low positive" values can occur here. So I use this as a second
> test, which now starts like:
> 4	beshort	0x2002
>> 6	leshort	>0	GeoSwath RDF
> !:mime	application/x-geoswath-rdf
> !:ext	rdf
> Instead of the generic application/octet-stream, I chose a
> user-defined MIME type.
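A minimal Python sketch, approximating only the first two levels of the rule shown above (the spare-bytes check would be a further condition). It is not how file(1) evaluates magic internally. The helper name `geoswath_rdf_mime` and the synthetic buffers are hypothetical; `leshort` is compared as a signed value, since magic(5) uses the `u` prefix for unsigned comparisons.

```python
import struct
from typing import Optional


def geoswath_rdf_mime(buf: bytes) -> Optional[str]:
    """Return the MIME type the patched rule would assign, or None.

    Approximates:
      4  beshort 0x2002
      >6 leshort >0     GeoSwath RDF   (!:mime application/x-geoswath-rdf)
    """
    if len(buf) < 8:
        return None
    (magic,) = struct.unpack_from(">H", buf, 4)
    # Signed 16-bit read, so ">0" rejects 0 (seen in the ETL samples) and negatives.
    (ping_header_size,) = struct.unpack_from("<h", buf, 6)
    if magic == 0x2002 and ping_header_size > 0:
        return "application/x-geoswath-rdf"
    return None


rdf = bytearray(8)
struct.pack_into("<HH", rdf, 4, 544, 64)  # header size 544, ping header size 64
etl = bytearray(8)
struct.pack_into("<HH", etl, 4, 544, 0)   # same bytes at offset 4, ping header size 0

print(geoswath_rdf_mime(bytes(rdf)))  # application/x-geoswath-rdf
print(geoswath_rdf_mime(bytes(etl)))  # None
```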
> 
> The header contains some ASCII strings. This information is shown
> by lines like:
>>> 8	string	x	"%-.512s"
>>> 527	string	x	\b, version %-.8s
> The first is the original file name; in the RDF example it is
> "C:\GS+\Projects\Default\Raw Data Files\060116342.rdf". The second
> is the version number of the recording software; in the RDF example
> it is 3.16c. For the affected ETL samples I get garbage here.
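A Python sketch of how the two string fields could be read from a synthetic header: up to 512 bytes at offset 8 and up to 8 bytes at offset 527, each stopping at the first NUL byte, roughly what `string x` with `%-.512s` and `%-.8s` prints. The helper name `c_string` and the Latin-1 decoding (harmless for the ASCII text the specification describes) are assumptions.

```python
def c_string(buf: bytes, offset: int, max_len: int) -> str:
    # Read at most max_len bytes and stop at the first NUL, like a C string.
    raw = buf[offset:offset + max_len]
    return raw.split(b"\0", 1)[0].decode("latin-1")


# Synthetic 544-byte header holding the values seen in the RDF example.
header = bytearray(544)
name = rb"C:\GS+\Projects\Default\Raw Data Files\060116342.rdf"
header[8:8 + len(name)] = name
header[527:527 + 5] = b"3.16c"

print(c_string(bytes(header), 8, 512))  # original file name
print(c_string(bytes(header), 527, 8))  # 3.16c
```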
> 
> The file creation time is stored as an unsigned int at the beginning.
> Unfortunately, the specification does not say which exact time stamp
> format this is, and I was not able to determine it. So I show this
> information by a line like:
>>> 0	ulelong	x	\b, creation time %#8.8x
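One hypothesis that can be checked with a few lines of Python is that the creation time is a Unix timestamp (seconds since 1970-01-01 UTC); the specification does not confirm this. Under that assumption, the value 0x4a24030e shown for 060116342.rdf decodes to 2009-06-01 16:34:22 UTC, which would fit the digits 0601 1634 in the sample's file name, though that could be a coincidence. The helper name `creation_time_guess` is hypothetical.

```python
import struct
from datetime import datetime, timezone


def creation_time_guess(buf: bytes) -> datetime:
    # `0 ulelong x`: unsigned little-endian 32-bit value at offset 0.
    (raw,) = struct.unpack_from("<I", buf, 0)
    # ASSUMPTION: seconds since the Unix epoch, not confirmed by the specification.
    return datetime.fromtimestamp(raw, tz=timezone.utc)


sample_start = struct.pack("<I", 0x4A24030E)  # value printed for 060116342.rdf
print(creation_time_guess(sample_start).isoformat())  # 2009-06-01T16:34:22+00:00
```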
> 
> After applying the modifications described above (patch file
> file-5.43-geo-rdf.diff), none of the affected ETL samples is
> misidentified as RDF any more, and more information is shown for the
> real RDF sample. The output now looks like:
> 
> 060116342.rdf:                GeoSwath RDF
> 			      "C:\GS+\Projects\Default\
> 			      Raw Data Files\060116342.rdf"
> 			      , version 3.16c
> 			      , creation time 0x4a24030e
> 			      , frequency 500000
> 			      , echo type 0x1
> 			      , pps mode 0x2
> 			      , 1st ping number 4944
> AMSITrace.etl:                data
> NotificationUxBroker.052.etl: data
> WindowsBackup.4.etl:          data
> lxcore_kernel.etl:            data
> 
> I hope my diff file can be applied in a future version of the file
> utility.
> 
> Unfortunately, more ETL samples are misidentified, and a description
> of the ETL format itself is missing. I will try to address this in a
> future session.
> 
> With best wishes,
> Jörg Jenderek
> - --
> Jörg Jenderek
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
> 
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYypl/AAKCRCv8rHJQhrU
> 1pYmAJwMIzVLifth1ktqIitQ/Chg1BcCUwCfawBK9wwaRINX2XUoFOtaDhN7rf8=
> =EZuM
> -----END PGP SIGNATURE-----
