[File] [PATCH] Magdir/geo GeoSwath RDF misidentifies many Microsoft Event Trace Logs *.ETL

Jörg Jenderek joerg.jen.der.ek at gmx.net
Wed Sep 21 01:16:44 UTC 2022


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

Some days ago I ran the cleaning tool czkawka, found at
https://qarmin.github.io/czkawka/. One menu item concerns bad
extensions. After running the tool I looked in the saved file list
results_bad_extensions.txt for examples of bad extensions.
One listed extension is ETL.

These files are Microsoft Event Trace Logs. When running the file
command version 5.43 on such ETL examples and related positive
samples I get output like:

060116342.rdf:                GeoSwath RDF
AMSITrace.etl:                GeoSwath RDF
NotificationUxBroker.052.etl: GeoSwath RDF
WindowsBackup.4.etl:          GeoSwath RDF
lxcore_kernel.etl:            GeoSwath RDF

Furthermore, for such RDF samples only the generic
application/octet-stream MIME type is shown with the -i option. With
the option --extension the 3-byte sequence ??? is shown.

For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). It does not
recognise the RDF sample.

For comparison I also ran the file format identification
utility DROID (see https://sourceforge.net/projects/droid/). It
misidentifies the RDF sample, based on the file name suffix, as
RDF/XML with PUID fmt/875.

Luckily a URL was mentioned in Magdir/geo, but it redirects to a
new site. There I found the RDF sample 060116342.rdf and the file
format specification as a PDF. This information is now expressed by
comment lines inside Magdir/geo like:
# URL:		https://www.mbari.org/
#		products/research-software/mb-system/
# Reference:	http://ccom.unh.edu/sites/default/files/
#		news-and-events/conferences/auv-bootcamp/
#		GS%2B-6063-BB-GS%2B-Broadcast-Raw-Data-
#		File-Format-Command-Specification.pdf

In the current Magdir/geo the detection happens by a line like:
 4	beshort	0x2002	GeoSwath RDF
Apparently this 2-byte value is too weak for detection. So
additional test lines must be added before displaying the file type
message.
According to the specification all data is written using Intel 80x86
byte ordering (LSB to MSB), and this value is the file header size
(raw_header_size, in bytes) with value 544. With this knowledge
we can inspect the next section. This is the first ping section,
which starts with the ping number. That information is now shown by
a line like:
 >>544	lelong	x	\b, 1st ping number %d
For the real RDF example I get something like 4944, whereas for the
ETL examples I often get negative values like -1 or -1072627710.
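
The relation between the big-endian test value 0x2002 and the header
size 544 can be checked with a short Python sketch. The two bytes
below are derived from the existing magic test value, not read from
a sample file.

```python
import struct

# The two bytes at offset 4, as implied by the test "beshort 0x2002".
raw = bytes([0x20, 0x02])

# What the existing beshort test in Magdir/geo compares against.
as_big_endian = struct.unpack(">H", raw)[0]

# What the specification means: a little-endian file header size.
as_little_endian = struct.unpack("<H", raw)[0]

print(hex(as_big_endian))   # 0x2002
print(as_little_endian)     # 544
```

So the existing test only verifies that the header size field holds
544, which explains why unrelated files can match by chance.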

Before that, eight spare bytes are stored. That information is now
shown by a line like:
 >>536	ubequad	!0	\b, spare %#16.16llx
For the real RDF example I got a zero value here, whereas for ETL
samples I got other values like 0x650074006c000000 or
0x6c00000000000000. It is not explicitly written in the
specification, but I assume that the spare bytes are probably
always zero. So this could be used as an additional test by a line
like:
 >>536	ulequad	=0	OK_THIS_IS_GeoSwath_RDF

The size of the ping header in bytes is stored in the file header.
That information is shown by a line like:
 >>6	leshort	!64	\b, ping header size %d
For the real RDF I got the value 64 here, whereas for all concerned
ETL samples (63 of 753, like AMSITrace.etl, lxcore_kernel.etl,
NotificationUxBroker.052.etl and WindowsBackup.4.etl) I got the
value 0. It is reasonable to assume that only "low positive" values
can occur here. So I use this as a second test, which now starts
like:
 4	beshort	0x2002
 >6	leshort	>0	GeoSwath RDF
 !:mime	application/x-geoswath-rdf
 !:ext	rdf
Instead of the generic application/octet-stream I chose a
user-defined MIME type.
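
For readers who want to try the same checks outside of the file
command, the tests can be sketched in Python. This is only an
illustration: the function name looks_like_geoswath_rdf and the
parameter check_spare are hypothetical, and the spare-byte check is
optional because it rests on a single real sample.

```python
import struct

HEADER_SIZE = 544  # raw_header_size according to the specification


def looks_like_geoswath_rdf(data: bytes, check_spare: bool = False) -> bool:
    """Hypothetical helper mirroring the magic tests discussed above."""
    if len(data) < 8:
        return False
    # Offset 4: file header size, little-endian 544 (same as beshort 0x2002).
    (header_size,) = struct.unpack_from("<H", data, 4)
    if header_size != HEADER_SIZE:
        return False
    # Offset 6: signed ping header size; 0 in the misidentified ETL samples.
    (ping_header_size,) = struct.unpack_from("<h", data, 6)
    if ping_header_size <= 0:
        return False
    # Offset 536: eight spare bytes, assumed (not documented) to be zero.
    if check_spare and data[536:544] != bytes(8):
        return False
    return True


# Synthetic buffers, not real samples: only the two tested fields are set.
rdf_like = bytearray(HEADER_SIZE)
struct.pack_into("<HH", rdf_like, 4, 544, 64)
etl_like = bytearray(HEADER_SIZE)
struct.pack_into("<HH", etl_like, 4, 544, 0)

print(looks_like_geoswath_rdf(bytes(rdf_like)))  # True
print(looks_like_geoswath_rdf(bytes(etl_like)))  # False
```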

The header contains some ASCII strings. This information is shown
by lines like:
 >>8	string	x	"%-.512s"
 >>527	string	x	\b, version %-.8s
The first is the original file name. In the RDF example it is
"C:\GS+\Projects\Default\Raw Data Files\060116342.rdf". The other
string is the recording software version number. In the RDF example
it is 3.16c. For the concerned ETL samples I got garbage here.

The file creation time is stored as an unsigned int at the
beginning. Unfortunately it is not written what exact time stamp
format this is, and I was not able to determine it. So I show this
information by a line like:
 >>0	ulelong	x	\b, creation time %#8.8x
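
One possible reading, which the specification does not confirm, is
that the value counts seconds since the Unix epoch. The following
Python sketch decodes the value 0x4a24030e printed for 060116342.rdf
under that assumption:

```python
from datetime import datetime, timezone

# Creation time value shown by the file command for 060116342.rdf.
creation = 0x4a24030e

# Assumption: seconds since 1970-01-01 00:00:00 UTC (not confirmed).
as_utc = datetime.fromtimestamp(creation, tz=timezone.utc)

print(as_utc.isoformat())  # 2009-06-01T16:34:22+00:00
```

The result is consistent with the sample's file name 060116342.rdf
(06-01, 16:34), which supports the guess but does not prove it. If
the assumption holds, the magic type ledate might be used for this
field instead of ulelong.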

After applying the above-mentioned modifications with the patch
file-5.43-geo-rdf.diff, none of the concerned ETL samples is
misidentified as RDF any more, and for the real RDF sample more
information is shown. This now looks like:

060116342.rdf:                GeoSwath RDF
			      "C:\GS+\Projects\Default\
			      Raw Data Files\060116342.rdf"
			      , version 3.16c
			      , creation time 0x4a24030e
			      , frequency 500000
			      , echo type 0x1
			      , pps mode 0x2
			      , 1st ping number 4944
AMSITrace.etl:                data
NotificationUxBroker.052.etl: data
WindowsBackup.4.etl:          data
lxcore_kernel.etl:            data

I hope my diff file can be applied in a future version of the file
utility.

Unfortunately more ETL samples are misidentified, and a description
of ETL itself is missing. I will try to do this in a future session.

With best wishes,
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYypl/AAKCRCv8rHJQhrU
1pYmAJwMIzVLifth1ktqIitQ/Chg1BcCUwCfawBK9wwaRINX2XUoFOtaDhN7rf8=
=EZuM
-----END PGP SIGNATURE-----
-------------- next part --------------
--- file-5.43/magic/Magdir/geo.old	2022-03-26 19:58:39.000000000 +0100
+++ file-5.43/magic/Magdir/geo	2022-09-21 03:06:44.087263400 +0200
@@ -52,11 +52,47 @@
 # MULTIBEAM SONARS https://www.ldeo.columbia.edu/res/pi/MB-System/formatdoc/
 #
 ######################################################################
 
 # GeoAcoustics - GeoSwath Plus
-4	beshort	0x2002	GeoSwath RDF
+# Update:	Joerg Jenderek
+# URL:		https://www.mbari.org/products/research-software/mb-system/
+# Reference:	http://ccom.unh.edu/sites/default/files/news-and-events/conferences/auv-bootcamp/
+#		GS%2B-6063-BB-GS%2B-Broadcast-Raw-Data-File-Format-Command-Specification.pdf
+# Note:		All data is written using Intel 80x86 byte ordering (LSB to MSB)
+# raw_header_size; file header size is 544 bytes
+4	beshort	0x2002
+# GRR: line above is too general as it matches also some Microsoft Event Trace Logs *.ETL
+# skip many (63/753) Microsoft Event Trace Logs (AMSITrace.etl lxcore_kernel.etl NotificationUxBroker.052.etl WindowsBackup.4.etl) with invalid "low" ping header size 0
+>6	leshort	>0	GeoSwath RDF
+# skip foo samples with invalid "high" spare bytes
+#>>536	ulequad	=0	OK_THIS_IS_GeoSwath_RDF
+#!:mime	application/octet-stream
+!:mime	application/x-geoswath-rdf
+# http://ccom.unh.edu/sites/default/files/news-and-events/conferences/auv-bootcamp/060116342.rdf
+!:ext	rdf
+# filename; original file name like: "C:\GS+\Projects\Default\Raw Data Files\060116342.rdf"
+>>8	string	x	"%-.512s"
+# version[8]; recording software version number like: 3.16c
+>>527	string	x	\b, version %-.8s
+# creation; unsigned int file creation time; WHAT time format is this? 
+>>0	ulelong	x	\b, creation time %#8.8x
+# raw_ping_header_size; size of ping header in bytes like: 64
+>>6	leshort	!64	\b, ping header size %d
+# frequency; system frequency in hertz like: 500000
+>>520	lelong	x	\b, frequency %d
+# echo_type; Echosounder type index like: 1
+>>524	leshort	x	\b, echo type %#x
+# file_mode; file mode mask (0x00 bathy & sidescan, 0x80 bathy, 0x40 sidescan, 0x20 seismic)
+>>526	ubyte	!0	\b, file mode %#2.2x
+# pps_mode; PPS synch mode like: 2
+>>535	byte	x	\b, pps mode %#x
+# char spare[8]; apparently zeroed
+>>536	ubequad	!0	\b, spare %#16.16llx
+# Ping_number; 1st ping number like: 4944
+>>544	lelong	x	\b, 1st ping number %d
+
 0	string	Start:-	GeoSwatch auf text file
 
 # Seabeam 2100
 # mbsystem code mb41
 0	string SB2100	SeaBeam 2100 multibeam sonar
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.43-geo-rdf.diff.sig
Type: application/octet-stream
Size: 1442 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220921/4861fe15/attachment.obj>

