[File] [PATCH] Magdir/firmware Intel HEXadecimal not recognized

Jörg Jenderek (GMX) joerg.jen.der.ek at gmx.net
Thu Mar 7 00:56:18 UTC 2024


Hello,

some days ago I started to build my own GRUB switch. The project page
URL is:
https://github.com/rw-hsma-fpga/grub-switch

In the last step the firmware is written to the microcontroller with the
help of the avrdude tool. The firmware files have the file name suffix
HEX.

So I looked for such files. When running the file command version 5.45
on such samples I get output like:

ATmegaBOOT_168.hex:    ASCII text
BCM2033-MD.hex:        ASCII text, with CRLF line terminators
GenuinoAtmega16u2.hex: ASCII text
as102_data2_st.hex:    ASCII text
barcode.hex:           ASCII text
main.hex:              ASCII text, with CRLF line terminators
ulink_firmware.hex:    ASCII text
usbdldv2.hex:          ASCII text, with CRLF line terminators

With option --extension only the 3-byte sequence ??? is shown, and with
option -i only the generic text/plain is shown.

For comparison I ran the file format identification utility TrID (see
https://mark0.net/soft-trid-e.html). All my inspected samples are
described as "Intel Hexadecimal object format" by hex-intel.trid.xml.
The MIME type shown is text/x-hex, and only one file name suffix (.HEX)
is shown (see appended trid-v-hex.txt.gz).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). It does not recognize the
samples.

On Linux (Raspbian 11) such samples are called "Intel® hexadecimal
object file", with MIME type text/x-hex and suffix HEX. That information
cannot be found in the freedesktop shared MIME-info database. The
Notepad++ editor calls this format "Intel HEX binary data".

With the help of other tools I found a page about the HEX format on
Wikipedia. This information is recorded in comment lines at the end of
Magdir/firmware like:
# URL:		https://en.wikipedia.org/wiki/Intel_HEX
# Reference:	http://www.piclist.com/techref/fileext/hex/intel.htm
#		http://mark0.net/download/triddefs_xml.7z
#		defs/h/hex-intel.trid.xml

Unfortunately there is no unique and long magic pattern for Intel HEX
samples. So I do the displaying in a subroutine intel-hex that starts
like:
  0	name		intel-hex
  >0	ubyte		x		Intel hexadecimal object
  !:mime	text/x-hex
  !:ext	hex
Instead of the generic MIME type I show the user-defined type shown by
TrID and on Linux. Wikipedia mentions a dozen suffixes, but in my
inspected samples only HEX was used as file name suffix. So I only show
that file name suffix.

After the start code marker (the colon) come 2 hex digits for the number
of bytes (RECLEN, record length) in the first data field. The maximum
value is 255 (FFh), but in real examples I found only small powers of 2
(like 02 04 08 10h 20h) and 03 (usbdload.hex and usbdldv2.hex from
Windows Vista). Next come 4 hex digits for the 16-bit memory offset of
the first data. Often I found the value 0000 (the natural order), but I
also found samples with higher offsets (like 1C00h 1E00h 3800h 3E00h
76EDh 7800h 7E00h). This is followed by 2 hex digits (range 00 - 05) for
RECTYP (record type; as an ASCII string that is hexadecimal 3030 -
3035). Then come the n bytes of the first data field, represented by 2n
hex digits, followed by a 1-byte checksum (where n=RECLEN). So I show
this information for the first record by lines like:
  >1	string		x		\b, 0x%2.2s record length
  >3	string		x		\b, 0x%4.4s offset
  >7	string		x		\b, '%2.2s' type
  >9	string		x		\b, data+checksum %s
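
To make the field layout concrete, here is a minimal Python sketch that
splits one record into the fields described above and verifies its
checksum. The helper name parse_record and the dictionary keys are
hypothetical and not part of the patch. The checksum rule used (all
record bytes, including the checksum byte, sum to 0 modulo 256) is taken
from the Intel HEX description on Wikipedia. The sample record is the
first line of ATmegaBOOT_168.hex as shown in the output further down.

```python
def parse_record(line):
    """Split one Intel HEX record into its fields (illustrative helper)."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("missing start code ':'")
    # Everything after the colon is pairs of hex digits.
    raw = bytes.fromhex(line[1:])
    if len(raw) < 5:
        raise ValueError("record too short")
    reclen = raw[0]
    # RECLEN + OFFSET (2) + RECTYP (1) + data (reclen) + checksum (1)
    if len(raw) != reclen + 5:
        raise ValueError("RECLEN does not match record size")
    return {
        "reclen": reclen,
        "offset": int.from_bytes(raw[1:3], "big"),
        "rectyp": raw[3],
        "data": raw[4:4 + reclen],
        "checksum": raw[-1],
        # A valid record sums to 0 modulo 256, checksum byte included.
        "checksum_ok": sum(raw) % 256 == 0,
    }


first = parse_record(":103800000C94341C0C944F1C0C944F1C0C944F1CA7")
print(hex(first["reclen"]), hex(first["offset"]),
      first["rectyp"], first["checksum_ok"])
# prints: 0x10 0x3800 0 True
```

The magic lines only print the raw digit strings; a check like this is
beyond what the patch does and is shown only to illustrate the format.
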
As a check I test the last line of HEX samples. Typically you get a
string like :00000001FF. Its offset from the end of the file depends on
whether CRLF or LF is used as line terminator. That is the last record
with RECLEN 0, OFFSET 0, record type 01 for EndOfFile and 1 checksum
byte FFh. The TrID tool uses that string as an additional test
criterion. Here the last line is only shown when it differs from that
string, by lines like:
  >-2	ubeshort	=0x0d0a
  >>-13	string		!:00000001FF	\b, last line %s
  >-2	ubeshort	!0x0d0a
  >>-1	ubyte		=0x0a
  >>>-12	string		!:00000001FF	\b, last line %s
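
The same end-of-file check can be sketched in Python. This is only an
illustration of the logic: the function name ends_with_eof_record is
hypothetical, and the branch for a file without any final line
terminator is an assumption that the magic lines above do not cover.
Note that the magic lines print the last line when it does NOT match,
while this sketch returns True when it does match.

```python
EOF_RECORD = b":00000001FF"  # RECLEN 0, OFFSET 0, RECTYP 01, checksum FF


def ends_with_eof_record(data):
    """Return True if the file content ends with the usual EOF record."""
    if data.endswith(b"\r\n"):
        body = data[:-2]   # CRLF case: record starts 13 bytes before the end
    elif data.endswith(b"\n"):
        body = data[:-1]   # LF case: record starts 12 bytes before the end
    else:
        body = data        # assumption: no terminator after the last record
    return body.endswith(EOF_RECORD)


print(ends_with_eof_record(b":020000040000FA\r\n:00000001FF\r\n"))
print(ends_with_eof_record(b":020000040000FA\n"))
```
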

So the description now starts with lines like:
  0	ubyte		0x3A
  >&6	ubeshort&0xFFf8	=0x3030
  >>&-8	ubeshort&0xFCf0	=0x3030
  >>>0	use		intel-hex
With the first test I look for the start code. That is 1 character, an
ASCII colon ':'. In my samples the colon was found at offset 0, but
according to the documentation there may exist samples with garbage
before it; all characters preceding this symbol should be ignored. So I
use a relative offset in the next test. If such cases are found, the
test lines must be adapted for them. The second test checks for a valid
record type string in the range 00 - 05 (that is hexadecimal 3030 -
3035). The third test checks for a valid record length string (like: 02
04 08 10 20 03). Now I have more than 32 bits tested. So the whole test
is hopefully unique enough and I can now call the subroutine.
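
For readers less familiar with masked magic tests, the three tests can
be emulated in Python roughly as follows. This is a sketch under my
reading of the relative offsets (the first relative offset resolves to
file offset 7, the second to file offset 1); the function name
looks_like_intel_hex is hypothetical. It also makes visible what the
masks accept as written: 0xFFF8 lets record type strings 00 through 07
pass, and 0xFCF0 requires a first length digit 0 - 3 and a second byte
in the range 30h - 3Fh, so a length written with a hex letter (such as
0A or FF) would not match.

```python
def looks_like_intel_hex(buf):
    """Emulate the three magic tests on the first 9 bytes (illustrative)."""
    # Test 1: "0 ubyte 0x3A", the start code ':' at offset 0.
    if len(buf) < 9 or buf[0] != 0x3A:
        return False
    # Test 2: ">&6 ubeshort&0xFFf8 =0x3030"
    # Relative to the end of the matched byte: 1 + 6 = offset 7 (RECTYP).
    rectyp = int.from_bytes(buf[7:9], "big")
    if rectyp & 0xFFF8 != 0x3030:
        return False
    # Test 3: ">>&-8 ubeshort&0xFCf0 =0x3030"
    # Relative to the end of the previous short: 9 - 8 = offset 1 (RECLEN).
    reclen = int.from_bytes(buf[1:3], "big")
    return reclen & 0xFCF0 == 0x3030


print(looks_like_intel_hex(b":103800000C94341C"))   # first line of a sample
print(looks_like_intel_hex(b"Hello, world"))        # ordinary text
```
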

After applying the above-mentioned modifications with the patch
file-5.45-firmware-hex.diff, all my inspected Intel HEX samples are now
recognized. This now looks like:

ATmegaBOOT_168.hex:    Intel hexadecimal object, 0x10 record length,
		       0x3800 offset, '00' type, data+checksum
		       0C94341C0C944F1C0C944F1C0C944F1CA7
BCM2033-MD.hex:        Intel hexadecimal object, 0x10 record length,
		       0x76ED offset, '00' type, data+checksum
		       8F14D3E514648094894008E5142437FF82
GenuinoAtmega16u2.hex: Intel hexadecimal object, 0x20 record length,
		       0x0000 offset, '00' type, data+checksum
		       9EC00000B7C00000B5C00000B3C00000B
		       1C00000AFC00000ADC00000ABC000006B
as102_data2_st.hex:    Intel hexadecimal object, 0x02 record length,
		       0x0000 offset, '04' type, data+checksum
		       0000FA
barcode.hex:           Intel hexadecimal object, 0x08 record length,
		       0x0000 offset, '00' type, data+checksum
		       831610308500831205
main.hex:              Intel hexadecimal object, 0x02 record length,
		       0x0000 offset, '02' type, data+checksum
		       1000EC
ulink_firmware.hex:    Intel hexadecimal object, 0x04 record length,
		       0x0000 offset, '00' type, data+checksum
		       0200713257
usbdldv2.hex:          Intel hexadecimal object, 0x03 record length,
		       0x0000 offset, '00' type, data+checksum
		       120003E8

I hope my diff file can be applied in a future version of the file
utility. Unfortunately the HEX suffix is also used for other file
formats like the GNU Unifont hex format. I will try to handle such fonts
in a future session.

With best wishes,
Jörg Jenderek
--
Jörg Jenderek
-------------- next part --------------
--- file-5.45/magic/Magdir/firmware.old	2023-04-02 17:19:13.000000000 +0200
+++ file-5.45/magic/Magdir/firmware	2024-03-07 01:43:08.217585400 +0100
@@ -131,3 +131,48 @@
 >>>112	string/16	x		%s
 >>144	string/32	x		\b, IDF version: %s
 >>4	ulelong		x		\b, entry address: 0x%08X
+
+# Summary:	Intel HEXadecimal file format
+# URL:		https://en.wikipedia.org/wiki/Intel_HEX
+# Reference:	http://www.piclist.com/techref/fileext/hex/intel.htm
+#		http://mark0.net/download/triddefs_xml.7z/defs/h/hex-intel.trid.xml
+# From:		Joerg Jenderek
+# Note:		called "Intel Hexadecimal object format" by TrID, "Intel® hexadecimal object file" on Linux
+#		and "Intel HEX binary data" by Notepad++
+# look for start code; 1 character, an ASCII colon ':'; all characters preceding this symbol should be ignored
+0	ubyte		0x3A
+# check for valid record type string with range 00 - 05 (3030h - 3035h)
+>&6	ubeshort&0xFFf8	=0x3030
+# check for valid record length string like: 02 04 08 10h 20h 03 (usbdload.hex usbdldv2.hex from Windows Vista)
+#>>1	string		x		LENGTH_STRING=%0.2s
+#>>1	ubeshort	x		LENGTH=%#4.4x
+>>&-8	ubeshort&0xFCf0	=0x3030
+>>>0	use		intel-hex
+#	display information (offset, record length and type) of Intel HEX
+0	name		intel-hex
+# RECORD MARK
+>0	ubyte		x		Intel hexadecimal object
+#!:mime	text/plain
+!:mime	text/x-hex
+!:ext	hex
+# no samples with other suffix found
+# .hex .mcs .int .ihex .ihe .ihx .h80 .h86 .a43 .a90 .obj .obl .obh .rom .eep
+# .hxl-.hxh .h00-.h15 .p00-.pff
+# RECLEN; 2 hex digits for number of bytes in 1st data field; like 0x02 0x03 0x04 0x08 0x10 0x20; maximum 255
+>1	string		x		\b, 0x%2.2s record length
+# OFFSET; 4 hex digits for 1st 16-bit memory offset of data like: 0000 (often) 1C00h 1E00h 3800h 3E00h 76EDh 7800h 7E00h ...
+>3	string		x		\b, 0x%4.4s offset
+# RECTYP; 2 hex digits (00 - 05); meaning of 1st data field; 00~DataRecord (often) 01~EndOfFileRecord 02~ExtendedSegmentAddressRecord 03~StartSegmentAddressRecord 04~ExtendedLinearAddressRecord 05~StartLinearAddressRecord
+>7	string		x		\b, '%2.2s' type
+# DATA; n bytes of 1st data represented by 2n hex digits followed by 1 byte checksum
+>9	string		x		\b, data+checksum %s
+# last record :00000001FF with RECLEN 0, OFFSET 0, record type 01 for EndOfFile and 1 checksum byte FF
+# samples with CarriageReturnLineFeed terminator
+>-2	ubeshort	=0x0d0a
+# This should not happen!
+>>-13	string		!:00000001FF	\b, last line %s
+>-2	ubeshort	!0x0d0a
+# samples with LineFeed terminator
+>>-1	ubyte		=0x0a
+# This should not happen!
+>>>-12	string		!:00000001FF	\b, last line %s
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.45-firmware-hex.diff.sig
Type: application/octet-stream
Size: 1493 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20240307/ffeb9674/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-hex.txt.gz
Type: application/x-gzip
Size: 441 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20240307/ffeb9674/attachment.bin>

