[File] [PATCH] of Magdir/{archive, hp} for current ar archive (update +extension *.a *.ar *.lib)

Jörg Jenderek joerg.jen.der.ek at gmx.net
Sat Jan 12 22:31:37 UTC 2019


Hello,

Some weeks ago I compiled some software for Arduino. Just for fun I
inspected the generated libraries with the file command and got strange
output, so I dug deeper. I searched my systems with the find utility,
and the net, for hundreds of similar libraries and archives. I also
generated test files to verify the magic lines. When I run file command
version 5.35 with the -k option I get output like:

arduino.ar:
	current ar archive\012-
	archive file
libcurl-MAC86.a:
	current ar archive random library\012-
	archive file
libcurl-mac.a:
	Mach-O universal binary with 2 architectures:
	[i386:current ar archive random library\012- archive file]
	[x86_64:current ar archive random library\012- archive file]
lldMinGW.lib:
	current ar archive\012-
	archive file
libdelayimp_EMPTY.a:
	current ar archive\012-
	archive file
libdbug.a:
	current ar archive\012-
	archive file - PA-RISC2.0 relocatable library
libz.a:
	current ar archive\012-
	archive file - PA-RISC1.1 relocatable library
test020b0619.ar:
	current ar archive\012-
	archive file - PA-RISC1.0 relocatable library
test02110619.ar:
	current ar archive\012-
	archive file - PA-RISC1.2 relocatable library
testMipsEBUB_.ar:
	MIPS archive with MIPS Ucode members
	with MIPSEB members and an EB hash table\012-
	current ar archive\012-
	archive file
testMipsELULX.ar:
	MIPS archive with MIPS Ucode members
	with MIPSEL members and an EL hash table -- out of date\012-
	current ar archive\012-
	archive file
webmin_1.870_all.deb:
	Debian binary package (format 2.0)\012-
	current ar archive\012-
	archive file

On the one hand, all samples are described as "current ar archive" by
the magic line in Magdir/archive
 0	string		=!<arch>\n	current ar archive
On the other hand, all samples are also described as "archive file" by
the magic line in Magdir/hp
 0	belong		0x213c6172	archive file
The ASCII values of the 4-byte string "!<ar" are 213c6172h, so the two
lines test for the same characteristic. So I removed the duplicate line
from Magdir/hp.
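
The equivalence of the two patterns can be checked with a few lines of
Python. This is only a small illustration; the variable names are chosen
here and do not come from the file sources.

```python
# The hp magic tests a big-endian 32-bit value (belong);
# the archive magic tests a string at the same offset 0.
magic_string = b"!<arch>\n"      # pattern from Magdir/archive
hp_value = 0x213C6172            # pattern from Magdir/hp

first_four = magic_string[:4]    # the bytes "!<ar"
as_belong = int.from_bytes(first_four, "big")

print(first_four, hex(as_belong))   # prints: b'!<ar' 0x213c6172
assert as_belong == hp_value
```

Every file matched by the string test is therefore also matched by the
belong test, which explains the doubled output under -k.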

Furthermore, 4 additional lines inside Magdir/hp must be moved to the
right place in Magdir/archive
 >68	belong 		0x020b0619	- PA-RISC1.0 relocatable library
 >68	belong	 	0x02100619	- PA-RISC1.1 relocatable library
 >68	belong 		0x02110619	- PA-RISC1.2 relocatable library
 >68	belong 		0x02140619	- PA-RISC2.0 relocatable library
To understand what these lines mean, look at the document titled
"The 32-bit PA-RISC Run-time Architecture Document", found for example
at https://parisc.wiki.kernel.org/images-parisc/b/b2/ as
Rad_11_0_32.pdf. So I added that reference before these 4 lines.
According to that document, in the Library Symbol Table (LST) header the
variable system_id identifies the target architecture (for example 0210h
means PA-RISC 1.1) and the variable a_magic indicates the format and
function of the file (for example 0619h means relocatable library).
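
The following Python sketch shows how the 4 bytes at offset 68 split
into the two 16-bit fields. It is only an illustration under the
assumption that system_id is the high half and a_magic the low half of
the big-endian word, as the four magic values suggest; the function and
table names are made up for this example.

```python
import struct

# Values taken from the four magic lines above.
SYSTEM_IDS = {
    0x020B: "PA-RISC1.0",
    0x0210: "PA-RISC1.1",
    0x0211: "PA-RISC1.2",
    0x0214: "PA-RISC2.0",
}
A_MAGIC_RELOC_LIBRARY = 0x0619

def describe_lst_word(data: bytes, offset: int = 68):
    """Return a description for the 4 bytes at offset, or None."""
    if len(data) < offset + 4:
        return None  # too short, e.g. an empty 8-byte archive
    system_id, a_magic = struct.unpack_from(">HH", data, offset)
    if a_magic != A_MAGIC_RELOC_LIBRARY or system_id not in SYSTEM_IDS:
        return None
    return f"{SYSTEM_IDS[system_id]} relocatable library"
```

The length check at the top is what keeps the 8-byte empty archive from
being read beyond its end.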

To fully understand the archive format I also refer to the newest
Wikipedia page about "ar (Unix)" at https://en.wikipedia.org/wiki/Ar_(Unix) .
According to that page, PA-RISC examples like libdbug.a and libz.a
are just 32-bit System V libraries where the first member is "/" and
the contents of the symbol lookup table are stored in an HP-UX specific
file format named System Object Model (SOM).
For debugging purposes, the first and possibly the second archive member
name in variable ar_name[16] can be displayed by lines like
 >8			string	x	\b, 1st "%.16s"
 >68			string	x	\b, 2nd "%.16s"
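
Outside of magic syntax, the member names can be listed by walking the
60-byte member headers. The Python sketch below assumes the common
header layout from ar.h (ar_name[16], ar_date[12], ar_uid[6], ar_gid[6],
ar_mode[8], ar_size[10], ar_fmag[2]) and 2-byte alignment of members;
the function name is hypothetical. Note that offset 68 holds a second
header only when the first member has size 0; otherwise it is the start
of the first member's data.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # name[16] date[12] uid[6] gid[6] mode[8] size[10] fmag[2]

def member_names(data: bytes, limit: int = 2):
    """Return up to `limit` ar_name fields, stopping at truncated data."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    names = []
    pos = len(AR_MAGIC)
    while len(names) < limit and pos + HEADER_SIZE <= len(data):
        header = data[pos:pos + HEADER_SIZE]
        if header[58:60] != b"`\n":
            break  # ar_fmag missing: not a valid member header
        names.append(header[:16].decode("latin-1").rstrip())
        try:
            size = int(header[48:58].decode("ascii").strip())
        except ValueError:
            break  # ar_size is not a decimal number
        if size < 0:
            break
        pos += HEADER_SIZE + size + (size & 1)  # skip data plus padding
    return names
```

For an 8-byte empty archive the loop body never runs and the result is
an empty list, so nothing is read beyond the end of the file.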

I tried to distinguish PA-RISC libraries from other 32-bit System V
libraries in a reliable way by looking, besides a_magic, also at the
time stamp variable version_id, which is probably equal to or greater
than 85082112. I tried to do that step by calling subroutines, but on
returning from the subroutines the internal structures of the file
command seem to be disturbed.
Furthermore, I again had problems when reading bytes at offsets beyond
the end of the file. This was the case, for example, with
libdelayimp_EMPTY.a. This 8-byte file is apparently an empty archive
without any member. So I gave up at that stage on refining the magic
lines in more detail.

For libraries the filename extension "a" is used in most cases. But of
course Microsoft again goes its own way: there the extension "lib" is
used in most cases, as in the example lldMinGW.lib. If the ar utility is
used just as a packing tool, then the extension "ar" is used, as in the
example test020b0619.ar. To show that information with the --extension
option I added an additional line:
 !:ext	a/lib/ar
It should be possible to distinguish libraries from ar-packed archives
by looking for ar_name "/", and to detect the 64-bit variant by looking
for "/SYM64/". To distinguish the BSD variant from System V, look for a
trailing "/" character in ar_name. But because of the problems described
above I was not able to implement these steps.
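
The intended distinction can be sketched in Python as follows. This is
only a heuristic under the assumptions named in the paragraph above; the
function name and the returned phrases are invented for illustration and
are not output of the file command.

```python
def classify_first_name(ar_name: str) -> str:
    """Guess the archive flavour from the first member's ar_name."""
    name = ar_name.rstrip()  # ar_name is padded with spaces to 16 bytes
    if name == "/SYM64/":
        return "System V library, 64-bit symbol table"
    if name == "/":
        return "System V library, 32-bit symbol table"
    if name.startswith("__.SYMDEF"):
        return "BSD library"
    if name.endswith("/"):
        return "System V style archive without symbol table"
    return "BSD style archive or plain ar package"
```

For example, the padded name "/" as found in libdbug.a and libz.a would
be reported as a 32-bit System V library.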

Furthermore, there exist 2 more entries for variants of "current ar
archive", that is, files starting with the magic string "!<arch>\n".
The third entry handles the "MIPS" variant by the line
 0	string	=!<arch>\n__________E	MIPS archive
Unfortunately I found no documentation and no real-world examples of
such archives; I could only inspect the artificial test samples
testMipsEBUB_.ar and testMipsELULX.ar. So I do not know exactly how to
handle the MIPS variant.

The fourth entry handles Debian packages like webmin_1.870_all.deb by
the line
 0	string		=!<arch>\ndebian

With the current Magdir/archive, MIPS and Debian get their own message
text, or with the -k option I get several identifiers. In my opinion it
makes more sense to look only once for the starting magic "!<arch>\n",
then print a phrase like "ar archive", then do further inspection and,
depending on more test lines, print additional phrases like "SVR 64-bit
library". This is also more logical to me, because nearly every modern
ar utility can handle all 4 entry types. The difference is how the
archive member names are stored (BSD versus System V). And often the
first members are just containers for specific meta information. For
Debian packages this is debian-binary, followed by something like
control.tar.gz or control.tar.xz. For libraries this is a symbol lookup
table stored in a specific format depending on the platform type.
Unfortunately I was not able to unify all 4 entries because of the
difficulties described above. So I only prepared Magdir/archive for
unification by putting all "!<arch>"-related entries together; in other
words, by moving the unrelated entries in between out of the way. So I
moved the following line further down:
 0	search/1	-h-	Software Tools format archive text
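
The unification described above (test the magic once, then append
phrases) could look roughly like the Python sketch below. It is not a
replacement for magic lines, only an illustration of the control flow;
the function name and the "empty" phrase are assumptions, while the byte
patterns are the ones from the existing entries.

```python
AR_MAGIC = b"!<arch>\n"

def describe(data: bytes) -> str:
    """Test the ar magic once, then append phrases for known variants."""
    if not data.startswith(AR_MAGIC):
        return "data"
    parts = ["ar archive"]
    rest = data[len(AR_MAGIC):]  # starts with the first ar_name, if any
    if not rest:
        parts.append("empty")
    elif rest.startswith(b"debian"):
        parts.append("Debian binary package")
    elif rest.startswith(b"__________E"):
        parts.append("MIPS archive")
    elif rest.startswith(b"__.SYMDEF"):
        parts.append("random library")
    elif rest.startswith(b"/SYM64/"):
        parts.append("SVR 64-bit library")
    return ", ".join(parts)
```

With such a structure a Debian package would be reported once, as an ar
archive with an additional phrase, instead of by two independent
entries.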

Furthermore, I see that ar archives themselves are sometimes embedded
inside other container formats like Mach-O universal binary, described
by Magdir/cafebabe and Magdir/mach. So I mention this fact as a note.
That also means being careful when changing magic lines for ar archives,
because the output for examples like libcurl-mac.a then also changes
indirectly.

After applying the above-mentioned modifications with the patches
file-5.35-archive-ar.diff and file-5.35-hp-ar.diff,
the duplicate entry with the phrase "archive file" has vanished and all
inspected examples are now described like:

arduino.ar:
	current ar archive
libcurl-MAC86.a:
	current ar archive random library
libcurl-mac.a:
	Mach-O universal binary with 2 architectures:
	[i386:current ar archive random library]
	[x86_64:current ar archive random library]
lldMinGW.lib:
	current ar archive
libdelayimp_EMPTY.a:
	current ar archive
libdbug.a:
	current ar archive -
	PA-RISC2.0 relocatable library
libz.a:
	current ar archive -
	PA-RISC1.1 relocatable library
test020b0619.ar:
	current ar archive -
	PA-RISC1.0 relocatable library
test02110619.ar:
	current ar archive -
	PA-RISC1.2 relocatable library
testMipsEBUB_.ar:
	MIPS archive with MIPS Ucode members
	with MIPSEB members and an EB hash table\012-
	current ar archive
testMipsELULX.ar:
	MIPS archive with MIPS Ucode members
	with MIPSEL members and an EL hash table -- out of date\012-
	current ar archive
webmin_1.870_all.deb:
	Debian binary package (format 2.0)\012-
	current ar archive

I hope my two diff files can be applied in a future version of the file
utility. I have also started to refine the magic lines concerning Debian
packages.

With best wishes
Jörg Jenderek
-- 
Jörg Jenderek







-------------- next part --------------
--- file-5.35/magic/Magdir/hp.old	2017-03-17 21:34:26 +0000
+++ file-5.35/magic/Magdir/hp	2019-01-12 21:06:54 +0000
@@ -121,10 +121,4 @@
 >96	belong		>0		- not stripped
 
-0	belong		0x213c6172	archive file
->68	belong 		0x020b0619	- PA-RISC1.0 relocatable library
->68	belong	 	0x02100619	- PA-RISC1.1 relocatable library
->68	belong 		0x02110619	- PA-RISC1.2 relocatable library
->68	belong 		0x02140619	- PA-RISC2.0 relocatable library
-
 #### 500
 0	long		0x02080106	HP s500 relocatable executable
-------------- next part --------------
--- file-5.35/magic/Magdir/archive.old	2018-04-25 00:19:45 +0000
+++ file-5.35/magic/Magdir/archive	2019-01-12 21:12:06 +0000
@@ -258,13 +258,33 @@
 >22	string	X			-- out of date
 
-0	search/1	-h-		Software Tools format archive text
-
 #
 # BSD/SVR2-and-later portable archive formats.
 #
+# Update: Joerg Jenderek
+# URL:		http://fileformats.archiveteam.org/wiki/AR
+# Reference:	https://www.unix.com/man-page/opensolaris/3HEAD/ar.h/
+# Note:		Mach-O universal binary in ./cafebabe is dependent
+# TODO:		unify current ar archive, MIPS archive, Debian package
+#		distinguish BSD, SVR; 32, 64 bit; HP from other 32-bit SVR;
+#		*.ar packages from *.a libraries. handle empty archive
 0	string		=!<arch>\n		current ar archive
+# print first and possibly second ar_name[16] for debugging purpose
+#>8			string	x	\b, 1st "%.16s"
+#>68			string	x	\b, 2nd "%.16s"
 !:mime	application/x-archive
+# a in most case for libraries; lib for Microsoft libraries; ar else cases
+!:ext	a/lib/ar
 >8	string		__.SYMDEF	random library
+# first member with long marked name __.SYMDEF SORTED implies BSD library
 >68	string		__.SYMDEF\ SORTED	random library
+# Reference: https://parisc.wiki.kernel.org/images-parisc/b/b2/Rad_11_0_32.pdf
+# "archive file" entry moved from ./hp
+# LST header system_id 0210h~PA-RISC 1.1,... identifies the target architecture
+# LST header a_magic 0619h~relocatable library
+>68	belong 		0x020b0619	- PA-RISC1.0 relocatable library
+>68	belong	 	0x02100619	- PA-RISC1.1 relocatable library
+>68	belong 		0x02110619	- PA-RISC1.2 relocatable library
+>68	belong 		0x02140619	- PA-RISC2.0 relocatable library
+#EOF for common ar archives
 
 #
@@ -276,4 +296,6 @@
 >68	belong		>1		%d symbol entries
 
+0	search/1	-h-		Software Tools format archive text
+
 # ARC archiver, from Daniel Quinlan (quinlan at yggdrasil.com)
 #

