[File] [PATCH] Magdir/sun, bsdi, hp, ibm6000, coff, digital executables -duplicates+extension

Christos Zoulas christos at zoulas.com
Sun Mar 31 15:07:16 UTC 2024


Committed, thanks!

christos

> On Mar 25, 2024, at 11:19 AM, Jörg Jenderek (GMX) <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> A few days ago I looked at the contents of an exotic CD-ROM. It also
> contains some executables for different Unix-like operating systems.
> One program is called HYPERHLP. The corresponding URLs are:
> https://en.wikipedia.org/wiki/SoftPC
> https://archive.org/details/softwin2-unix
> 
> When I run the file command (version 5.45) with the -k option on such
> executables, I get output like:
> 
> CALCSIZE:       PA-RISC1.1 shared executable - not stripped
> HYPERHLP:       COFF format alpha demand paged
> 		executable dynamically linked stripped - version 3.11-2
> HYPERHLP-hp:    PA-RISC1.0 shared executable
> HYPERHLP-risc:  executable (RISC System/6000 V3.1) or obj module
> HYPERHLP-sunos: SPARC demand paged
> 		dynamically linked executable
> 		a.out SunOS SPARC demand paged
> 		dynamically linked executable
> 
> With the --extension option only the three-character sequence ??? is
> shown, and with the -i option only the generic application/octet-stream
> is shown.
> 
> For comparison I ran the file format identification utility TrID (see
> https://mark0.net/soft-trid-e.html). It also recognizes the PA-RISC
> executables, but shows no MIME type and no file name suffix. The sample
> HYPERHLP-hp is described as "PA-RISC 1.0 object code (generic)" by
> pa-risc-10.trid.xml. The sample CALCSIZE is described as "PA-RISC 1.1
> object code (generic)" by pa-risc-11.trid.xml (see the appended
> trid-v-exec.txt.gz).
> 
> For comparison I also ran the file format identification utility DROID
> (see https://sourceforge.net/projects/droid/). It either does not
> recognize the samples or describes them wrongly. The sample
> HYPERHLP-risc is described as "MPEG 1/2 Audio Layer 3" by PUID fmt/134.
> 
> For the sample HYPERHLP-sunos I get "duplicate" messages. One is
> produced by lines in Magdir/bsdi, which look like:
> 0	belong&077777777	0600413	SPARC demand paged
> >0	byte		&0x80
> >>20	belong		<4096		shared library
> >>20	belong		=4096		dynamically linked executable
> >>20	belong		>4096		dynamically linked executable
> >0	byte		^0x80		executable
> >16	belong		>0		not stripped
> >36	belong		0xb4100001	(uses shared libs)
> 
> The second message is produced by lines in Magdir/sun, which look like:
> 0	belong&077777777	0600413	a.out SunOS SPARC demand paged
> >0	byte		&0x80
> >>20	belong		<4096		shared library
> >>20	belong		=4096		dynamically linked executable
> >>20	belong		>4096		dynamically linked executable
> >0	byte		^0x80		executable
> >16	belong		>0		not stripped
> 
> So the same file format is described twice. The difference is that in
> the second message the phrase "a.out SunOS" comes before the phrase
> "SPARC demand paged", while the first variant additionally classifies
> executables that use shared libraries (its last line). So I commented
> out the lines in Magdir/sun. On Unix-like systems executables typically
> have no file name suffix, so the lines in Magdir/bsdi now become:
> 
> 0	belong&077777777	0600413		SPARC demand paged
> >0	byte		&0x80
> >>20	belong		<4096		shared library
> >>20	belong		=4096		dynamically linked executable
> >>20	belong		>4096		dynamically linked executable
> !:ext	/
> >0	byte		^0x80		executable
> >16	belong		>0		not stripped
> >36	belong		0xb4100001	(uses shared libs)
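A minimal Python sketch of what the Magdir/bsdi entry above tests, assuming the input is the start of a big-endian SunOS SPARC a.out header and reading fields only at the offsets the magic lines use (0, 16, 20, 36). The helper name classify_sparc_aout is hypothetical and not part of file; it only mirrors the magic logic, and the !:ext line has no counterpart here.

```python
import struct
from typing import Optional


def classify_sparc_aout(data: bytes) -> Optional[str]:
    """Hypothetical mirror of the Magdir/bsdi SPARC demand paged a.out test."""
    if len(data) < 32:
        return None
    # 0 belong&077777777 0600413: low 24 bits = machine type 3 (SPARC) + ZMAGIC 0413
    (word,) = struct.unpack_from(">I", data, 0)
    if word & 0o77777777 != 0o600413:
        return None
    parts = ["SPARC demand paged"]
    # 16 belong = symbol table size, 20 belong = entry point (signed, like magic belong)
    a_syms, a_entry = struct.unpack_from(">ii", data, 16)
    if data[0] & 0x80:
        # dynamic bit set: a low entry point is reported as a shared library
        parts.append("shared library" if a_entry < 4096 else "dynamically linked executable")
    else:
        parts.append("executable")
    if a_syms > 0:
        parts.append("not stripped")
    # 36 belong 0xb4100001 is reported as "(uses shared libs)"
    if len(data) >= 40 and struct.unpack_from(">I", data, 36)[0] == 0xB4100001:
        parts.append("(uses shared libs)")
    return " ".join(parts)
```

Joining the parts with single spaces approximates how file concatenates the descriptions of matching continuation lines.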
> 
> With the help of other tools I found a page about the PA-RISC
> architecture. This information is recorded by comment lines in
> Magdir/hp like:
> # URL:		http://www.openpa.net/arch.html
> # Reference:	http://mark0.net/download/triddefs_xml.7z
> #		defs/p/pa-risc-11.trid.xml
> #		defs/p/pa-risc-10.trid.xml
> 
> PA-RISC executables are described in Magdir/hp. Version 1.0
> executables are described by lines like:
> 0	belong 		0x020b0108	PA-RISC1.0 shared executable
> >168	belong&0x4	0x4		dynamically linked
> >(144)	belong		0x054ef630	dynamically linked
> >96	belong		>0		- not stripped
> On Unix-like systems executables typically have no file name suffix,
> so the lines now become:
> 0	belong 		0x020b0108	PA-RISC1.0 shared executable
> !:ext	/
> >168	belong&0x4	0x4		dynamically linked
> >(144)	belong		0x054ef630	dynamically linked
> >96	belong		>0		- not stripped
> 
> Version 1.1 executables are described by lines like:
> 0	belong 		0x02100108	PA-RISC1.1 shared executable
> >168	belong&0x4	0x4		dynamically linked
> >(144)	belong		0x054ef630	dynamically linked
> >96	belong		>0		- not stripped
> On Unix-like systems executables typically have no file name suffix,
> so the lines now become:
> 0	belong 		0x02100108	PA-RISC1.1 shared executable
> !:ext	/
> >168	belong&0x4	0x4		dynamically linked
> >(144)	belong		0x054ef630	dynamically linked
> >96	belong		>0		- not stripped
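For illustration, a Python sketch of the Magdir/hp tests above, assuming big-endian 32-bit reads at offsets 0, 96 and 168. The names classify_pa_risc and PA_RISC_MAGIC are hypothetical; the indirect test at (144) is deliberately left out because reproducing it would require assumptions about how the magic indirect offset is resolved.

```python
import struct
from typing import Optional

# First four bytes (big-endian) as given in the Magdir/hp lines quoted above
PA_RISC_MAGIC = {
    0x020B0108: "PA-RISC1.0",
    0x02100108: "PA-RISC1.1",
}


def classify_pa_risc(data: bytes) -> Optional[str]:
    """Hypothetical mirror of the Magdir/hp shared executable tests."""
    if len(data) < 172:
        return None
    (magic,) = struct.unpack_from(">I", data, 0)
    name = PA_RISC_MAGIC.get(magic)
    if name is None:
        return None
    parts = [name + " shared executable"]
    # 168 belong&0x4 0x4: the magic reports "dynamically linked" when bit 0x4 is set
    (flags,) = struct.unpack_from(">I", data, 168)
    if flags & 0x4:
        parts.append("dynamically linked")
    # 96 belong >0: a positive value here is reported as "- not stripped"
    (count,) = struct.unpack_from(">i", data, 96)
    if count > 0:
        parts.append("- not stripped")
    return " ".join(parts)
```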
> 
> The sample HYPERHLP-risc is described by lines in Magdir/ibm6000,
> which look like:
> 
> 0	beshort	0x01df	executable (RISC System/6000 V3.1) or obj module
> >12	belong	>0	not stripped
> 
> The display part can be handled by a subroutine from Magdir/coff. The
> advantage is that additional tests are done before anything is
> displayed, and nearly all COFF samples are described the same way. That
> also means a sub-classification into executable or object is done, and
> more details such as the time stamp are shown. So the above lines now
> become:
> 0	beshort		0x01df
> >0	use				display-coff
> 
> The subroutine in Magdir/coff that displays the name, flags and more of
> Common Object File Format (32-bit COFF) files, which I implemented some
> time ago, starts like:
> 0	name				display-coff
> >18	uleshort&0x8E80	0
> >>2	uleshort	>0
> >>>2	uleshort	<4207
> >>>>0	clear		x
> >>>>0	uleshort	0x014C		Intel 80386
> 
> The first test checks for unused flag bits (0x8000, 0x0800, 0x0400,
> 0x0200, 0x0080 in f_flags). This knowledge is mainly based on
> documentation about the Intel x86 architecture. Apparently some flag
> bits (0x0800, 0x0400, 0x0200) are used on RISC System/6000, so this test
> must become more relaxed. The next two tests check the number of
> sections (f_nscns), which is typically in the range of a dozen, so
> misidentified samples of other file formats with 0 or thousands of
> sections are skipped. The display part starts with Intel 80386 by
> looking for a specific magic at offset 0. So with the relaxed first test
> and an activated check for the RISC System/6000 magic (f_magic=0x01DF),
> the subroutine now starts as:
> 0	name				display-coff
> >18	uleshort&0x8E80	0
> >>2	uleshort	>0
> >>>2	uleshort	<4207
> >>>>0	clear		x
> >>>>0	uleshort	0x014C		Intel 80386
> >>>>0	uleshort	0x01DF		RISC System/6000 V3.1
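The entry tests of display-coff can be sketched in Python as follows. This is a hypothetical helper (coff_machine, COFF_MACHINES), not file's code; it assumes the classic 20-byte COFF file header (f_magic, f_nscns, f_timdat, f_symptr, f_nsyms, f_opthdr, f_flags). The byte order is a parameter of the sketch so one function can read little-endian i386 COFF as well as big-endian XCOFF, and the flag mask is a parameter because the relaxed value is not spelled out here.

```python
import struct
from typing import Optional

# Machine names keyed by f_magic, as in the display-coff excerpt above
COFF_MACHINES = {0x014C: "Intel 80386", 0x01DF: "RISC System/6000 V3.1"}


def coff_machine(data: bytes, byteorder: str = "<", flag_mask: int = 0x8E80) -> Optional[str]:
    """Hypothetical sketch of the display-coff entry tests.

    byteorder: "<" (little-endian, like the uleshort tests) or ">".
    flag_mask: f_flags bits that must be clear (0x8E80 as shown above).
    """
    if len(data) < 20:
        return None
    f_magic, f_nscns = struct.unpack_from(byteorder + "HH", data, 0)
    (f_flags,) = struct.unpack_from(byteorder + "H", data, 18)
    if f_flags & flag_mask:
        return None  # "unused" flag bits set: probably not COFF
    if not 0 < f_nscns < 4207:
        return None  # zero or thousands of sections: reject
    return COFF_MACHINES.get(f_magic)
```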
> 
> Later the samples are described as executable if the F_EXEC flag bit
> is set. This is done by a line like:
> >>>>18	leshort		&0x0002		executable
> Afterwards a user-defined MIME type is now shown. Since executables on
> Unix-like systems typically have no file name suffix, the extension
> list is left empty ("/"). This is done by additional lines like:
> !:mime	application/x-coff-executable
> !:ext	/
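A small Python sketch of the flag handling, assuming the classic COFF f_flags bits F_RELFLG (0x0001), F_EXEC (0x0002), F_LNNO (0x0004) and F_LSYMS (0x0008). The helper describe_coff_flags is hypothetical; the label "stripped" for 0x0008 is taken from the sample output further down, and objects get no MIME type here because the text only mentions the executable one.

```python
from typing import List, Optional, Tuple

# Classic COFF f_flags bits; labels follow the sample -k output in this mail
F_FLAGS = [
    (0x0002, "executable"),           # F_EXEC
    (0x0001, "no relocation info"),   # F_RELFLG
    (0x0004, "no line number info"),  # F_LNNO
    (0x0008, "stripped"),             # F_LSYMS (local symbols removed)
]


def describe_coff_flags(f_flags: int) -> Tuple[List[str], Optional[str]]:
    """Hypothetical helper: flag labels plus the user-defined MIME type."""
    labels = [text for bit, text in F_FLAGS if f_flags & bit]
    mime = "application/x-coff-executable" if f_flags & 0x0002 else None
    return labels, mime
```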
> 
> The time stamp f_timdat also seems to be correct for the endian variant.
> It is shown by a line like:
> >>>>4	ledate		>0		\b, created %s
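The ledate line reads a 32-bit value at offset 4 and prints it as a UTC date, but only when it is greater than zero. A minimal Python equivalent, assuming a little-endian header by default (coff_created is a hypothetical name):

```python
import struct
import time
from typing import Optional


def coff_created(data: bytes, byteorder: str = "<") -> Optional[str]:
    """Format f_timdat (offset 4) in UTC like a magic date type; None unless > 0."""
    (f_timdat,) = struct.unpack_from(byteorder + "i", data, 4)
    if f_timdat <= 0:
        return None
    return time.asctime(time.gmtime(f_timdat))
```

For example, a header whose f_timdat is 789939347 formats as "Thu Jan 12 19:35:47 1995".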
> 
> If a sample contains no optional header, the first section header comes
> directly after the file header. It starts with the section name
> (s_name[8], like .text, .data, .debug$S, .drectve, .testseg). This is
> shown by lines like:
> >>>>16	uleshort	=0
> >>>>>20	string		x	\b, 1st section name "%.8s"
> 
> For samples with an optional header, it comes after the file header and
> is then followed by the first section header. At the moment the current
> magic file only shows a non-zero optional header size (f_opthdr), by a
> line like:
> >>>>16	uleshort	>0		\b, optional header size %u
> Now I also show the first section name for samples with an optional
> header (like IBM\HH\HYPERHLP, with f_opthdr=72) by a line like:
> >>>>(16.s+20)	string	x		\b, 1st section name "%.8s"
> 
> Luckily that test expression is also true for samples without an
> optional header (f_opthdr=0 gives offset 20). Now, if wanted, I can show
> more fields of the first section by lines like:
> # physical address s_paddr like: 0
> #>>>>(16.s+28)	lelong		!0		\b, s_paddr %#8.8x
> # virtual address s_vaddr like: 0
> #>>>>(16.s+32)	lelong		!0		\b, s_vaddr %#8.8x
> # section size s_size
> #>>>>(16.s+36)	lelong		x		\b, s_size %#8.8x
> # file ptr to raw data for section s_scnptr
> #>>>>(16.s+40)	lelong		x		\b, s_scnptr %#8.8x
> # file ptr to relocation s_relptr like: 0
> #>>>>(16.s+44)	lelong		!0		\b, s_relptr %#8.8x
> # file ptr to gp histogram s_lnnoptr like: 0
> #>>>>(16.s+48)	lelong		!0		\b, s_lnnoptr %#8.8x
> # number of relocation entries s_nreloc like:
> # 0 1 2 5 6 8 19h 26h 27h 38h 50h 5Fh 89h Dh 1Ch 69h A9h 1DCh 651h
> #>>>>(16.s+52)	uleshort	x		\b, s_nreloc %#4.4x
> # number of gp histogram entries s_nlnno like: 0
> #>>>>(16.s+54)	uleshort	!0		\b, s_nlnno %#4.4x
> # flags s_flags
> #>>>>(16.s+56)	lelong		x		\b, s_flags %#8.8x
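The commented lines above decode the 40-byte section header at 20 + f_opthdr field by field. The same layout can be expressed in Python with one struct format; this is a sketch with a hypothetical first_section helper, assuming little-endian fields by default like the lelong/uleshort lines.

```python
import struct

# 40-byte COFF section header; offsets match (16.s+20) ... (16.s+56) above
SECTION_FMT = "8sIIIIIIHHI"
SECTION_FIELDS = ("s_name", "s_paddr", "s_vaddr", "s_size", "s_scnptr",
                  "s_relptr", "s_lnnoptr", "s_nreloc", "s_nlnno", "s_flags")


def first_section(data: bytes, byteorder: str = "<") -> dict:
    """Hypothetical sketch: decode the section header at 20 + f_opthdr."""
    (f_opthdr,) = struct.unpack_from(byteorder + "H", data, 16)
    start = 20 + f_opthdr  # same arithmetic as the magic (16.s+20)
    values = struct.unpack_from(byteorder + SECTION_FMT, data, start)
    hdr = dict(zip(SECTION_FIELDS, values))
    # like %.8s: stop at the first NUL byte
    hdr["s_name"] = hdr["s_name"].split(b"\0", 1)[0].decode("latin-1")
    return hdr
```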
> 
> If a sample contains more than one section, the second section header
> follows. Again it starts with the section name (like .bss, .data,
> .debug$S, .rsrc$01). So I show this second section name by lines like:
> >>>>2	uleshort	>1
> >>>>>(16.s+60)	string		x	\b, 2nd section name "%.8s"
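Because every section header is 40 bytes long, the second name at (16.s+60) is just the first offset plus 40. A Python generalization to the n-th section, with a hypothetical section_name helper and little-endian fields assumed by default:

```python
import struct
from typing import Optional


def section_name(data: bytes, index: int, byteorder: str = "<") -> Optional[str]:
    """Hypothetical sketch: name of section `index` (0-based), None if absent."""
    (f_nscns,) = struct.unpack_from(byteorder + "H", data, 2)
    (f_opthdr,) = struct.unpack_from(byteorder + "H", data, 16)
    if index >= f_nscns:
        return None  # the magic only looks for a 2nd name when f_nscns > 1
    start = 20 + f_opthdr + 40 * index  # index 1 gives the (16.s+60) offset
    raw = data[start:start + 8]
    return raw.split(b"\0", 1)[0].decode("latin-1")
```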
> 
> Most section names start with a dot character, except in samples
> created by "exotic" compilers, but unfortunately I no longer remember or
> can find such samples. If the magic tests for COFF samples are still too
> weak, tests for that dot character can be used to avoid collisions.
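Such a dot test could look like the following Python heuristic. It is only a sketch: the name looks_like_section_name is hypothetical, and the extra requirement that the remaining bytes be printable ASCII is an added assumption, not something the mail proposes.

```python
def looks_like_section_name(raw: bytes) -> bool:
    """Heuristic sketch: an 8-byte s_name that starts with '.' and is printable."""
    name = raw[:8].split(b"\0", 1)[0]
    return name.startswith(b".") and all(0x20 < b < 0x7F for b in name)
```

Names from "exotic" compilers without a leading dot would be rejected by this check, which is exactly the trade-off described above.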
> 
> Samples like HYPERHLP are described by lines in Magdir/digital, which
> look like:
> 
> >24	leshort		0413		COFF format alpha demand paged
> >>22	leshort&030000	!020000		executable
> >>22	leshort&020000	!0		dynamically linked
> >>16	lelong		!0		not stripped
> >>16	lelong		0		stripped
> >>27	byte		x		- version %d
> >>26	byte		x		\b.%d
> >>28	byte		x		\b-%d
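A Python sketch of the continuation lines above. The parent test that selects this block is not quoted, so the sketch (hypothetical describe_alpha) only applies the child tests, reading little-endian values at exactly the offsets the magic lines use; the octal masks are copied from them.

```python
import struct
from typing import Optional


def describe_alpha(data: bytes) -> Optional[str]:
    """Hypothetical mirror of the quoted Magdir/digital child tests."""
    if len(data) < 29:
        return None
    (opt_magic,) = struct.unpack_from("<h", data, 24)
    if opt_magic != 0o413:  # 24 leshort 0413: demand paged
        return None
    (flags,) = struct.unpack_from("<h", data, 22)
    (count,) = struct.unpack_from("<i", data, 16)
    parts = ["COFF format alpha demand paged"]
    if flags & 0o30000 != 0o20000:
        parts.append("executable")
    if flags & 0o20000:
        parts.append("dynamically linked")
    parts.append("not stripped" if count != 0 else "stripped")
    # version bytes: 27 = major, 26 = minor, 28 = third number
    parts.append("- version %d.%d-%d" % (data[27], data[26], data[28]))
    return " ".join(parts)
```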
> 
> Unfortunately the referenced COFF documentation does not apply here; it
> is mainly based on the Intel x86 architecture. I assume that the
> inspected Alpha architecture is 64-bit based, with different header
> structures, so I cannot use the subroutine display-coff. On Unix-like
> systems executables typically have no file name suffix, so I keep the
> lines and add one line:
> !:ext	/
> 
> After applying the above modifications with the patches
> file-5.45-hp-pa-risc.diff	file-5.45-sun-hyperhlp.diff
> file-5.45-bsdi-hyperhlp.diff	file-5.45-ibm6000-hyperhlp.diff
> file-5.45-coff-hyperhlp.diff	file-5.45-digital-hyperhlp.diff
> all my inspected executables are still described, but now I get some
> more details (which can be used to avoid collisions that may be
> triggered by too short patterns). Furthermore, the duplicate messages
> have vanished. With the -k option the output now looks like:
> 
> CALCSIZE:       PA-RISC1.1 shared executable - not stripped
> HYPERHLP:       COFF format alpha demand paged
> 		executable dynamically linked stripped - version 3.11-2
> HYPERHLP-hp:    PA-RISC1.0 shared executable
> HYPERHLP-risc:  RISC System/6000 V3.1 COFF executable
> 		, no relocation info, no line number info, stripped
> 		, 7 sections, optional header size 72
> 		, created Thu Jan 12 19:35:47 1995
> 		, 1st section name ".pad", 2nd section name ".text"
> HYPERHLP-sunos: SPARC demand paged
> 		dynamically linked executable
> 
> When running with the --extension option, the output now looks like:
> CALCSIZE:       /
> HYPERHLP:       /
> HYPERHLP-hp:    /
> HYPERHLP-risc:  /
> HYPERHLP-sunos: /
> 
> I hope my diff files can be applied in a future version of the file
> utility.
> 
> With best wishes,
> Jörg Jenderek
> --
> Jörg Jenderek
> [Attachments: message part; trid-v-exec.txt.gz; file-5.45-sun-hyperhlp.diff,
> file-5.45-bsdi-hyperhlp.diff, file-5.45-hp-pa-risc.diff,
> file-5.45-digital-hyperhlp.diff, file-5.45-ibm6000-hyperhlp.diff,
> file-5.45-coff-hyperhlp.diff (each with a .sig signature)]
> -- 
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file


