[File] [PATCH] of Magdir/archive for cpio archives; extensions + misidentified DROID signature

Jörg Jenderek joerg.jen.der.ek at gmx.net
Tue Feb 28 22:35:52 UTC 2023



Hello,

Some days ago I looked at the installation medium of my last Linux
installation, which was openSUSE Leap. In its boot/x86_64/loader
directory there was a file named bootlogo.

When running the file command version 5.44 on such examples and other
cpio archives, I get output like:

MainActor-2.06.3.cpio:         ASCII cpio archive (SVR4 with no CRC)
VOL.000.008:                   ASCII cpio archive (pre-SVR4 or odc)
VOL.000.012:                   ASCII cpio archive (pre-SVR4 or odc)
bootlogo:                      cpio archive
cinema.cpi:                    ASCII cpio archive (pre-SVR4 or odc)
clam.bin-be.cpio:              byte-swapped cpio archive
fmt-635-signature-id-960.cpio: cpio archive
message.cpi:                   cpio archive
pcmcia:                        ASCII cpio archive (SVR4 with CRC)
pthreads-1.60B5.osr5src.cpio:  ASCII cpio archive (pre-SVR4 or odc)
skeleton2.cpio:                byte-swapped cpio archive
tar-1.27.cpio:                 ASCII cpio archive (SVR4 with CRC)
ttyS0.cpio:                    cpio archive

With the --extension option, only ??? is displayed. With the -i
option, the expected MIME type application/x-cpio is shown for the
samples.

For comparison, I ran the file format identification utility TrID
(see https://mark0.net/soft-trid-e.html). It lists the file name
extension used and, with the -v option, often the related URL pointing
to the file format information used. The examples that file describes
with the additional phrase "ASCII" are described here as "CPIO archive
(portable)" by ark-cpio.trid.xml. Samples like skeleton2.cpio, found
on telparia.com, which file describes as "byte-swapped cpio archive",
are described here as "CPIO archive (byte swapped binary)" by
ark-cpio-bin-sw.trid.xml. Samples like bootlogo, which file describes
as "cpio archive", are described here as "CPIO archive (binary)" by
ark-cpio-bin.trid.xml. As file name suffix only CPIO is displayed, and
only the generic MIME type application/octet-stream is shown for all
samples (see appended trid-v-cpio.txt.gz).

For comparison, I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/). It recognizes
only the one binary variant, as in samples bootlogo and
fmt-635-signature-id-960.cpio, which file describes as "cpio archive".
DROID describes these as "CPIO" by PUID fmt/635. No MIME type is shown
here. Also, only the suffix CPIO is considered valid
(EXTENSION_MISMATCH is false), so samples bootlogo and message.cpi are
marked as "bad" (see appended droid-cpio.csv.gz).

With the additional help of TrID I was able to find a page on the file
formats wiki of the Archive Team web site. It also links to the
Wikipedia page about cpio and lists downloadable samples. The
specification can be found, for example, in the cpio(5) format man
page. This information is expressed by additional comment lines
inside Magdir/archive, which look like:
# URL:		http://fileformats.archiveteam.org/wiki/Cpio
#		https://en.wikipedia.org/wiki/Cpio
# Reference:	https://people.freebsd.org/~kientzle/libarchive/
#		man/cpio.5.txt
#		http://mark0.net/download/triddefs_xml.7z
#		defs/a/ark-cpio-bin.trid.xml
#		defs/a/ark-cpio-bin-sw.trid.xml
#		defs/a/ark-cpio.trid.xml

Samples like skeleton2.cpio are described inside Magdir/archive
by lines like:
0	short		0143561		byte-swapped cpio archive
!:mime	application/x-cpio # encoding: swapped

The standard suffix apparently is cpio. For this variant I found no
other suffix, so this is expressed by an additional line like:
!:ext	cpio
Because some samples are misidentified, I also look at the bytes after
the magic. This is done here by calling a new subroutine cpio-bin-be,
following the specification in the man page. It looks like:
 >0	use	cpio-bin-be
 0	name	cpio-bin-be
 >2	ubeshort	x		\b; device %u
 >4	ubeshort	x		\b, inode %u
 >6	ubeshort	x		\b, mode %o
 >8	ubeshort	x		\b, uid %u
 >10	ubeshort	x		\b, gid %u
 >12	ubeshort	>1		\b, %u links
 >14	ubeshort	>0		\b, device %#4.4x
 >16	bedate		x		\b, modified %s
 >22	ubelong		x		\b, %u bytes
 #>20	ubeshort	x		\b, namesize %u
 >26	string		x		"%s"
So we see that in skeleton2.cpio the first archive member is the
current directory ("." with 0 bytes and 16 links) with mode 40755,
dated "Sun Jun  7 04:19:15 1998" and owned by user root (uid 0,
gid 0). This information can partly be verified by running the 7-Zip
command line tool (see appended 7z-l-cpio.txt.gz) or GNU cpio with
commands like:
	7z l -tcpio -slt *.cpio
	LANGUAGE=C cpio -ivt --numeric-uid-gid --file=skeleton2.cpio
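
To cross-check the header offsets used by cpio-bin-be (and its
little-endian sibling cpio-bin) outside of file, a small Python
sketch may help. It decodes the 26-byte old binary header as 13
unsigned 16-bit words following the cpio(5) layout and chooses the
byte order from how the magic 070707 is stored. The function name and
dictionary keys are hypothetical, and the header in the usage part is
synthetic, not taken from skeleton2.cpio.

```python
import struct

# Old binary cpio header per cpio(5): 13 unsigned 16-bit words (26 bytes),
# followed by the NUL-terminated pathname.
FIELDS = ("magic", "dev", "ino", "mode", "uid", "gid", "nlink", "rdev",
          "mtime_hi", "mtime_lo", "namesize", "filesize_hi", "filesize_lo")


def parse_old_binary_header(data: bytes) -> dict:
    """Decode the first header of an old binary cpio archive (sketch)."""
    # Magic 070707 (0x71C7): bytes C7 71 = little-endian writer,
    # bytes 71 C7 = big-endian writer ("byte-swapped" seen from a LE host).
    if data[0:2] == b"\xc7\x71":
        order = "<"
    elif data[0:2] == b"\x71\xc7":
        order = ">"
    else:
        raise ValueError("not an old binary cpio header")
    h = dict(zip(FIELDS, struct.unpack(order + "13H", data[:26])))
    # 32-bit fields are two 16-bit words, most-significant word first.
    h["mtime"] = (h.pop("mtime_hi") << 16) | h.pop("mtime_lo")
    h["filesize"] = (h.pop("filesize_hi") << 16) | h.pop("filesize_lo")
    # The pathname follows the header; c_namesize includes the trailing NUL.
    raw_name = data[26:26 + h["namesize"]]
    h["name"] = raw_name.split(b"\0", 1)[0].decode("latin-1")
    h["byteorder"] = "little" if order == "<" else "big"
    return h


if __name__ == "__main__":
    # Synthetic big-endian header; the values are for illustration only.
    hdr = struct.pack(">13H", 0o070707, 2054, 36, 0o40755, 0, 0, 16, 0,
                      0x0001, 0x0002, 2, 0, 0) + b".\0"
    h = parse_old_binary_header(hdr)
    print(h["byteorder"], h["dev"], h["ino"], oct(h["mode"]), h["name"])
    # big 2054 36 0o40755 .
```

The same function handles both variants, which mirrors why the patch
needs two otherwise identical subroutines that differ only in
ubeshort/bedate/ubelong versus uleshort/medate/melong.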

Samples like fmt-635-signature-id-960.cpio, bootlogo, message.cpi
and ttyS0.cpio are described inside Magdir/archive by lines like:
0	short		070707		cpio archive
!:mime	application/x-cpio

The standard suffix apparently is cpio. But I also found samples like
message.cpi with the 3-letter suffix cpi. The reason is not explained,
but I assume it is caused by the DOS FAT 8.3 file name length
limitation. On Unix-like systems the file type is not, as a concept,
indicated by a file name suffix, so here I also find samples without
any suffix, like bootlogo. So for this variant I found 3 file name
extensions (cpio, cpi and none), which is expressed by an additional
line like:
!:ext	/cpio/cpi
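
The leading slash in /cpio/cpi stands for the empty (missing) suffix.
As an illustration only, here is a hypothetical Python helper (not
part of file, which merely prints this list with --extension) that
checks a file name against such an !:ext value, under the assumption
that an empty entry means "no suffix".

```python
import os


def ext_alternatives(ext_value: str) -> list:
    """Split an !:ext value at "/"; an empty entry (e.g. from a leading
    slash) is assumed to mean "no suffix"."""
    return ext_value.split("/")


def suffix_matches(filename: str, ext_value: str) -> bool:
    """Hypothetical check of a file name against an !:ext value."""
    suffix = os.path.splitext(filename)[1].lstrip(".").lower()
    return suffix in ext_alternatives(ext_value)


print(ext_alternatives("/cpio/cpi"))  # ['', 'cpio', 'cpi']
for name in ("bootlogo", "message.cpi", "ttyS0.cpio", "archive.tar"):
    print(name, suffix_matches(name, "/cpio/cpi"))
```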

The bootlogo, for example, can be found inside the boot/x86_64/loader
directory on the CD-ROM image openSUSE-Leap-15.4-NET-x86_64-Media.iso
of the SUSE distribution. Many people, including myself, complain
about the lack of transparency and the proprietary software of
Microsoft Windows systems. But Linux, on the other hand, is not the
Holy Grail either. SUSE started patching the syslinux loader with its
own implementation of a graphical boot screen (gfxboot). Nowadays you
find this module, gfxboot.c32, in the current syslinux distribution
(for example on Raspbian 11), but its options and configuration are
not well explained. The standard configuration file isolinux.cfg
contains a directive line like:
	ui gfxboot bootlogo message
This means: load module gfxboot.c32, take the boot logo instructions
from the cpio archive named "bootlogo", and take the "Welcome to
openSUSE Leap" text from the file named "message". That confuses even
Linux professionals switching from other distributions like Ubuntu,
because the workflow of the boot loader is organized a "little" bit
differently. It also confuses Windows users who want to switch to
Linux, because files like "bootlogo" have no name suffix, whereas
Windows users are used to the concept that the file type is indicated
by the name suffix.

Because some samples are misidentified, I also look at the bytes after
the magic here. This is done by calling a new subroutine cpio-bin. In
principle it is the same as for the byte-swapped variant, except that
the little-endian uleshort must be used instead of ubeshort. This
looks like:
 >0	use	cpio-bin
 0	name	cpio-bin
 >2	uleshort	x		\b; device %u
 >4	uleshort	x		\b, inode %u
 >6	uleshort	x		\b, mode %o
 >8	uleshort	x		\b, uid %u
 >10	uleshort	x		\b, gid %u
 >12	uleshort	>1		\b, %u links
 >14	uleshort	>0		\b, device %#4.4x
 >16	medate		x		\b, modified %s
 >22	melong		x		\b, %u bytes
 #>20	uleshort	x		\b, namesize %u
 >26	string		x		"%s"
One thing is not visible at first glance. Some fields, like the
modification time and the file size in bytes, are 4 bytes (32 bits)
long. According to the specification, these are stored as two 16-bit
values with the most-significant part first. So I must use medate for
the time field and melong for c_filesize. I verified this by running
the 7-Zip and cpio command line tools again. On my little-endian
machines this is correct, but I am not sure it also holds when running
on big-endian machines.
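
The middle-endian layout can be illustrated with a short Python
sketch, using the 105769-byte size of the "init" member of bootlogo
(shown in the output further down) as the example value. It reads the
two 16-bit words in little-endian order and puts the first one on
top, which is the middle-endian (PDP-11) order magic(5) describes for
melong and medate; the function name is hypothetical.

```python
import struct


def read_middle_endian_u32(buf: bytes, offset: int) -> int:
    """Read a 32-bit value stored as two little-endian 16-bit words,
    most-significant word first (magic(5): melong / medate)."""
    hi, lo = struct.unpack_from("<2H", buf, offset)
    return (hi << 16) | lo


# 105769 == 0x00019D29: a little-endian old binary cpio writer stores the
# high word 0x0001 first, then the low word 0x9D29.
raw = struct.pack("<2H", 0x0001, 0x9D29)
print(raw.hex())                       # 0100299d
print(read_middle_endian_u32(raw, 0))  # 105769
print(struct.unpack("<I", raw)[0])     # 2636709889: plain LE u32 is wrong
# A big-endian writer stores both words big-endian, high word first,
# which is simply a big-endian 32-bit value (ubelong / bedate).
print(struct.unpack(">I", struct.pack(">2H", 0x0001, 0x9D29))[0])  # 105769
```

This is also why the byte-swapped variant can use plain ubelong and
bedate: in big-endian order, "high word first" and "whole value big
endian" coincide.
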
So we see that in ttyS0.cpio the first archive member is "/dev/ttyS0".
This is not a regular file but a character device, readable and
writable by its owner and writable by its group (0 bytes; octal mode
20620, shown for example as "crw--w----" by `ls -l`). ttyS0 belongs to
user root and group tty (uid 0, gid 5, as also shown for example by
`ls --numeric-uid-gid`). The c_ino field contains the truncated inode
number, i.e. its lower 16 bits (shown for example by `ls --inode`).
I do not know exactly what the device number printed in decimal
describes. There is another device number, c_rdev, relevant only for
block and character devices, which is shown as a hexadecimal value.
For the ttyS0 example this value is 0x0440 (that is, major 4 and
minor 64, shown as "4, 64" by the `ls -l` command after the group
field).
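
The mode and c_rdev values above can be decoded with Python's
standard stat module. The sketch assumes the traditional split of a
16-bit device number into an 8-bit major and an 8-bit minor part,
which matches the "4, 64" shown by ls -l for ttyS0; the helper name
split_old_rdev is hypothetical.

```python
import stat


def split_old_rdev(rdev: int) -> tuple:
    """Split a 16-bit c_rdev into (major, minor), assuming the traditional
    8-bit major / 8-bit minor encoding (hypothetical helper)."""
    return (rdev >> 8) & 0xFF, rdev & 0xFF


mode = 0o20620                  # c_mode of /dev/ttyS0 in ttyS0.cpio
print(stat.filemode(mode))      # crw--w----
print(stat.S_ISCHR(mode))       # True: character device
print(oct(stat.S_IMODE(mode)))  # 0o620: rw for owner, w for group
print(split_old_rdev(0x0440))   # (4, 64)
print(stat.filemode(0o40755))   # drwxr-xr-x, the "." entry of skeleton2.cpio
```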

The sample fmt-635-signature-id-960.cpio is not really an archive. It
contains just the first 2 magic bytes of a cpio archive and is used by
DROID as a signature to recognize such cpio archives. So I skip this
sample with a second test, which looks for the pathname of the first
archive member. The starting lines now become:
 0	short		070707
 >26	string		>\0		cpio archive
 !:mime	application/x-cpio
 !:ext	/cpio/cpi
 >>0	use	cpio-bin
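
The effect of the added >26 test can be approximated in Python. This
hypothetical check only accepts data that starts with the magic
070707 stored as a little-endian short (what "short" matches on a
little-endian host) and whose byte at offset 26, the first byte of the
pathname, is not NUL. It is a sketch of the logic, not file's
implementation.

```python
def looks_like_binary_cpio(data: bytes) -> bool:
    r"""Approximate the patched test on a little-endian host (sketch):
    0      short   070707   -> bytes C7 71 at offset 0
    >26    string  >\0      -> first pathname byte is not NUL
    """
    if len(data) < 27:           # nothing at offset 26 -> test fails
        return False
    if data[:2] != b"\xc7\x71":  # 070707 == 0x71C7, little-endian
        return False
    return data[26] != 0


# Only the magic, no header: rejected.
print(looks_like_binary_cpio(b"\xc7\x71"))                          # False
# Header followed by a pathname: accepted.
print(looks_like_binary_cpio(b"\xc7\x71" + bytes(24) + b"init\0"))  # True
```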

After applying the above modifications with the patch
file-5.44-archive-cpio.diff, my samples are described as before, but
now with more details and the correct name suffix. Furthermore, the
DROID signature sample is no longer misidentified. This now looks like:

MainActor-2.06.3.cpio:         ASCII cpio archive (SVR4 with no CRC)
VOL.000.008:                   ASCII cpio archive (pre-SVR4 or odc)
VOL.000.012:                   ASCII cpio archive (pre-SVR4 or odc)
bootlogo:                      cpio archive
			       ; device 0, inode 0, mode 100644
			       , uid 0, gid 0
			       , modified Wed Jan 26 15:10:41 2022
			       , 105769 bytes "init"
cinema.cpi:                    ASCII cpio archive (pre-SVR4 or odc)
clam.bin-be.cpio:              byte-swapped cpio archive
			       ; device 3, inode 57883, mode 100644
			       , uid 501, gid 20
			       , modified Mon Jul  6 12:31:12 2009
			       , 544 bytes "clam.exe"
fmt-635-signature-id-960.cpio: ISO-8859 text
message.cpi:                   cpio archive
			       ; device 774, inode 8892, mode 100644
			       , uid 99, gid 99
			       , modified Fri May 19 09:14:34 2006
			       , 37860 bytes "16x16.fnt"
pcmcia:                        ASCII cpio archive (SVR4 with CRC)
pthreads-1.60B5.osr5src.cpio:  ASCII cpio archive (pre-SVR4 or odc)
skeleton2.cpio:                byte-swapped cpio archive
			       ; device 2054, inode 36, mode 40755
			       , uid 0, gid 0, 16 links
			       , modified Sun Jun  7 04:19:15 1998
			       , 0 bytes "."
tar-1.27.cpio:                 ASCII cpio archive (SVR4 with CRC)
ttyS0.cpio:                    cpio archive
			       ; device 5, inode 133, mode 20620
			       , uid 0, gid 5, device 0x0440
			       , modified Sat Feb 18 01:21:45 2023
			       , 0 bytes "/dev/ttyS0"

When running with the --extension option now, I get output like:

MainActor-2.06.3.cpio:         cpio
VOL.000.008:                   cpio/cpi/008/012
VOL.000.012:                   cpio/cpi/008/012
bootlogo:                      /cpio/cpi
cinema.cpi:                    cpio/cpi/008/012
clam.bin-be.cpio:              cpio
fmt-635-signature-id-960.cpio: ???
message.cpi:                   /cpio/cpi
pcmcia:                        /cpio
pthreads-1.60B5.osr5src.cpio:  cpio/cpi/008/012
skeleton2.cpio:                cpio
tar-1.27.cpio:                 /cpio
ttyS0.cpio:                    /cpio/cpi

I hope my diff file can be applied in a future version of the file
utility.

With best wishes
Jörg Jenderek
--
Jörg Jenderek
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-cpio.csv.gz
Type: application/x-gzip
Size: 705 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230228/80bb1228/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-cpio.txt.gz
Type: application/x-gzip
Size: 548 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230228/80bb1228/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 7z-l-cpio.txt.gz
Type: application/x-gzip
Size: 35936 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230228/80bb1228/attachment-0005.bin>
-------------- next part --------------
--- file-5.44/magic/Magdir/archive.old	2022-12-26 19:00:47.000000000 +0100
+++ file-5.44/magic/Magdir/archive	2023-02-28 02:57:55.437564600 +0100
@@ -187,12 +187,84 @@
 # character-header formats and thus are strings, not numbers.
-0	short		070707		cpio archive
+# URL:		http://fileformats.archiveteam.org/wiki/Cpio
+#		https://en.wikipedia.org/wiki/Cpio
+# Reference:	https://people.freebsd.org/~kientzle/libarchive/man/cpio.5.txt
+# Update:	Joerg Jenderek
+#
+# Reference:    http://mark0.net/download/triddefs_xml.7z/defs/a/ark-cpio-bin.trid.xml
+# Note:		called "CPIO archive (binary)" by TrID, "cpio/Binary LE" by 7-Zip and "CPIO" by DROID via PUID fmt/635
+0	short		070707
+# skip DROID fmt-635-signature-id-960.cpio by looking for pathname of 1st entry
+>26	string		>\0		cpio archive
 !:mime	application/x-cpio
+# https://download.opensuse.org/distribution/leap/15.4/iso/openSUSE-Leap-15.4-NET-x86_64-Media.iso
+# boot/x86_64/loader/bootlogo
+# message.cpi
+!:ext	/cpio/cpi
+>>0	use	cpio-bin
+# Reference:    http://mark0.net/download/triddefs_xml.7z/defs/a/ark-cpio-bin-sw.trid.xml
+# Note:		called "CPIO archive (byte swapped binary)" by TrID and "Cpio/Binary BE" by 7-Zip
 0	short		0143561		byte-swapped cpio archive
 !:mime	application/x-cpio # encoding: swapped
+# https://telparia.com/fileFormatSamples/archive/cpio/skeleton2.cpio
+!:ext	cpio
+>0	use	cpio-bin-be
+# Reference:    http://mark0.net/download/triddefs_xml.7z/defs/a/ark-cpio.trid.xml
+# Note:		called "CPIO archive (portable)" by TrID, "cpio/Portable ASCII" by 7-Zip and "cpio/odc" by GNU cpio
 0	string		070707		ASCII cpio archive (pre-SVR4 or odc)
 !:mime	application/x-cpio
+# https://telparia.com/fileFormatSamples/archive/cpio/ pthreads-1.60B5.osr5src.cpio cinema.cpi VOL.000.008 VOL.000.012
+!:ext	cpio/cpi/008/012
+# Note:		called "CPIO archive (portable)" by TrID, "cpio/New ASCII" by 7-Zip and "cpio/newc" by GNU cpio
 0	string		070701		ASCII cpio archive (SVR4 with no CRC)
 !:mime	application/x-cpio
+# https://telparia.com/fileFormatSamples/archive/cpio/MainActor-2.06.3.cpio
+!:ext	cpio
+# Note:		called "CPIO archive (portable)" by TrID, "cpio/New CRC" by 7-Zip and "cpio/crc" by GNU cpio
 0	string		070702		ASCII cpio archive (SVR4 with CRC)
 !:mime	application/x-cpio
+# http://ftp.gnu.org/gnu/tar/tar-1.27.cpio.gz
+# https://telparia.com/fileFormatSamples/archive/cpio/pcmcia
+!:ext	/cpio
+#	display information of old binary cpio archive
+# Note:	verified by 7-Zip `7z l -tcpio -slt *.cpio` and
+#	`cpio -ivt --numeric-uid-gid --file=clam.bin-le.cpio`
+0	name	cpio-bin
+# c_dev; device number; WHAT IS THAT?
+>2	uleshort	x		\b; device %u
+# c_ino; truncated inode number; use `ls --inode`
+>4	uleshort	x		\b, inode %u
+# c_mode; mode specifies permissions and file type like: ?622~?rw-r--r-- by `ls -l`
+>6	uleshort	x		\b, mode %o
+# c_uid; numeric user id; use `ls --numeric-uid-gid`
+>8	uleshort	x		\b, uid %u
+# c_gid; numeric group id
+>10	uleshort	x		\b, gid %u
+# c_nlink; links to this file; directories at least 2
+>12	uleshort	>1		\b, %u links
+# c_rdev; device number for block and character entries; zero for all other entries by writers
+# like 0x0440 for /dev/ttyS0
+>14	uleshort	>0		\b, device %#4.4x
+# c_mtime[2]; modification time in seconds since 1 January 1970; most-significant 16 bits first
+>16	medate		x		\b, modified %s
+# c_filesize[2]; size of file in bytes; most-significant 16 bits first like: 544
+>22	melong		x		\b, %u bytes
+# c_namesize; bytes in the pathname that follows the header like: 9
+#>20	uleshort	x		\b, namesize %u
+# pathname of entry like: "clam.exe"
+>26	string		x		"%s"
+#	display information of old binary byte swapped cpio archive
+# Note:	verified by 7-Zip `7z l -tcpio -slt *.cpio` and
+#	`LANGUAGE=C cpio -ivt --numeric-uid-gid --file=clam.bin-be.cpio`
+0	name	cpio-bin-be
+>2	ubeshort	x		\b; device %u
+>4	ubeshort	x		\b, inode %u
+>6	ubeshort	x		\b, mode %o
+>8	ubeshort	x		\b, uid %u
+>10	ubeshort	x		\b, gid %u
+>12	ubeshort	>1		\b, %u links
+>14	ubeshort	>0		\b, device %#4.4x
+>16	bedate		x		\b, modified %s
+>22	ubelong	 	x		\b, %u bytes
+#>20	ubeshort	x		\b, namesize %u
+>26	string		x		"%s"
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.44-archive-cpio.diff.sig
Type: application/octet-stream
Size: 1750 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230228/80bb1228/attachment-0001.obj>

