[File] [PATCH] of Magdir/archive for cpio archives; extensions + misidentified DROID signature

Christos Zoulas christos at zoulas.com
Thu Mar 30 09:56:04 UTC 2023


Committed, thanks!

christos

> On Feb 28, 2023, at 6:35 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hello,
> 
> A few days ago I looked at the installation medium of my latest Linux
> installation, a SUSE Leap. In its boot/x86_64/loader directory there
> is a file named bootlogo.
> 
> When I run the file command version 5.44 on this and other cpio
> archive samples, I get output like:
> 
> MainActor-2.06.3.cpio:         ASCII cpio archive (SVR4 with no CRC)
> VOL.000.008:                   ASCII cpio archive (pre-SVR4 or odc)
> VOL.000.012:                   ASCII cpio archive (pre-SVR4 or odc)
> bootlogo:                      cpio archive
> cinema.cpi:                    ASCII cpio archive (pre-SVR4 or odc)
> clam.bin-be.cpio:              byte-swapped cpio archive
> fmt-635-signature-id-960.cpio: cpio archive
> message.cpi:                   cpio archive
> pcmcia:                        ASCII cpio archive (SVR4 with CRC)
> pthreads-1.60B5.osr5src.cpio:  ASCII cpio archive (pre-SVR4 or odc)
> skeleton2.cpio:                byte-swapped cpio archive
> tar-1.27.cpio:                 ASCII cpio archive (SVR4 with CRC)
> ttyS0.cpio:                    cpio archive
> 
> With the --extension option only ??? is displayed. With the -i option
> the expected application/x-cpio is shown for the samples.
> 
> For comparison I ran the file format identification utility TrID
> (see https://mark0.net/soft-trid-e.html). It lists the file name
> extensions in use and, with the -v option, often also a URL pointing
> to information about the file format. The samples that the file
> command describes with the additional "ASCII" phrase are described
> by TrID as "CPIO archive (portable)" via ark-cpio.trid.xml. Samples
> like skeleton2.cpio, found on telparia.com, which the file command
> describes as "byte-swapped cpio archive", are described as "CPIO
> archive (byte swapped binary)" via ark-cpio-bin-sw.trid.xml.
> Samples like bootlogo, which the file command describes as "cpio
> archive", are described as "CPIO archive (binary)" via
> ark-cpio-bin.trid.xml. TrID displays only CPIO as file name suffix
> and only the generic mime type application/octet-stream for all
> samples (see appended trid-v-cpio.txt.gz).
> 
> For comparison I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). It recognizes
> only the one binary variant, as in the samples bootlogo and
> fmt-635-signature-id-960.cpio, which the file command describes as
> "cpio archive". DROID describes these as "CPIO" with PUID fmt/635
> and shows no mime type. It also considers only the suffix CPIO as
> valid (EXTENSION_MISMATCH is false). So the samples bootlogo and
> message.cpi are marked as "bad" (see appended droid-cpio.csv.gz).
> 
> With the additional help of the TrID command I was able to find a
> page on the File Formats Archive Team web site. It also lists links
> to the Wikipedia page about cpio and to downloadable samples. The
> specification can be found, for example, in the cpio(5) format man
> page mentioned there. This information is expressed by additional
> comment lines inside Magdir/archive. These look like:
> # URL:		http://fileformats.archiveteam.org/wiki/Cpio
> #		https://en.wikipedia.org/wiki/Cpio
> # Reference:	https://people.freebsd.org/~kientzle/libarchive/
> #		man/cpio.5.txt
> #		http://mark0.net/download/triddefs_xml.7z
> #		defs/a/ark-cpio-bin.trid.xml
> #		defs/a/ark-cpio-bin-sw.trid.xml
> #		defs/a/ark-cpio.trid.xml
> 
> Samples like skeleton2.cpio are described inside Magdir/archive
> by lines like:
> 0	short		0143561		byte-swapped cpio archive
> !:mime	application/x-cpio # encoding: swapped
> 
> The standard suffix apparently is cpio. For this variant I found no
> other suffix. So this is expressed by an additional line like:
> !:ext	cpio
> Because some samples are misidentified, I look at the bytes after the
> magic. This is done by calling a new subroutine cpio-bin-be that
> follows the specification in the man page. It looks like:
> >0	use	cpio-bin-be
> 0	name	cpio-bin-be
> >2	ubeshort	x		\b, device %u
> >4	ubeshort	x		\b, inode %u
> >6	ubeshort	x		\b, mode %o
> >8	ubeshort	x		\b, uid %u
> >10	ubeshort	x		\b, gid %u
> >12	ubeshort	>1		\b, %u links
> >14	ubeshort	>0		\b, device %#4.4x
> >16	bedate		x		\b, modified %s
> >22	ubelong		x		\b, %u bytes
> >20	ubeshort	x		\b, namesize %u
> >26	string		x		"%s"
> So we see that in skeleton2.cpio the first archive member is the
> current directory ("." with size 0 bytes and 16 links) with mode
> 40755, dated "Sun Jun  7 04:19:15 1998" and belonging to user root
> (uid 0, gid 0). This information can partly be verified by running
> the 7-zip command line tool (see appended 7z-l-cpio.txt.gz) or GNU
> cpio with lines like:
> 	7z l -tcpio -slt *.cpio
> 	LANGUAGE=C cpio -ivt --numeric-uid-gid --file=skeleton2.cpio
> 
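
As a cross-check of the offsets used in cpio-bin-be, here is a minimal Python sketch that decodes the same 26-byte header as thirteen big-endian 16-bit words. It is not part of the patch. The function name parse_swapped_header and the sample bytes are hypothetical; the sample is built by hand from the skeleton2.cpio values quoted above, with an arbitrary placeholder for the modification time.

```python
import struct

# Old binary cpio header: 13 unsigned 16-bit words = 26 bytes (see cpio(5)).
# ">" means big-endian, which is the "byte-swapped" case on a little-endian host.
HEADER = struct.Struct(">13H")
MAGIC = 0o070707


def parse_swapped_header(data):
    """Decode the first member header of a big-endian binary cpio archive."""
    if len(data) < HEADER.size:
        raise ValueError("need at least 26 bytes")
    (magic, dev, ino, mode, uid, gid, nlink, rdev,
     mtime_hi, mtime_lo, namesize, size_hi, size_lo) = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a big-endian binary cpio header")
    # namesize counts the terminating NUL byte of the pathname.
    name = data[HEADER.size:HEADER.size + namesize].rstrip(b"\0")
    return {
        "dev": dev, "ino": ino, "mode": mode, "uid": uid, "gid": gid,
        "nlink": nlink, "rdev": rdev,
        # 32-bit fields: two 16-bit words, most significant word first.
        "mtime": (mtime_hi << 16) | mtime_lo,
        "size": (size_hi << 16) | size_lo,
        "name": name.decode("latin-1"),
    }


# Hand-built sample (assumption, not the real file); mtime is a placeholder.
sample = HEADER.pack(MAGIC, 2054, 36, 0o40755, 0, 0, 16, 0,
                     0x357A, 0x1234, 2, 0, 0) + b".\0"
fields = parse_swapped_header(sample)
print(fields["name"], oct(fields["mode"]), fields["nlink"], fields["size"])
```

On a real archive the first 26 bytes plus namesize bytes of the file would be passed in instead of the hand-built sample.
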
> Samples like fmt-635-signature-id-960.cpio, bootlogo, message.cpi
> and ttyS0.cpio are described inside Magdir/archive by lines like:
> 0	short		070707		cpio archive
> !:mime	application/x-cpio
> 
> The standard suffix apparently is cpio. But I also found a sample,
> message.cpi, with the 3 byte suffix cpi. It is not explained why,
> but I assume this comes from the DOS FAT limitation to 8+3 file
> names. On Unix-like systems there is no concept of the file type
> being indicated by the file name suffix, so there I find samples
> without a file name suffix, like bootlogo. So for this variant I
> found 3 file name extensions (none, cpio and cpi). This is expressed
> by an additional line like:
> !:ext	/cpio/cpi
> 
> The bootlogo file, for example, can be found inside the
> boot/x86_64/loader directory on the CD-ROM image
> openSUSE-Leap-15.4-NET-x86_64-Media.iso of the SUSE distribution.
> Many people, including myself, complain about the lack of
> transparency and the proprietary software of Microsoft Windows
> systems. But Linux, on the other hand, is not the Holy Grail either.
> SUSE started patching the syslinux loader with their own
> implementation of a graphical boot screen (gfxboot). Nowadays you
> find this module gfxboot.c32 in the current syslinux distribution
> (for example on Raspbian 11), but its options and configuration are
> not well explained. The standard configuration file isolinux.cfg
> contains a directive line like:
> 	ui gfxboot bootlogo message
> That means: load module gfxboot.c32, take the boot logo instructions
> from the cpio archive named "bootlogo", and take the "Welcome to
> openSUSE Leap" text from the file named "message". That confuses
> even Linux professionals switching from other distributions like
> Ubuntu, because the workflow of the boot loader program is organized
> a "little" bit differently. It also confuses Windows users willing
> to change to Linux, because files like "bootlogo" have no name
> suffix, whereas Windows people are used to the concept that the file
> type is indicated by the name suffix.
> 
> Because some samples are misidentified, I look at the bytes after the
> magic. This is done by calling a new subroutine cpio-bin. In
> principle it is the same as for the byte-swapped variant, but
> instead of ubeshort the little endian uleshort must be used. It
> looks like:
> >0	use	cpio-bin
> 0	name	cpio-bin
> >2	uleshort	x		\b; device %u
> >4	uleshort	x		\b, inode %u
> >6	uleshort	x		\b, mode %o
> >8	uleshort	x		\b, uid %u
> >10	uleshort	x		\b, gid %u
> >12	uleshort	>1		\b, %u links
> >14	uleshort	>0		\b, device %#4.4x
> >16	medate		x		\b, modified %s
> >22	melong		x		\b, %u bytes
> #>20	uleshort	x		\b, namesize %u
> >26	string		x		"%s"
> Something here is not visible at first glance. Some fields, like the
> modification time and the file size in bytes, have a length of 4
> bytes, in other words 32 bits. According to the specification these
> are stored as two 16-bit values, with the most significant part
> first. So I must use medate for the time field and melong for
> c_filesize. I verified this by running the 7-zip and cpio command
> line tools again. On my little endian machines this is correct, but
> I am not sure that it holds when running on big endian machines.
> So we see that in ttyS0.cpio the first archive member is
> "/dev/ttyS0". This is not a regular file. It is a character device
> with read+write permission for the user and write permission for the
> group (0 bytes; octal mode 20620; shown for example as "crw--w----"
> by `ls -l`). ttyS0 belongs to user root and group tty (uid 0, gid 5,
> as also shown for example by `ls --numeric-uid-gid`). The c_ino
> field contains the truncated inode number, that is its lower 16 bits
> (the full inode number is shown for example by `ls --inode`).
> I do not know what exactly is described by the device number
> displayed in decimal. There is another device number, c_rdev,
> relevant only for block and character devices, which is shown as a
> hexadecimal value. For the ttyS0 example this value is 0x0440 (that
> is decimal 4, 64, as shown by the `ls -l` command after the group
> field).
> 
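
The 32-bit layout and the ttyS0 values described above can be checked with a short Python sketch. It is only an illustration, not part of the patch. The helper names pdp_u32 and split_rdev are hypothetical, and the split of c_rdev into 8 bits major and 8 bits minor is an assumption that happens to match the "4, 64" quoted above. The four sample bytes encode the bootlogo member size 105769 reported later in this mail.

```python
import stat
import struct


def pdp_u32(buf, offset=0):
    """Read a 32-bit value stored as two little-endian 16-bit halves,
    most significant half first (what melong/medate decode in magic)."""
    hi, lo = struct.unpack_from("<2H", buf, offset)
    return (hi << 16) | lo


def split_rdev(rdev):
    """Split a 16-bit device number into (major, minor).
    Assumption: old-style layout, high byte major, low byte minor."""
    return rdev >> 8, rdev & 0xFF


# 105769 = 0x00019D29: high half 0x0001, low half 0x9D29, each little-endian.
size_field = bytes([0x01, 0x00, 0x29, 0x9D])

print(pdp_u32(size_field))                  # mixed-endian reading
print(struct.unpack("<I", size_field)[0])   # plain little-endian reading differs
print(stat.filemode(0o20620))               # symbolic form of octal mode 20620
print(split_rdev(0x0440))                   # major and minor of c_rdev
```

Reading the same four bytes as a plain little-endian long gives a different, much larger number, which is why uleshort-style handling alone is not enough for the time and size fields.
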
> The sample fmt-635-signature-id-960.cpio is not really an archive. It
> contains just the first 2 magic bytes of a cpio archive and is used
> by DROID as a signature to recognize such cpio archives. So I skip
> this sample with a second test which looks for the pathname of the
> first archive member. The starting lines now become:
> 0	short		070707
> >26	string		>\0		cpio archive
> !:mime	application/x-cpio
> !:ext	/cpio/cpi
> >>0	use	cpio-bin
> 
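
The effect of the added second test can be imitated in a few lines of Python. This is a rough sketch of the idea, not of how file evaluates magic entries internally: the hypothetical function looks_like_binary_cpio accepts data only if the little-endian magic matches and a non-NUL byte is present at offset 26, where the first pathname starts. Both sample byte strings are built by hand for illustration.

```python
import struct

MAGIC = 0o070707
NAME_OFFSET = 26  # the pathname follows the 26-byte binary header


def looks_like_binary_cpio(data):
    """Return True if data has the little-endian cpio magic AND a
    non-empty pathname at offset 26 (approximation of the two tests)."""
    if len(data) < 2 or struct.unpack_from("<H", data, 0)[0] != MAGIC:
        return False
    return len(data) > NAME_OFFSET and data[NAME_OFFSET] != 0


# Only the two magic bytes, similar to the DROID signature sample.
signature_only = struct.pack("<H", MAGIC)

# Hand-built header using the ttyS0.cpio values quoted above (time left at 0).
real_member = struct.pack("<13H", MAGIC, 5, 133, 0o20620, 0, 5, 1, 0x0440,
                          0, 0, 11, 0, 0) + b"/dev/ttyS0\0"

print(looks_like_binary_cpio(signature_only))
print(looks_like_binary_cpio(real_member))
```
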
> After applying the above modifications with the patch
> file-5.44-archive-cpio.diff, my samples are described as before, but
> now with more details and the correct name suffix. Furthermore, the
> DROID signature sample is no longer misidentified. This now looks
> like:
> 
> MainActor-2.06.3.cpio:         ASCII cpio archive (SVR4 with no CRC)
> VOL.000.008:                   ASCII cpio archive (pre-SVR4 or odc)
> VOL.000.012:                   ASCII cpio archive (pre-SVR4 or odc)
> bootlogo:                      cpio archive
> 			       ; device 0, inode 0, mode 100644
> 			       , uid 0, gid 0
> 			       , modified Wed Jan 26 15:10:41 2022
> 			       , 105769 bytes "init"
> cinema.cpi:                    ASCII cpio archive (pre-SVR4 or odc)
> clam.bin-be.cpio:              byte-swapped cpio archive
> 			       ; device 3, inode 57883, mode 100644
> 			       , uid 501, gid 20
> 			       , modified Mon Jul  6 12:31:12 2009
> 			       , 544 bytes "clam.exe"
> fmt-635-signature-id-960.cpio: ISO-8859 text
> message.cpi:                   cpio archive
> 			       ; device 774, inode 8892, mode 100644
> 			       , uid 99, gid 99
> 			       , modified Fri May 19 09:14:34 2006
> 			       , 37860 bytes "16x16.fnt"
> pcmcia:                        ASCII cpio archive (SVR4 with CRC)
> pthreads-1.60B5.osr5src.cpio:  ASCII cpio archive (pre-SVR4 or odc)
> skeleton2.cpio:                byte-swapped cpio archive
> 			       ; device 2054, inode 36, mode 40755
> 			       , uid 0, gid 0, 16 links
> 			       , modified Sun Jun  7 04:19:15 1998
> 			       , 0 bytes "."
> tar-1.27.cpio:                 ASCII cpio archive (SVR4 with CRC)
> ttyS0.cpio:                    cpio archive
> 			       ; device 5, inode 133, mode 20620
> 			       , uid 0, gid 5, device 0x0440
> 			       , modified Sat Feb 18 01:21:45 2023
> 			       , 0 bytes "/dev/ttyS0"
> 
> When I now run with the --extension option, I get output like:
> 
> MainActor-2.06.3.cpio:         cpio
> VOL.000.008:                   cpio/cpi/008/012
> VOL.000.012:                   cpio/cpi/008/012
> bootlogo:                      /cpio/cpi
> cinema.cpi:                    cpio/cpi/008/012
> clam.bin-be.cpio:              cpio
> fmt-635-signature-id-960.cpio: ???
> message.cpi:                   /cpio/cpi
> pcmcia:                        /cpio
> pthreads-1.60B5.osr5src.cpio:  cpio/cpi/008/012
> skeleton2.cpio:                cpio
> tar-1.27.cpio:                 /cpio
> ttyS0.cpio:                    /cpio/cpi
> 
> I hope my diff files can be applied in a future version of the
> file utility.
> 
> With best wishes
> Jörg Jenderek
> - --
> Jörg Jenderek
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
> 
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY/6BjAAKCRCv8rHJQhrU
> 1og4AJ4gz/+NlYyZrPrvGviezz+F4bJQtACeNacdC+46/6Tcd0ZQcYHQKHFATJU=
> =YpRr
> -----END PGP SIGNATURE-----
> <droid-cpio.csv.gz><trid-v-cpio.txt.gz><7z-l-cpio.txt.gz><file-5_44-archive-cpio_diff.DEFANGED-13><file-5_44-archive-cpio_diff_sig.DEFANGED-14>--
