[File] [PATCH] of Magdir/archive *.OVA described as POSIX tar archive

Jörg Jenderek joerg.jen.der.ek at gmx.net
Mon May 1 20:01:16 UTC 2023


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,
some weeks ago I had to migrate to Windows 10. During that process I
lost some VirtualBox machines, so I looked for file formats related to
VirtualBox. One format uses the filename extension ova.

When running the file command version 5.44 on such samples, I get
output like:
DOS-0.9.ova:    POSIX tar archive
FreeDOS_1.ova:  POSIX tar archive
Win98SE_DE.ova: POSIX tar archive

When excluding the internal tar checks with the "-e tar" option, I get
the same description with additional information:

DOS-0.9.ova:    POSIX tar archive, file DOS-0.9.ovf
		, mode 0100640, uid 0000007, gid 0000000
		, size 00000020207, seconds 14423047516
		, user vboxovf09, group vbox_v7.0.6r155176
FreeDOS_1.ova:  POSIX tar archive, file FreeDOS_1.ovf
		, mode 0100640, uid 0000007, gid 0000000
		, size 00000023702, seconds 14423046655
		, user vboxovf10, group vbox_v7.0.6r155176
Win98SE_DE.ova: POSIX tar archive, file Win98SE_DE.ovf
		, mode 0100640, uid 0000007, gid 0000000
		, size 00000025700, seconds 14422473537
		, user vboxovf09, group vbox_v7.0.6r155176
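These extra fields are raw ustar header values printed in octal: for example, size 00000020207 is 8327 bytes, and seconds is the member's modification time (a Unix timestamp) in octal. As a minimal sketch, assuming the standard POSIX ustar header layout (the offsets below) and using Python's tarfile module only to build a sample header, the fields can be decoded like this. The names FIELDS and decode_header are hypothetical:

```python
import tarfile

# Offset and length of selected fields in a POSIX ustar header (512-byte block).
FIELDS = {
    "name": (0, 100), "mode": (100, 8), "uid": (108, 8), "gid": (116, 8),
    "size": (124, 12), "mtime": (136, 12), "uname": (265, 32), "gname": (297, 32),
}
TEXT_FIELDS = {"name", "uname", "gname"}

def decode_header(header: bytes) -> dict:
    """Decode selected ustar header fields; numeric fields are octal ASCII."""
    if len(header) < 512:
        raise ValueError("a tar header block is 512 bytes")
    out = {}
    for key, (off, length) in FIELDS.items():
        raw = header[off:off + length].split(b"\0", 1)[0]  # cut at first NUL
        if key in TEXT_FIELDS:
            out[key] = raw.decode("ascii", "replace")
        else:
            raw = raw.strip()  # octal digits may be padded with spaces
            out[key] = int(raw, 8) if raw else 0
    return out

# Build a sample header resembling the DOS-0.9.ova output above.
info = tarfile.TarInfo("DOS-0.9.ovf")
info.mode, info.uid, info.gid = 0o640, 7, 0
info.size, info.mtime = 0o20207, 0o14423047516
info.uname, info.gname = "vboxovf09", "vbox_v7.0.6r155176"
sample = info.tobuf(format=tarfile.USTAR_FORMAT)

fields = decode_header(sample)
print(fields["size"])   # 8327
print(fields["mtime"])  # modification time as a decimal Unix timestamp
```

To inspect a real sample, pass the first 512 bytes of the .ova file to decode_header.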

With the --extension option only tar/ustar is displayed. Furthermore,
with the -i option only the generic MIME type application/x-ustar or
application/x-tar is shown for the samples.

For comparison I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/). The program often
froze here. The samples are described as "Tape Archive Format" with
MIME type application/x-tar by PUID x-fmt/265, and the OVA suffix is
flagged as a bad extension.

For comparison I also ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). Some of my
virtualization packages are correctly described as "Open
Virtualization Format package" with the generic MIME type
application/octet-stream by ova.trid.xml. All of my OVA samples are
also described, with low priority, as "TAR - Tape ARchive (POSIX)"
with MIME type application/x-ustar by ark-tar-posix.trid.xml (see the
appended output trid-v-ova.txt.gz).

TrID lists the file name extension used and, with the -v option,
often a related URL pointing to information about the file format.
This is now expressed inside Magdir/archive by additional comment
lines like:
# URL:		https://en.wikipedia.org/wiki/
#		Open_Virtualization_Format
#		http://fileformats.archiveteam.org/wiki/
#		OVF_(Open_Virtualization_Format)
# Reference:	http://mark0.net/download/triddefs_xml.7z
#		defs/o/ova.trid.xml

According to the documentation, OVA files are just tar archives
containing an OVF descriptor (a file with extension .ovf). That can be
verified by listing the archive contents (see the appended
7z-l.txt.gz) with a command like:
	7z l -ttar *.ova
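If 7z is not at hand, the same listing can be scripted. Below is a minimal sketch using Python's standard tarfile module, assuming the .ova is an uncompressed tar as the documentation says. It lists the members in archive order and tests whether the first one ends in .ovf (compared case-insensitively, which is an assumption). build_demo_ova and the member names used with it are hypothetical, only there to test without real samples:

```python
import io
import sys
import tarfile

def ova_members(fileobj) -> list[str]:
    """Member names in archive order, read from an uncompressed tar stream."""
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        return tar.getnames()

def first_member_is_ovf(fileobj) -> bool:
    """True if the first member name ends in .ovf (case-insensitive, an assumption)."""
    names = ova_members(fileobj)
    return bool(names) and names[0].lower().endswith(".ovf")

def build_demo_ova(members: dict[str, bytes]) -> io.BytesIO:
    """Hypothetical helper: pack members, in the given order, into an in-memory ustar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)  # rewind so the archive can be read back
    return buf

if __name__ == "__main__":
    # Pass .ova paths on the command line.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, "OVF first" if first_member_is_ovf(f) else "OVF not first")
```

Running this over a set of samples gives a quick check of the "ovf comes first" assumption that the magic lines rely on.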

Unfortunately the specification is not very precise. It does not
state whether the ovf file is always the first archive member. I also
found no statement on whether the tar archive is always the POSIX
variant. Furthermore, there is no official registration, for example
at iana.org. So there is no guarantee that the TrID definition and the
magic(5) lines work 100% of the time.
Computer companies and organizations should do such basic work first
instead of wasting time and resources in the AI field. Windows still
relies on the poor approach of identifying file types by file name
suffix. That is bad!
Because there is no obligation to register file types, collisions
happen; for example, the db suffix is used by dozens of different
database systems. Classic Mac OS had such a system (type and creator
codes), but Apple no longer uses it, and no better system has been
introduced on these platforms. How bad this is can be seen from the
early AIDS computer virus: it also changed file name suffixes, so the
files could no longer be opened on DOS/Windows and those systems
became unusable. Politicians, such as those of the European Union,
also fail to set priorities: they forbid or restrict the use of light
bulbs and vacuum cleaners, yet computer companies can put data on, and
fetch data from, every PC in the world without exact specifications
and rules. That is ridiculous.
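Whether a given OVA uses the POSIX variant can at least be checked per sample by looking at the magic field at offset 257 of the first tar header: POSIX ustar stores "ustar" and a NUL followed by version "00", while the GNU tar format writes "ustar", two spaces and a NUL. A minimal Python sketch; the function name and the returned labels are hypothetical:

```python
POSIX_MAGIC = b"ustar\x0000"  # "ustar" + NUL + version "00" (offsets 257..264)
GNU_MAGIC = b"ustar  \x00"    # "ustar" + two spaces + NUL (GNU format)

def tar_variant(header: bytes) -> str:
    """Classify the first 512-byte tar header by its magic/version field."""
    magic = header[257:265]
    if magic == POSIX_MAGIC:
        return "POSIX ustar"
    if magic == GNU_MAGIC:
        return "GNU tar"
    if magic[:5] == b"ustar":
        return "ustar (other version)"
    # No ustar magic: an old v7-style header, or not a tar archive at all.
    return "pre-POSIX (v7) or not tar"
```

Feed it the first 512 bytes of each sample, e.g. `tar_variant(open(path, "rb").read(512))`.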

Assuming that the ovf file always comes first in the TAR archive, I
can change the magic lines inside Magdir/archive. There, after some
test lines, the display part is done by the subroutine tar-cbt for
Comic Book archives packed as tar, or by tar-file for other cases. So
I only need to insert test lines for OVA samples that check whether
the name[100] field of the first archive member is a file name with
ovf suffix. This part now becomes:
 >>>>>>>>0	regex \^[0-9]{2,4}[.](png|jpg|jpeg|tif|tiff|gif|bmp)
 >>>>>>>>>0	use	tar-cbt
 >>>>>>>>0	regex	\^.{1,96}[.](ovf)
 >>>>>>>>>0	use	tar-ova
 >>>>>>>>0	default		x
 >>>>>>>>>0	use	tar-file
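The decision made by these lines can be emulated in Python to try out member names before touching the magic file. This is a sketch under stated assumptions: only the name[100] field up to the first NUL is searched, `\^` in the magic file is taken to mean the ordinary `^` anchor, and the earlier tar sanity tests are reduced to the checksum test from the diff (first checksum byte is a space or ASCII 0). Each `>>>>>>>>0` test is evaluated independently, and `default` fires only when no sibling test matched. subroutines_used and fake_header are hypothetical names:

```python
import re

# Patterns from the patched Magdir/archive; `\^` is assumed to mean `^`.
CBT_RE = re.compile(r"^[0-9]{2,4}[.](png|jpg|jpeg|tif|tiff|gif|bmp)")
OVA_RE = re.compile(r"^.{1,96}[.](ovf)")

def subroutines_used(header: bytes) -> list[str]:
    """Emulate which `use` subroutines the `>>>>>>>>0` tests select (sketch)."""
    if len(header) < 512 or (header[148] & 0xEF) != 0x20:
        return []  # checksum does not start with space or ASCII '0'
    # Assumption: only name[100] up to the first NUL is searched.
    name = header[:100].split(b"\0", 1)[0].decode("latin-1")
    used = []
    if CBT_RE.search(name):
        used.append("tar-cbt")
    if OVA_RE.search(name):
        used.append("tar-ova")
    if not used:  # `default x` fires only if no sibling test matched
        used.append("tar-file")
    return used

def fake_header(name: bytes) -> bytes:
    """Hypothetical test helper: name[100], zero padding, checksum starting with '0'."""
    return name.ljust(148, b"\0") + b"0012345\0" + bytes(356)

for n in (b"DOS-0.9.ovf", b"0001.png", b"readme.txt"):
    print(n.decode(), subroutines_used(fake_header(n)))
```

Because the OVA expression is not anchored at the end, a name such as x.ovf.txt also selects tar-ova in this emulation.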

The new display subroutine is named tar-ova. The OVA samples are just
tar files, so the generic MIME type application/x-tar is in principle
OK. On my Windows system, however, OVA samples are associated with the
user-defined type application/x-virtualbox-ova. That information can
also be found on extension.nirsoft.net. So this subroutine looks like:
 0	name		tar-ova
 >0	string		x	Open Virtualization Format Archive
 !:mime	application/x-virtualbox-ova
 !:ext	ova
 >0	string		>\0	\b, with %-.60s
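What tar-ova prints can be sketched in Python as well. The sketch assumes that `>0 string >\0` simply means "name[100] is non-empty", that `%-.60s` truncates the name to 60 characters (Python's % operator treats this format the same way), and that `\b` suppresses the space before ", with". The MIME type and extension are returned alongside; the function name is hypothetical:

```python
def tar_ova_description(header: bytes) -> tuple[str, str, str]:
    """Return (description, MIME type, extension) as the tar-ova subroutine would (sketch)."""
    name = header[:100].split(b"\0", 1)[0].decode("latin-1")
    text = "Open Virtualization Format Archive"
    if name:  # `>0 string >\0`: first byte of name[100] is not NUL
        text += ", with %-.60s" % name  # `\b` drops the space before the comma
    return text, "application/x-virtualbox-ova", "ova"

print(tar_ova_description(b"DOS-0.9.ovf".ljust(512, b"\0"))[0])
# Open Virtualization Format Archive, with DOS-0.9.ovf
```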

After applying the above modifications with the patch file
file-5.44-archive-ova.diff, I get more accurate output:
DOS-0.9.ova:    Open Virtualization Format Archive
		, with DOS-0.9.ovf
FreeDOS_1.ova:  Open Virtualization Format Archive
		, with FreeDOS_1.ovf
Win98SE_DE.ova: Open Virtualization Format Archive
		, with Win98SE_DE.ovf

I hope my diff can be applied in a future version of the file
utility.

With best wishes
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCZFAajAAKCRCv8rHJQhrU
1jKJAJ9YypneFsTmduPGFdjJc646A33vwwCggwsRZ1cdHmQ3cB6114lJ/BniIsU=
=KZzd
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-ova.txt.gz
Type: application/x-gzip
Size: 816 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230501/15e70d06/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 7z-l.txt.gz
Type: application/x-gzip
Size: 635 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230501/15e70d06/attachment-0001.bin>
-------------- next part --------------
--- file-5.44/magic/Magdir/archive.old	2022-12-26 19:00:47.000000000 +0100
+++ file-5.44/magic/Magdir/archive	2023-05-01 21:39:03.148584900 +0200
@@ -26,17 +26,19 @@
 # space or ascii digit 0 at start of check sum
 >>>>>>>148	ubyte&0xEF	=0x20	
 # FOR DEBUGGING: 
 #>>>>>>>>0	regex		\^[0-9]{2,4}[.](png|jpg|jpeg|tif|tiff|gif|bmp)	NAME "%s"
 # check for 1st image main name with digits used for sorting
 # and for name extension case insensitive like: PNG JPG JPEG TIF TIFF GIF BMP
 >>>>>>>>0	regex		\^[0-9]{2,4}[.](png|jpg|jpeg|tif|tiff|gif|bmp)
-#foo
 >>>>>>>>>0	use	tar-cbt
-# if 1st member name without digits and without used image suffix then it is a TAR archive
+# check for 1st member name with ovf suffix
+>>>>>>>>0	regex		\^.{1,96}[.](ovf)
+>>>>>>>>>0	use	tar-ova
+# if 1st member name without digits and without used image suffix and without *.ovf then it is a TAR archive
 >>>>>>>>0	default		x
 >>>>>>>>>0	use	tar-file
 #	minimal check and then display tar archive information which can also be
 #	embedded inside others like Android Backup, Clam AntiVirus database
 0	name		tar-file
 >257	string		!ustar		
 # header padded with nuls
@@ -164,14 +166,29 @@
 #!:mime	application/x-tar
 !:mime	application/vnd.comicbook
 #!:mime	application/vnd.comicbook+tar
 !:ext	cbt
 # name[100] probably like: 19.jpg 0001.png 0002.png
 # or maybe like ComicInfo.xml
 >0	string		>\0		\b, 1st image %-.60s
+# Summary:	Open Virtualization Format *.OVF with disk images and more packed as TAR archive *.OVA
+# From:		Joerg Jenderek
+# URL:		https://en.wikipedia.org/wiki/Open_Virtualization_Format
+#		http://fileformats.archiveteam.org/wiki/OVF_(Open_Virtualization_Format)
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/o/ova.trid.xml
+# Note:		called "Open Virtualization Format package" by TrID
+#		assuming *.ovf comes first
+0	name		tar-ova
+>0	string		x		Open Virtualization Format Archive
+#!:mime	application/x-ustar
+# http://extension.nirsoft.net/ova
+!:mime	application/x-virtualbox-ova
+!:ext	ova
+# assuming name[100] like: DOS-0.9.ovf FreeDOS_1.ovf Win98SE_DE.ovf
+>0	string		>\0		\b, with %-.60s
 
 # Incremental snapshot gnu-tar format from:
 # https://www.gnu.org/software/tar/manual/html_node/Snapshot-Files.html
 0	string		GNU\ tar-	GNU tar incremental snapshot data
 >&0	regex		[0-9]\\.[0-9]+-[0-9]+	version %s
 
 # cpio archives
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.44-archive-ova.diff.sig
Type: application/octet-stream
Size: 1329 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230501/15e70d06/attachment.obj>

