[File] [PATCH] of Magdir/virtual for Microsoft Disk Image eXtended (*.vhdx)

Jörg Jenderek joerg.jen.der.ek at gmx.net
Sat Dec 15 17:00:12 UTC 2018


Hello,

Some days ago I ran the file command 5.35 on my disk images. Disk images
with the name extension vhdx are described only as "data".

VHDX (Virtual Hard Disk v2) is the successor format to VHD. So I added new
lines in Magdir/virtual after the VHD entry. Information about the file
format can be found on Microsoft servers, so I added a reference URL to
the document [MS-VHDX].pdf, dated 9/12/2018 and titled "Virtual Hard Disk
v2 (VHDX) File Format".

According to the Microsoft documentation, such images start with the
VHDX_FILE_IDENTIFIER signature 0x656C696678646876, which is the ASCII
string "vhdxfile" read as a little-endian 64-bit value. This becomes the
first magic line:
 0	string			vhdxfile
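
As a quick cross-check outside of file, the relation between the numeric
signature and the string can be sketched in Python. The helper name
signature_bytes is only an illustration and not part of the patch:

```python
import struct

# Value quoted from the Microsoft documentation.
VHDX_FILE_IDENTIFIER = 0x656C696678646876

def signature_bytes(value: int) -> bytes:
    """Return the 8 on-disk bytes of a little-endian 64-bit signature."""
    return struct.pack("<Q", value)

print(signature_bytes(VHDX_FILE_IDENTIFIER))  # b'vhdxfile'
```
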
To distinguish a disk image from ASCII text starting with the phrase
"vhdxfile", I look for more characteristics. According to the docs, one
header is stored at offset 64 KB and another at 128 KB. Each starts with
the VHDX_HEADER signature "head". This is now used as
 >0x10000	string		head	Microsoft Disk Image eXtended
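
The same two-step test can be sketched in Python, for example to check
sample images by hand. This is only a rough equivalent of the two magic
lines above; the function name looks_like_vhdx is hypothetical, and like
the magic it inspects only the first header copy:

```python
import io

def looks_like_vhdx(stream) -> bool:
    """Mirror the two magic lines: identifier at 0, header at 64 KB."""
    stream.seek(0)
    if stream.read(8) != b"vhdxfile":
        return False
    stream.seek(0x10000)  # first header copy at 64 KB
    return stream.read(4) == b"head"

# A text file that merely starts with the word is rejected.
print(looks_like_vhdx(io.BytesIO(b"vhdxfile is just a word here")))  # False
```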

The name of the creator of the VHDX file is stored as UTF-16 at offset 8,
directly after the file identifier signature. So names like "QEMU v3.0.0"
or "Microsoft Windows 6.3.9600.18512" are also shown by the line
 >>8		lestring16		x	\b, by %.256s
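
For comparison, reading the creator name by hand might look like the
following Python sketch. It assumes the field is 256 UTF-16LE code units
(512 bytes) padded with zeros, as the Creator[256] comment in the patch
suggests; the name read_creator is made up for this example:

```python
import io

def read_creator(stream) -> str:
    """Decode the UTF-16LE creator string that follows the 8-byte identifier."""
    stream.seek(8)
    raw = stream.read(512)                 # assumed: 256 code units of 2 bytes
    raw = raw[: len(raw) - len(raw) % 2]   # drop a dangling byte in short files
    text = raw.decode("utf-16-le", errors="replace")
    return text.split("\x00", 1)[0]        # stop at the first NUL padding

sample = b"vhdxfile" + "QEMU v3.0.0".encode("utf-16-le") + bytes(64)
print(read_creator(io.BytesIO(sample)))  # QEMU v3.0.0
```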

The second field in the header is a CRC-32C hash over the entire 4 KB
structure. So I add a line like
 #>>0x10004	ulelong			x	\b, CRC 0x%x
Because this information is not very useful for "normal" users, I add
this as a comment line. I handle other fields in the same way.
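
file can only print the stored checksum, not verify it. A verification
outside of file could be sketched as below. The bitwise CRC-32C
(Castagnoli polynomial) is a plain textbook implementation; that the
checksum field itself counts as zero during the computation is my
reading of the specification and should be treated as an assumption.
The function names are illustrative:

```python
import struct

def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def header_checksum_ok(header: bytes) -> bool:
    """Check a 4 KB header against the checksum stored at byte offset 4."""
    if len(header) != 4096:
        raise ValueError("expected the full 4 KB header structure")
    (stored,) = struct.unpack_from("<I", header, 4)
    # Assumption: the checksum field is treated as zero while hashing.
    zeroed = header[:4] + b"\x00\x00\x00\x00" + header[8:]
    return crc32c(zeroed) == stored

print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard CRC-32C check value
```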

Luckily, newer versions of file support quad indirect offsets, so it is
possible to peek inside the log entry section. To check for the existence
of a log entry, I look for the signature value (0x65676F6C, "loge") with
the lines:
 >>(0x10048.q)	ulelong			!0x65676F6C \b, NO Log Signature
 >>(0x10048.q)	ulelong			=0x65676F6C \b; LOG
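
What the indirect offset (0x10048.q) does can be spelled out in Python:
read the 64-bit LogOffset at 0x10048, jump there, and compare 4 bytes
with "loge". This is only a sketch of the same idea; the function name
log_entry_present is hypothetical:

```python
import struct

def log_entry_present(stream) -> bool:
    """Follow LogOffset from the first header and test for the 'loge' signature."""
    stream.seek(0x10048)               # LogOffset field in the first header
    raw = stream.read(8)
    if len(raw) < 8:
        return False                   # file too short to hold the field
    (log_offset,) = struct.unpack("<Q", raw)
    stream.seek(log_offset)
    # 0x65676F6C stored little-endian is the byte string b"loge".
    return stream.read(4) == b"loge"
```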

I tried to get the value for the virtual size of the image, which is also
shown by a command like `qemu-img info`. It is stored as VirtualDiskSize,
identified by the Virtual Disk Size GUID, at an offset that differs from
image to image. Unfortunately I am not smart enough to program this part.
From the region table I get the information needed to jump to the
metadata section; I can do this step. Then, in the metadata table, I look
for the entry with the wanted GUID; I can do this too. The value must
then be read at an offset relative to the beginning of the metadata
section, and there I fail. Maybe another person is clever enough to do
this.
So I keep only the first steps, starting with the displayed phrase
"region", although this information is normally not of interest to users.
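
Outside of the magic language the three steps are easy to write down.
The following Python sketch shows the lookup chain. The region table
layout matches the offsets used in the patch; the metadata table layout
(32-byte header with signature "metadata" and a 16-bit entry count,
32-byte entries with GUID, 32-bit offset and 32-bit length) is my
reading of the specification and is an assumption, as are all function
names:

```python
import struct
import uuid

REGION_TABLE_OFFSET = 0x30000  # first region table at 192 KB
METADATA_REGION_GUID = uuid.UUID("8B7CA206-4790-4B9A-B8FE-575F050F886E")
VIRTUAL_DISK_SIZE_GUID = uuid.UUID("2FA54224-CD1B-4876-B211-5DBED83BF4B8")

def read_at(stream, offset: int, size: int) -> bytes:
    """Read exactly size bytes at offset or raise ValueError."""
    stream.seek(offset)
    data = stream.read(size)
    if len(data) != size:
        raise ValueError(f"short read at 0x{offset:x}")
    return data

def find_metadata_region(stream) -> int:
    """Step 1: scan the region table for the metadata region's file offset."""
    signature, _crc, count, _reserved = struct.unpack(
        "<4sIII", read_at(stream, REGION_TABLE_OFFSET, 16))
    if signature != b"regi":
        raise ValueError("region table signature not found")
    for index in range(count):
        entry = read_at(stream, REGION_TABLE_OFFSET + 16 + 32 * index, 32)
        file_offset, _length, _required = struct.unpack_from("<QII", entry, 16)
        # GUIDs are stored mixed-endian; bytes_le undoes that.
        if uuid.UUID(bytes_le=entry[:16]) == METADATA_REGION_GUID:
            return file_offset
    raise ValueError("no metadata region entry")

def virtual_disk_size(stream) -> int:
    """Steps 2 and 3: find the size item and read it relative to the region."""
    base = find_metadata_region(stream)
    # Assumed header layout: 8-byte signature, 2 reserved bytes, 2-byte count.
    signature, _reserved, count = struct.unpack_from(
        "<8sHH", read_at(stream, base, 32))
    if signature != b"metadata":
        raise ValueError("metadata table signature not found")
    for index in range(count):
        entry = read_at(stream, base + 32 + 32 * index, 32)
        item_offset, _item_length = struct.unpack_from("<II", entry, 16)
        if uuid.UUID(bytes_le=entry[:16]) == VIRTUAL_DISK_SIZE_GUID:
            # The item offset counts from the start of the metadata region.
            return struct.unpack("<Q", read_at(stream, base + item_offset, 8))[0]
    raise ValueError("no Virtual Disk Size item")
```

Called on an open image, for example virtual_disk_size(open(path, "rb")),
the result should agree with the "virtual size" reported by
`qemu-img info` if the assumed layout is right.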

After applying the above-mentioned modifications with the patch
file-5.35-virtual-vhdx.diff, all such disk images are described by
Magdir/virtual like:

Esp.vhdx:
	Microsoft Disk Image eXtended,
	by Microsoft Windows 6.3.9600.18512, sequence 0x14;
	LOG; region, 2 entries,
	id BAT, at 0x300000, Required 1,
	id Metadata, at 0x200000, Required 1
qemu16MB-dynamic.vhdx:
	Microsoft Disk Image eXtended,
	by QEMU v3.0.0, sequence 0x1993d0b2,
	NO Log Signature; region, 2 entries,
	id BAT, at 0x200000, Required 0,
	id Metadata, at 0x300000, Required 0
qemu16MB-fixed.vhdx:
	Microsoft Disk Image eXtended,
	by QEMU v3.0.0, sequence 0x9897bc4c,
	NO Log Signature; region, 2 entries,
	id BAT, at 0x200000, Required 0,
	id Metadata, at 0x300000, Required 0
qemu24MB-logsize2M.vhdx:
	Microsoft Disk Image eXtended,
	by QEMU v3.0.0, sequence 0x675f22b0,
	LogLength 2 MB, NO Log Signature; region, 2 entries,
	id BAT, at 0x300000, Required 0,
	id Metadata, at 0x400000, Required 0
\VM\FISCH_C.VHDX:
	Microsoft Disk Image eXtended,
	by d2v, sequence 0xa;
	LOG; region, 2 entries,
	id Metadata, at 0x200000, Required 1,
	id BAT, at 0x300000, Required 1
\temp\vhdx24mb.vhdx:
	Microsoft Disk Image eXtended,
	by Microsoft Windows 6.3.9600.18512, sequence 0x8;
	LOG; region, 2 entries,
	id BAT, at 0x300000, Required 1,
	id Metadata, at 0x200000, Required 1

I hope my diff file can be applied in a future version of the
file utility.

With best wishes
Jörg Jenderek

-------------- next part --------------
--- file-5.35/magic/Magdir/virtual.old	2017-03-17 21:34:26 +0000
+++ file-5.35/magic/Magdir/virtual	2018-12-15 16:04:46 +0000
@@ -11,2 +11,124 @@
 
+# From: Joerg Jenderek
+# URL: https://msdn.microsoft.com/en-us/library/mt740058.aspx
+# Reference: https://winprotocoldoc.blob.core.windows.net/productionwindowsarchives/
+# MS-VHDX/[MS-VHDX].pdf
+# Note: extends the VHD format with new capabilities, such as a 64TB maximum size
+# TODO:	find and display values like virtual size, disk size, cluster_size, etc
+#	display id in GUID format
+#
+# VHDX_FILE_IDENTIFIER signature 0x656C696678646876
+0	string			vhdxfile
+# VHDX_HEADER signature. 1 header is stored at offset 64KB and the other at 128KB
+>0x10000	string		head		Microsoft Disk Image eXtended
+#>0x20000	string			head	\b, 2nd header
+#!:mime	application/x-virtualbox-vhdx
+!:ext	vhdx
+# Creator[256] like "QEMU v3.0.0", "Microsoft Windows 6.3.9600.18512"
+>>8		lestring16		x	\b, by %.256s
+# The Checksum field is a CRC-32C hash over the entire 4 KB structure
+#>>0x10004	ulelong			x	\b, CRC 0x%x
+# SequenceNumber
+>>0x10008	ulequad			x	\b, sequence 0x%llx
+# FileWriteGuid
+#>>0x10010	ubequad			x	\b, file id 0x%llx
+#>>>0x10018	ubequad			x	\b-%llx
+# DataWriteGuid
+#>>0x10020	ubequad			x	\b, data id 0x%llx
+#>>>0x10028	ubequad			x	\b-%llx
+# LogGuid. If this field is zero, then the log is empty or has no valid entries 
+>>0x10030	ubequad			>0	\b, log id 0x%llx
+>>>0x10038	ubequad			x	\b-%llx
+# LogVersion. If not 0 there is a log to replay
+>>0x10040	uleshort		>0	\b, LogVersion 0x%x
+# Version. This field must be set to 1
+>>0x10042	uleshort		!1	\b, Version 0x%x
+# LogLength must be multiples of 1 MB
+>>0x10044	ulelong/1048576		>1	\b, LogLength %u MB
+# LogOffset (normally 0x100000 when log direct after header); multiples of 1 MB
+>>0x10048	ulequad			!0x100000 \b, LogOffset 0x%llx
+# Log Entry Signature must be 0x65676F6C~loge
+>>(0x10048.q)	ulelong			!0x65676F6C \b, NO Log Signature
+>>(0x10048.q)	ulelong			=0x65676F6C	\b; LOG
+# Log Entry Checksum
+#>>>(0x10048.q+4)	ulelong		x	\b, Log CRC 0x%x
+# Log Entry Length must be a multiple of 4 KB
+>>>(0x10048.q+8)	ulelong/1024	>4	\b, EntryLength %u KB
+# Log Entry Tail must be a multiple of 4 KB
+#>>>(0x10048.q+12)	ulelong		x	\b, Tail 0x%x
+# Log Entry SequenceNumber
+#>>>(0x10048.q+16)	ulequad		x	\b, # 0x%llx
+# Log Entry DescriptorCount may be zero. only 4 bytes in other docs instead of 8
+#>>>(0x10048.q+24)	ulelong		x	\b, DescriptorCount 0x%x
+# Log Entry Reserved must be set to 0
+>>>(0x10048.q+28)	ulelong		!0	\b, Reserved 0x%x
+# Log Entry LogGuid
+#>>>(0x10048.q+32)	ubequad		x	\b, Log id 0x%llx
+#>>>(0x10048.q+40)	ubequad		x	\b-%llx
+# Log Entry FlushedFileOffset should be the VHDX size when the entry is written.
+#>>>(0x10048.q+48)	ulequad		x	\b, FlushedFileOffset %llu
+# Log Entry LastFileOffset
+#>>>(0x10048.q+56)	ulequad		x	\b, LastFileOffset %llu
+# filling
+#>>>(0x10048.q+64)	ulequad		>0	\b, filling %llx
+# Reserved[4016]
+#>>0x10050	ulequad			>0	\b, Reserved 0x%llx
+# VHDX_REGION_TABLE_HEADER Signature 0x69676572~regi at offset 192 KB and 256 KB
+>0x30000	ulelong			!0x69676572 \b, 1st region INVALID
+>0x30000	ulelong			=0x69676572 \b; region
+# region Checksum. CRC-32C hash over the entire 64-KB table
+#>>0x30004	ulelong			x	\b, CRC 0x%x
+# The EntryCount specifies number of valid entries; Found 2; This must be <= 2047.
+>>0x30008	ulelong			x	\b, %u entries
+# reserved must be zero
+#>>0x3000C	ulelong			!0	\b, RESERVED 0x%x
+# Region Table Entry starts with identifier for the object. often BAT id
+>>0x30010	use			vhdx-id
+# FileOffset
+>>0x30020	ulequad		x		\b, at 0x%llx
+# Length. Specifies the length of the object within the file
+#>>0x30028	ulelong		x		\b, Length 0x%x
+# 1 means region entry is required. if region not recognized, then REFUSE to load VHDX
+>>0x3002C	ulelong		x		\b, Required %u
+# 2nd region entry often metadata id
+>>0x30030	use			vhdx-id
+# 2nd entry FileOffset
+>>0x30040	ulequad		x		\b, at 0x%llx
+# 1 means region entry is required. if region not recognized, then REFUSE to load VHDX
+>>0x3004C	ulelong		x		\b, Required %u
+# 2nd region
+>>0x40000	ulelong		!0x69676572	\b, 2nd region INVALID
+# check in vhdx images for known id and show names instead hexadecimal
+0	name		vhdx-id
+# http://www.windowstricks.in/online-windows-guid-converter
+# 2DC27766-F623-4200-9D64-115E9BFD4A08		BAT GUID
+# 6677C22D23F600429D64115E9BFD4A08		BAT ID
+>0	ubequad		=0x6677C22D23F60042
+>>8	ubequad		=0x9D64115E9BFD4A08	\b, id BAT
+# no BAT id
+>>8	default		x
+>>>0	use		vhdx-id-hex
+# 8B7CA206-4790-4B9A-B8FE-575F050F886E		Metadata region GUID
+# 06A27C8B90479A4BB8FE575F050F886E		Metadata region ID
+>0	ubequad		=0x06A27C8B90479A4B
+>>8	ubequad		=0xB8FE575F050F886E	\b, id Metadata
+# no Metadata id
+>>8	default		x
+>>>0	use		vhdx-id-hex
+# 2FA54224-CD1B-4876-B211-5DBED83BF4B8		Virtual Disk Size GUID
+# 2442A52F1BCD7648B2115DBED83BF4B8		Virtual Disk Size ID
+# value "virtual size" can be verified by command `qemu-img info `
+>0	ubequad		=0x2442A52F1BCD7648
+>>8	ubequad		=0xB2115DBED83BF4B8	\b, id vsize
+# no Virtual Disk Size ID
+>>8	default		x
+>>>0	use		vhdx-id-hex
+# other ids
+>0	default		x
+>>0	use		vhdx-id-hex
+# in vhdx images show id as hexadecimal
+0	name		vhdx-id-hex
+>0	ubequad		x			\b, ID 0x%16.16llx
+>8	ubequad		x			\b-%16.16llx
+#
 # libvirt

