[File] [PATCH] Magdir/archive Apple DiskCopy 4.2 image misidentifies biosmd80.rom

Jörg Jenderek joerg.jen.der.ek at gmx.net
Wed Oct 20 12:08:58 UTC 2021


Hello,

some time ago I handled some computer ROM examples. Some unexpected
results led me to inspect Apple DiskCopy 4.2 images with the
file name extension IMAGE or DC42.

When running the file command version 5.41 on such images and
related files I get output like:

biosmd80.rom:                    Apple DiskCopy 4.2 image
				 \003ACFG\001\002,
				 1114112 bytes,
				 0x60 tag size, 0x1c encoding,
				 0x0 format
Disco 12.image:                  Apple DiskCopy 4.2 image
            			 Disco 12,
				 1474560 bytes,
				 MFM CAV dshd (1440k),
				 0x22 format
Disquette Installation 13.image: Apple DiskCopy 4.2 image
	  	       		 Disquette Installation 13,
				 1474560 bytes,
				 MFM CAV dshd (1440k),
				 0x2 format
IIe Installer Disk.image:        Apple DiskCopy 4.2 image
          	      		 IIe Installer Disk,
				 1474560 bytes,
				 MFM CAV dshd (1440k),
				 0x22 format
LISA CALENDAR (Master).image:    Apple DiskCopy 4.2 image
           	      		 -not a Macintosh disk-,
				 409600 bytes,
				 0x2580 tag size,
				 GCR CLV ssdd (400k),
				 0x2 format
Microsoft Mail.image:            Apple DiskCopy 4.2 image
	  			 Microsoft Mail,
				 819200 bytes,
				 0x4b00 tag size,
				 GCR CLV dsdd (800k),
				 0x22 format
Utilitaires 2.img:               Apple DiskCopy 4.2 image
	    			 Utilitaires 2,
				 1474560 bytes,
				 MFM CAV dshd (1440k),
				 0x22 format


For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). It does not
misidentify the example biosmd80.rom (see appended trid-v-dc42.txt.gz).

I also ran the identification tool DROID (see
https://sourceforge.net/projects/droid). It identifies the disc
images as "Apple Disk Copy Image" version 4.2 by PUID fmt/625 and
also skips the misidentified ROM image (see appended DROID-dc42.csv.gz).

Unfortunately the documentation URL used so far has become invalid
because the web site wiki.68kmla.org seems to have vanished. So I
replaced it with a link to the Wikipedia page about Apple's Disk Copy.
That is expressed by a comment line like:
# URL:		https://en.wikipedia.org/wiki/Disk_Copy

Unfortunately Apple DiskCopy files have only a weak 2-byte magic
pattern 0x0100 at offset 0x52. Luckily the display part is done by the
subroutine dc42-floppy, so only the tests for characteristic values
must be changed. This subroutine starts like:

  0	name		dc42-floppy
  >00	pstring/B	x	Apple DiskCopy 4.2 image %s
  !:mime	application/x-dc42-floppy-image
  !:apple dCpydImg
  !:ext	image/dc42
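
The two facts used above (magic 0x0100 at offset 0x52, Pascal string
name at offset 0) can be sketched in Python roughly as follows. This is
only an illustration, not part of the patch. The function names and the
sample header are made up, and it is an assumption that the magic is
read as a big-endian 16-bit value and that the header ends at 0x54.

```python
import struct

MAGIC_OFFSET = 0x52   # from the mail: 2-byte pattern 0x0100 at offset 0x52
HEADER_SIZE = 0x54    # assumption: the header ends right after the magic


def has_dc42_magic(header: bytes) -> bool:
    """Return True if the 16-bit big-endian value at 0x52 equals 0x0100."""
    if len(header) < HEADER_SIZE:
        return False
    (magic,) = struct.unpack_from(">H", header, MAGIC_OFFSET)
    return magic == 0x0100


def disk_name(header: bytes) -> bytes:
    """Return the Pascal string at offset 0: one length byte, then the name."""
    length = header[0]
    return header[1:1 + length]


# Synthetic sample header (hypothetical, only the fields used here are set).
sample = bytearray(HEADER_SIZE)
sample[0] = len(b"Disco 12")
sample[1:9] = b"Disco 12"
sample[MAGIC_OFFSET:MAGIC_OFFSET + 2] = b"\x01\x00"

print(has_dc42_magic(bytes(sample)), disk_name(bytes(sample)))
```

Because two bytes are such a weak signature, a check like this alone
accepts many unrelated files, which is why further tests are needed.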

Apparently a disc image name is plain ASCII without control
characters such as those in "\003ACFG\001\002" of the misidentified
biosmd80.rom. So this could be used as an additional test, but I do not
do this. The length of the string is stored as a byte value before the
string. It can be shown by an additional DEBUG line like:
  0	name		dc42-floppy
  >0	ubyte	    	x	DISK NAME LENGTH %u

According to the documentation the maximal string length is 63, not the
255 that a length byte would allow. For some non-disk images I get
invalid values here, like:
	181	(biosmd80.rom)
	202	(Flags$StringJoiner.class)
	90	(UNICODE.DAT)

So I skip such misidentified examples by an additional test line for
a valid disk name length, which now looks like:
  >>>>0x0	ubyte	    	<64
  >>>>>0	use	dc42-floppy
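
As a rough Python counterpart of the name length test, and of the
control character test that I mention but do not use in the patch,
consider the following sketch. The function names are hypothetical and
the sample with control characters is synthetic.

```python
def name_length_ok(header: bytes) -> bool:
    """Mirror of the magic test '>>>>0x0 ubyte <64' (maximal name length 63)."""
    return len(header) > 0 and header[0] < 64


def name_has_control_chars(header: bytes) -> bool:
    """Hypothetical extra check, NOT part of the patch: any byte < 0x20 or DEL."""
    name = header[1:1 + header[0]]
    return any(byte < 0x20 or byte == 0x7F for byte in name)


# Length bytes reported in the mail for non-disk files, plus one valid length.
for length in (181, 202, 90, 14):
    print(length, name_length_ok(bytes([length])))

# Synthetic name resembling the garbage shown for biosmd80.rom.
bad = bytes([7]) + b"\x03ACFG\x01\x02"
good = bytes([14]) + b"Microsoft Mail"
print(name_has_control_chars(bad), name_has_control_chars(good))
```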

Later in the subroutine the original disk size is shown by a line like:
  >0x40	ubelong		x	\b, %u bytes
For debugging purposes this size can also be shown in hexadecimal by a
line like:
  >0x40	ubelong		x	(%#8.8x)

Typical values in decimal are 409600, 737280, 819200 and 1474560. When
these values are expressed in hexadecimal we see that DROID and TrID
explicitly check for these size values. For TrID this can be
summarized in table form like:
	# 00064000 for  400k GCR disks	dc42-400k-gcr.trid.xml
	# 000c8000 for  800k GCR disks	dc42-800k-gcr.trid.xml
	# 000b4000 for  720k MFM disks	dc42-720k-mfm.trid.xml
	# 00168000 for 1440k MFM disks	dc42-1440k-mfm.trid.xml
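
The decimal and hexadecimal values in this table can be cross-checked
with a few lines of Python. This is only a convenience sketch; the
dictionary and the helper name are made up for illustration.

```python
# Decimal data sizes named in the mail, keyed by the disk type label.
SIZES = {
    "400k GCR": 409600,
    "800k GCR": 819200,
    "720k MFM": 737280,
    "1440k MFM": 1474560,
}


def as_hex8(size: int) -> str:
    """Format a size as 8 hexadecimal digits, as used in the table."""
    return f"{size:08x}"


for label, size in SIZES.items():
    print(f"{as_hex8(size)} for {label} disks ({size} bytes)")
```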

Until now the test for valid disk sizes was done by 3 magic lines like:
  >0x40	ubelong		>409599
  >>0x40	ubelong		<1474561
  >>>0x40	ubelong&0xffE03fFF	0
First I test for the low limit (400k), then for the high limit (1440k).
By these steps misidentified examples like windows7en.mbr (B441BBAAh)
and UNICODE.DAT (0400AF05h) are skipped. The sizes are a multiple of the
block size (like 512) and do not reach the theoretical upper 4 GiB
limit. So in hexadecimal notation some upper and lower bits are zero.
That is used by the last AND-masked expression to skip examples like
Flags$StringJoiner.class with the invalid size value 00106A61h.
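
The effect of these three lines can be reproduced in Python as a small
sketch (the function name is hypothetical). Note that the low part of
the mask, 0x3FFF, in fact demands a multiple of 16 KiB, which all four
floppy sizes satisfy.

```python
def old_size_ok(size: int) -> bool:
    """Mirror of the three old magic lines on the ubelong at offset 0x40."""
    return (
        size > 409599                    # low limit: 400k
        and size < 1474561               # high limit: 1440k
        and (size & 0xFFE03FFF) == 0     # upper 11 and lower 14 bits are zero
    )


# Sizes taken from the mail: four real floppy sizes, three bogus values,
# and the 1114112 bytes reported for biosmd80.rom.
for size in (409600, 737280, 819200, 1474560,
             0xB441BBAA, 0x0400AF05, 0x00106A61, 1114112):
    print(f"{size:08x} {old_size_ok(size)}")
```

The value 1114112 (00110000h) shown for biosmd80.rom passes all three
lines, so the size test alone cannot reject that file.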

The example "LISA CALENDAR (Master).image" is also identified by
TrID, as "DiskCopy 4.2 non-Mac disk image" by the TrID definition
dc42-nonmac.trid.xml. When I look inside that definition I see that
besides the check for the weak magic only a check for the disk name
"-not a Macintosh disk" is done as additional test. So no test for the
disk size is done here. There also exists a sixth definition,
dc42-lisaem.trid.xml. It also checks for the weak magic and, as second
test, for the disk name "-lisaem.sunder.net hd-".

When I look at the mentioned web site https://lisaem.sunder.net/ I
see that this is the site of a Lisa Emulator project. On that site
there is a project documentation for developers with the name
LisaProjectDocs.txt.
It contains an interesting text about "large floppies":
It is also possible that we may be able to build a "floppy" image up
to 24mb, but this is uncertain at this time and highly dependent on
the OS. Certainly, the profile drives can easily go as
high as 32mb, possibly 64mb though only 5mb and 10mb models were
available.
If such "large floppies" can exist, then "large" DC42 disc images can
probably exist too. So the test lines used for checking the disc size
are too strict and now become:
  >0x40		ubelong			>409599
  >>0x40		ubelong			<0x04000001
  >>>0x40	ubelong&0xf8003fFF	0
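
To see what the relaxed limits change, here is a small Python sketch
that applies the old and the new variant to the sizes named in this
mail and in the patch comments. The helper name and its parameters are
made up for illustration.

```python
def size_ok(size: int, upper: int, mask: int) -> bool:
    """Generic form of the size test: 400k <= size < upper, masked bits zero."""
    return size > 409599 and size < upper and (size & mask) == 0


def old_test(size: int) -> bool:
    return size_ok(size, 1474561, 0xFFE03FFF)


def new_test(size: int) -> bool:
    return size_ok(size, 0x04000001, 0xF8003FFF)


# 1440k floppy, the 5M/10M/24M/32M/64M sizes from the patch comment,
# and the three bogus values that must stay rejected.
for size in (0x00168000, 0x00500000, 0x00A00000, 0x01800000,
             0x02000000, 0x04000000,
             0xB441BBAA, 0x0400AF05, 0x00106A61):
    print(f"{size:08x} old={old_test(size)} new={new_test(size)}")
```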

The tag size is shown by a line like:
  >0x44	ubelong		>0	\b, %#x tag size
The DROID tool also checks for 3 tag size values {like
0 (often) 2580h (PUID fmt/625) 4B00h (Microsoft Mail.image)}.
But the DROID description says that there may be additional values for
these sequences, which may require future adjustment of the signature.
So this value is not well suited as an additional test.

The disc encoding format is stored in the byte at offset 80 (0x50). It
is displayed by lines like:
  >0x50	ubyte		0	\b, GCR CLV ssdd (400k)
  >0x50	ubyte		1	\b, GCR CLV dsdd (800k)
  >0x50	ubyte		2	\b, MFM CAV dsdd (720k)
  >0x50	ubyte		3	\b, MFM CAV dshd (1440k)
  >0x50	ubyte		>3	\b, %#x encoding
The last line is there because some documentation says that other
encodings may exist, as DC42 was originally designed to be able to
image HD20 disks. But DROID explicitly checks only for the 4 values of
"real" floppies.
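
The mapping done by these five magic lines corresponds to a simple
lookup with a fallback, sketched here in Python. The names are
hypothetical; the strings are the ones printed by the magic lines.

```python
# Descriptions for the four encodings of "real" floppies (byte at 0x50).
ENCODINGS = {
    0: "GCR CLV ssdd (400k)",
    1: "GCR CLV dsdd (800k)",
    2: "MFM CAV dsdd (720k)",
    3: "MFM CAV dshd (1440k)",
}


def describe_encoding(value: int) -> str:
    """Return the known description, or the raw value in hex for values > 3."""
    return ENCODINGS.get(value, f"{value:#x} encoding")


print(describe_encoding(3))     # a 1440k image
print(describe_encoding(0x1c))  # the value seen in biosmd80.rom
```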

The disc format is stored in the byte at offset 81 (0x51). It is
displayed by a line like:
  >0x51	ubyte		x	\b, %#x format
The bit interpretation is complicated, but obviously not all 256
combinations exist. The DROID tool explicitly checks for 5 possible
values:
	12h (400K)
	24h (400K Macintosh)
	96h (800K Apple II disk)
	02h "Disquette Installation 13.image"
	22h "Disco 12.image" "IIe Installer Disk.image"
but in my inspected examples I only found the values 2h and 22h.

So far only 2 file name extensions (image, dc42) were mentioned, but I
also found examples like "Utilitaires 2.img" or "Installation 7.img"
where the 3-character extension "img" is used. So this is now expressed
by a line like:
!:ext	image/dc42/img

After applying the above mentioned modifications by the patch
file-5.41-archive-dc42.diff all inspected Apple disk images are still
described and the misidentification of biosmd80.rom vanishes.
The output now looks like:

biosmd80.rom:                    data
Disco 12.image:                  Apple DiskCopy 4.2 image
        				 Disco 12,
				 1474560 bytes,
				 MFM CAV dshd (1440k),
				 0x22 format
Disquette Installation 13.image: Apple DiskCopy 4.2 image
	  	       		 Disquette Installation 13,
				 1474560 bytes,
				 MFM CAV dshd (1440k),
				 0x2 format
Flags$StringJoiner.class:        data
IIe Installer Disk.image:        Apple DiskCopy 4.2 image
      	      			 IIe Installer Disk,
				 1474560 bytes,
				 MFM CAV dshd (1440k),
				 0x22 format
LISA CALENDAR (Master).image:    Apple DiskCopy 4.2 image
       	      			 -not a Macintosh disk-,
				 409600 bytes,
				 0x2580 tag size,
				 GCR CLV ssdd (400k),
				 0x2 format
Microsoft Mail.image:            Apple DiskCopy 4.2 image
	  			 Microsoft Mail,
				 819200 bytes,
				 0x4b00 tag size,
				 GCR CLV dsdd (800k),
				 0x22 format
Utilitaires 2.img:               Apple DiskCopy 4.2 image
	    			 Utilitaires 2,
				 1474560 bytes,
				 MFM CAV dshd (1440k),
				 0x22 format


I hope my diff file can be applied in a future version of the file utility.

With best wishes
Jörg Jenderek
--
Jörg Jenderek

-------------- next part --------------
--- file-5.41/magic/Magdir/archive.old	2021-08-30 09:10:26 +0000
+++ file-5.41/magic/Magdir/archive	2021-10-20 11:51:32 +0000
@@ -520,3 +520,3 @@
 # From:		Joerg Jenderek
-# URL:		https://wiki.68kmla.org/DiskCopy_4.2_format_specification
+# URL:		https://en.wikipedia.org/wiki/Disk_Copy
 # reference:	http://nulib.com/library/FTN.e00005.htm
@@ -527,13 +527,29 @@
 # windows7en.mbr UNICODE.DAT
->>0x40	ubelong		<1474561
-# To skip Flags$StringJoiner.class with size 00106A61h test also for only 4 disk image sizes
-# 00064000 for  400k GCR disks
-# 000c8000 for  800k GCR disks
-# 000b4000 for  720k MFM disks
-# 00168000 for 1440k MFM disks
->>>0x40	ubelong&0xffE03fFF	0
->>>>0	use	dc42-floppy
+#>>0x40	ubelong		<1474561
+# test now for "low" disk image size equal or below 64 MiB to skip
+# windows7en.mbr (B441BBAAh) UNICODE.DAT (0400AF05h)
+>>0x40	ubelong		<0x04000001
+# To skip Flags$StringJoiner.class with size 00106A61h test also for valid disk image sizes
+# 00064000 for  400k GCR disks	dc42-400k-gcr.trid.xml
+# 000c8000 for  800k GCR disks	dc42-800k-gcr.trid.xml
+# 000b4000 for  720k MFM disks	dc42-720k-mfm.trid.xml
+# 00168000 for 1440k MFM disks	dc42-1440k-mfm.trid.xml
+#	https://lisaem.sunder.net/LisaProjectDocs.txt
+# 00500000	05M	available
+# 00A00000	10M	available
+# 01800000	24M	possible
+# 02000000	32M	uncertain
+# 04000000	64M	uncertain
+>>>0x40	ubelong&0xf8003fFF	0
+# skip samples with invalid disk name length like:
+# 181 (biosmd80.rom) 202 (Flags$StringJoiner.class) 90 (UNICODE.DAT)
+>>>>0x0	ubyte			<64
+>>>>>0	use			dc42-floppy
 #	display information of Apple DiskCopy 4.2 floppy image
 0	name		dc42-floppy
-# image pascal name padded with NULs like Microsoft Mail
+# disk name length; maximal 63
+#>0	ubyte	    	x	DISK NAME LENGTH %u
+# ASCII image pascal (maximal 63 bytes) name padded with NULs like:
+# "Microsoft Mail" "Disquette 2" "IIe Installer Disk"
+# "-lisaem.sunder.net hd-" (dc42-lisaem.trid.xml) "-not a Macintosh disk" (dc42-nonmac.trid.xml)
 >00	pstring/B	x	Apple DiskCopy 4.2 image %s
@@ -542,4 +558,5 @@
 !:apple	dCpydImg
-!:ext	image/dc42
-# data size in bytes like 409600
+# probably also img like: "Utilitaires 2.img" "Installation 7.img"
+!:ext	image/dc42/img
+# data size in bytes like: 409600 737280 819200 1474560
 >0x40	ubelong		x	\b, %u bytes
@@ -547,3 +564,3 @@
 #>0x40	ubelong		x	(%#8.8x)
-# tag size in bytes
+# tag size in bytes like: 0 (often) 2580h (PUID fmt/625) 4B00h (Microsoft Mail.image)
 >0x44	ubelong		>0	\b, %#x tag size
@@ -553,3 +570,3 @@
 #>0x4c	ubelong		x	\b, %#x tag checksum
-# disk encoding
+# disk encoding like: 0 1 2 3 (PUID: fmt/625)
 >0x50	ubyte		0	\b, GCR CLV ssdd (400k)
@@ -559,3 +576,5 @@
 >0x50	ubyte		>3	\b, %#x encoding
-# format byte
+# format byte like: 12h (Lisa 400K) 24h (400K Macintosh) 96h (800K Apple II disk)
+# 2 (Mac 400k "Disquette Installation 13.image")
+# 22h (double-sided MFM or Mac 800k "Disco 12.image" "IIe Installer Disk.image")
 >0x51	ubyte		x	\b, %#x format
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.41-archive-dc42.diff.sig
Type: application/octet-stream
Size: 1517 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20211020/837f3df0/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-dc42.txt.gz
Type: application/x-gzip
Size: 539 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20211020/837f3df0/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DROID-dc42.csv.gz
Type: application/x-gzip
Size: 569 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20211020/837f3df0/attachment-0001.bin>

