[File] [PATCH] Magdir/c64, mmdf, images, pdp collision: D64 Image, MMDF mailbox, Targa image, PDP-11 UNIX/RT ldp

Christos Zoulas christos at zoulas.com
Thu Feb 29 03:41:03 UTC 2024


Committed, thanks. Wow nice analysis!

christos

> On Feb 18, 2024, at 8:26 PM, Jörg Jenderek (GMX) <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> 
> A few days ago I read an article about emulating old Commodore
> computers. In that context, disc images with the D64 file name extension
> were mentioned. When I run the file command, version 5.45, with the -k
> option on such examples and "related" files, I get output like:
> 
> Cabal+2-MarioSoft(1).d64:       Targa image data
> 				- Map (257-257)
> 				257 x 257 x 1 +257 +257
> 				- 1-bit alpha "\001\001\001\00
> 				D64 Image
> DolphinDosErrorBytes.d64:       D64 Image
> DragonNinja+4DCS+ATC+Pearl.d64: D71 Image
> Kokotoni Wilf (ANTI-ROM).d64:   Targa image data
> 	      			- Map (257-257)
> 				257 x 257 x 1 +257 +257
> 				- 1-bit alpha "\001\001\001\001
> 				D64 Image
> M1571-D71.d71:                  Targa image data
> 				- Map (257-257)
> 				257 x 257 x 1 +257 +257
> 				- 1-bit alpha "\001\001\001\001
> 				D71 Image
> My-8250-D82.d82:                data
> The Great Gianna Sisters.d64:   Targa image data
>    	  	 		- Map (257-257)
> 				257 x 257 x 1 +257 +257
> 				- 1-bit alpha "\001\001\001\001
> 				D64 Image
> 				MMDF mailbox
> 				PDP-11 UNIX/RT ldp
> TheGreatGiannaSisters.d81:      D81 Image
> bmpsuite-216col1st-98h.tga:     Targa image data
> 				- Map (152-217)
> 				- RLE
> 				1024 x 768 x 8 +768 - top
> brucelee.d64:                   D64 Image
> 				VAX-order 68k Blit mpx/mux executable
> 				TTComp archive data
> 				, ASCII, 4K dictionary
> cbmcmd23.d80:                   data
> elektrix.d64:                   D64 Image
> file-speeddos.d64:              D64 Image
> input.tga:                      Targa image data
> 				- RGB
> 				70 x 46 x 24
> lastninja12.d81:                D81 Image
> ls209.d81:                      D81 Image
> maillog.expected.2:             MMDF mailbox
> 				PDP-11 UNIX/RT ldp
> sim&rct.d64:                data
> test-mbox.d80:                  data
> test-mmdf.d71:                  D71 Image
> test-pdp.bin:                   PDP-11 UNIX/RT ldp
> ucm8.tga:                       Targa image data
> 				- Map (256)
> 				128 x 128 x 8
> 				"Truevision(R) Sample Image"
> 				- author "Ricky True"
> 				- comment "Sample 8 bit uncompressed
> 				color mapped image" 24-2-1990 10:00:00
> 				- job "TGA Utilities"
> 				- TGAEdit 1.40
> uupc.input.1:                   Targa image data
> 				- Map (257-257)
> 				257 x 257 x 1 +257 +257
> 				- 1-bit alpha "\001\001"
> 				MMDF mailbox
> 				PDP-11 UNIX/RT ldp
> 
> Furthermore, for samples not described as Targa images, only the generic
> MIME type application/octet-stream is shown with the -i option. With the
> --extension option, only the 3-byte sequence ??? is shown for such samples.
> 
> For comparison I also ran the file format identification
> utility DROID (see https://sourceforge.net/projects/droid/). The
> TGA images are described as "Truevision TGA Bitmap", either with version
> 1.0 and no MIME type by PUID x-fmt/367, or with version 2.0 by PUID
> fmt/402. The sample maillog.expected.2 is described as "MIME Email" with
> version 1.0 and MIME type message/rfc822 by PUID fmt/950. Most of the
> Commodore disc images are described as "null bytes" by
> null_bytes.trid.xml, because the images often contain no content at the
> beginning (see the appended trid-v-d64.txt.gz).
> 
> The description as a mailbox comes from a line inside Magdir/mmdf like:
> 0	string	\001\001\001\001	MMDF mailbox
> 
> Luckily I found a page about MMDF (Multi-channel Memorandum Distribution
> Facility) on Wikipedia. The file format is also described in section
> five of the manual pages (see mmdf(5)). This manual page is shipped with
> some user manual packages, the mutt mail user agent and the tin news
> reader. This information is now expressed by additional comment lines
> inside Magdir/mmdf like:
> # URL:		https://en.wikipedia.org/wiki/MMDF
> # Reference:	https://docs.oracle.com/cd/E88353_01/html/E37852/
> #		mmdf-5.html
> 
> According to the documentation, the MMDF (Multi-channel Memorandum
> Distribution Facility) mailbox format is a "legacy" variant of the mbox
> format. Each message is surrounded by lines containing 4 Control-A
> characters. Unfortunately the check for these four control characters
> (magic strength=70) is not unique enough. It also matches a few D64
> images like "The Great Gianna Sisters.d64" (strength=70) that are
> initialized with ^A and handled by Magdir/c64. At the beginning, the
> first 2 Control-A characters can be read as hexadecimal 0101 or octal
> 0401. The latter is interpreted as "PDP-11 UNIX/RT ldp" by Magdir/pdp
> (magic strength=50).
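
A minimal sketch of the arithmetic behind this collision, assuming only that "leshort" means a little-endian 16-bit read. The variable names are illustrative and not taken from the patches.

```python
import struct

# Four Control-A bytes, as found at the start of an MMDF mailbox
# and of some ^A-initialized disc images.
data = b"\x01" * 4

# Little-endian 16-bit read of the first two bytes.
(value,) = struct.unpack_from("<H", data, 0)

# The same number in the three notations used in the text.
print(hex(value), oct(value), value)
```

The value 0x0101 is the same number as octal 0401, which is why the PDP-11 test fires on the same bytes as the MMDF test.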
> 
> I first tried an additional check for a following valid line terminator
> (10=0Ah~LineFeed, 13=0Dh~CarriageReturn) according to the documentation.
> The MMDF format was rarely used, so I found only a few examples. Some
> samples are from the oldmailconvert software found on GitHub. This test
> works for maillog.expected.2 but not for the sample uupc.input.1, which
> has more Control-A characters at the beginning. I do not know whether
> this is a really valid MMDF example or whether this byte sequence is an
> artefact of the transfer process. So I decided not to use this test and
> to use another, more relaxed method: looking for the MBOX mail
> characteristic afterwards. Apparently MMDF can be used as a file name
> suffix. But if I understand the documentation correctly, on systems with
> the MMDF format the default mailbox name is like
> /usr/spool/mail/username, which means no file name suffix there.
> According to Magdir/mail.news, message/rfc822 is used as the MIME type
> for embedded MBOX. Unfortunately I found no MIME type for MMDF, so I
> decided to choose a similar user-defined one. The magic lines now look
> like:
> 0	string	\001\001\001\001
> >5	search/610/b		From\ 	MMDF mailbox
> !:mime	message/x-mmdf
> !:ext	/mmdf
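
The two MMDF test lines above can be approximated in Python as follows. This is only a sketch under assumptions: the function name is hypothetical, and the window arithmetic is a rough stand-in for how `search/610` limits its range, not a reimplementation of libmagic.

```python
def looks_like_mmdf(data: bytes, search_range: int = 610) -> bool:
    """Rough equivalent of: 4 Control-A at offset 0, then 'From ' nearby."""
    # Level 0: the file must start with four Control-A bytes.
    if not data.startswith(b"\x01\x01\x01\x01"):
        return False
    # Level 1: look for the mbox "From " marker starting at offset 5,
    # within roughly the next search_range bytes (approximation).
    needle = b"From "
    window = data[5 : 5 + search_range + len(needle)]
    return needle in window


mailbox = b"\x01\x01\x01\x01\nFrom alice Thu Jan  1 00:00:00 1970\n"
disc_image = b"\x01" * 1024  # stand-in for a ^A-initialized disc image

print(looks_like_mmdf(mailbox), looks_like_mmdf(disc_image))
```

A buffer with extra leading Control-A bytes, as described for uupc.input.1, still passes because the marker is searched for rather than expected at a fixed offset.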
> 
> The description as PDP (with magic strength 50) comes from lines inside
> Magdir/pdp like:
> 
> 0	leshort		0401
> >68	ulelong		!0x00000058	PDP-11 UNIX/RT ldp
> The second test line, which I added in March 2013, already skips Windows
> precompiled setup information (*.PNF) by checking for the
> WinDirPathOffset value 58h. These tests also match a few D64 images like
> "The Great Gianna Sisters.d64" (strength=70) that are initialized with
> ^A and handled by Magdir/c64. They also always match the MMDF
> (Multi-channel Memorandum Distribution Facility) mailbox format handled
> by Magdir/mmdf, because each message is surrounded by lines containing
> 4 Control-A characters.
> Unfortunately I am too young to know about PDP machines and I found no
> file format specification on the net. So I tried to construct more
> test lines by applying logic like Mister Spock would do. Looking at
> other related PDP-11 samples, these seem to be binary (because they are
> executables) and not text files. So the MMDF mailbox (magic strength=70,
> handled by Magdir/mmdf) with its characteristic text fragments can be
> excluded. At offset 15 a version byte is stored for other PDP samples;
> this value often seems to be zero. At offset 8, 2 bytes of "debug"
> information (implied by the "not stripped" phrase) seem to be stored. So
> 8 Control-A characters at these locations are very unlikely for real PDP
> examples, whereas this byte sequence can occur in a few Commodore disc
> images initialized with ^A at the beginning, like "The Great Gianna
> Sisters.d64". These disc images are handled by Magdir/c64. So with
> additional tests the lines now become:
> 0	leshort		0401
> >68	ulelong		!0x00000058
> >>8	quad		!0x0101010101010101
> >>>5	search/610/b	From\
> >>>5	default		x		PDP-11 UNIX/RT ldp
> #!:mime	application/octet-stream
> #!:ext	foo
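
The chain of exclusions above can be sketched in Python like this. The function name, the minimum-length guard and the search window are assumptions made for illustration; libmagic's handling of short files and of the search range may differ.

```python
import struct


def looks_like_pdp_ldp(data: bytes) -> bool:
    """Rough equivalent of the refined PDP-11 UNIX/RT ldp test lines."""
    if len(data) < 72:  # assumption: need enough bytes for the read at offset 68
        return False
    (magic,) = struct.unpack_from("<H", data, 0)
    if magic != 0o401:  # leshort 0401
        return False
    (at_68,) = struct.unpack_from("<I", data, 68)
    if at_68 == 0x58:  # Windows *.PNF (WinDirPathOffset), skipped
        return False
    if data[8:16] == b"\x01" * 8:  # ^A-initialized disc image, skipped
        return False
    if b"From " in data[5 : 5 + 610 + 5]:  # mbox marker, so MMDF mailbox
        return False
    return True


pdp_like = b"\x01\x01" + bytes(98)  # synthetic buffer, not a real PDP sample
pnf_like = b"\x01\x01" + bytes(66) + struct.pack("<I", 0x58) + bytes(28)
disc_like = b"\x01" * 256
mmdf_like = b"\x01\x01\x01\x01\nFrom alice\n".ljust(100, b"\n")

for sample in (pdp_like, pnf_like, disc_like, mmdf_like):
    print(looks_like_pdp_ldp(sample))
```

Because all eight bytes compared at offset 8 are identical, the byte order of the `quad` comparison does not matter for this particular value.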
> I did not find a MIME type or file name suffix for such samples. So I
> would be pleased if a PDP veteran could supply this information or
> send me a real example so that I can cross-check my test lines.
> 
> The description as Targa image data (with magic strength=70=110-40)
> comes from Magdir/images. There is no simple unique pattern for
> such bitmaps. Therefore the display part is already done by the
> subroutine tga-image, and several different tests are done before
> calling this subroutine. Unfortunately one branch is at the moment
> true for a few Commodore disc images (strength=70=70+0, handled by
> Magdir/c64, like "Cabal+2-MarioSoft(1).d64", "Kokotoni Wilf
> (ANTI-ROM).d64" and "The Great Gianna Sisters.d64"). These few disc
> images are initialised with Control-A characters at the beginning
> instead of nil bytes. It is also true for a few MMDF mailboxes
> (strength=70, like uupc.input.1 handled by Magdir/mmdf) with some
> Control-A characters at the beginning. Control-A is interpreted as the
> byte value 1. Two such bytes are interpreted as hexadecimal 0101 (=257
> decimal or 0401 octal). So the misidentified non-graphic samples are
> described with dimension 257x257, origin point at coordinates (+257
> +257) and color depth 1 (that means black and white). For real TGA
> samples the dimension is often a known monitor size (SVGA 1024 x 768
> like in bmpsuite-216col1st-98h.tga) or a low, even, square size (128 x
> 128 for icons like in ucm8.tga). Few TGA samples are black and white.
> Most samples are really coloured, which means a "high" color depth (8,
> that means 256 colours, like in ucm8.tga, or 24, that means about 16 M
> colours, like in input.tga, and so on). In most TGA samples the origin
> is the zero point (+0 +0, which is not shown by the current display
> lines). Few samples have a non-zero origin. For such examples often only
> one of the coordinates (x or y value) equals the dimension size. So I
> skip the few Commodore D64 disc images like "The Great Gianna
> Sisters.d64" and the few MMDF mailboxes like uupc.input.1 with the
> unlikely dimension 0101h x 0101h (257x257) and +0101h origin (+257 +257)
> inside the black&white (color depth 1) branch. This branch now becomes:
> >>>>>16	ubyte			1
> >>>>>>8	quad			!0x0101010101010101
> >>>>>>>0	use		tga-image
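
To illustrate why a ^A-filled buffer prints as "257 x 257 x 1 +257 +257", here is a small sketch that decodes the geometry fields of the standard 18-byte TGA header (origin at offsets 8 and 10, size at 12 and 14, pixel depth at 16). The function names are hypothetical and the skip rule only mirrors the new test line for the depth-1 branch.

```python
import struct


def tga_geometry(header: bytes):
    """Return (width, height, depth, x_origin, y_origin) from a TGA header."""
    x_origin, y_origin, width, height, depth = struct.unpack_from(
        "<HHHHB", header, 8
    )
    return width, height, depth, x_origin, y_origin


def skipped_by_new_test(header: bytes) -> bool:
    """Depth 1 with eight 0x01 bytes at offset 8 is treated as not TGA."""
    return header[16] == 1 and header[8:16] == b"\x01" * 8


fake = b"\x01" * 18  # first bytes of a ^A-initialized disc image
print(tga_geometry(fake))         # (257, 257, 1, 257, 257)
print(skipped_by_new_test(fake))  # True
```

The eight bytes compared by the `quad` test are exactly the x origin, y origin, width and height fields, so a single comparison rejects the whole unlikely geometry.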
> 
> Luckily I found a page about D64 and its derivatives, with more links, on
> the Archive Team file formats web site. This information is now expressed
> by additional comment lines inside Magdir/c64 like:
> # URL:		http://fileformats.archiveteam.org/wiki/D64
> # Reference:	http://ist.uwaterloo.ca/~schepers/formats/D64.TXT
> # Reference:	http://ist.uwaterloo.ca/~schepers/formats/D71.TXT
> # Reference:	http://ist.uwaterloo.ca/~schepers/formats/D80-D82.TXT
> # Reference:	http://ist.uwaterloo.ca/~schepers/formats/D81.TXT
> There you find download links for samples like the ones inspected above.
> There I also found links to software like Deark, cbmconvert, DirMaster
> and Total Commander. With the help of these tools I could create my own
> samples. I could also read, convert and extract "unrecognized" samples
> with command lines like:
> 	deark -m d64 -l -d2 'sim&rct.d64'
> 	cbmconvert -v2 -P -d "The Great Gianna Sisters.d64"
> 
> At the moment the description comes from lines inside Magdir/c64 like:
> 0x16500	belong		0x12014100	D64 Image
> 0x16500	belong		0x12014180	D71 Image
> 0x61800 belong		0x28034400	D81 Image
> So what is wrong? For some variants there is no entry. For D80 images
> a similar entry would be a line like:
> 0x44e00 belong		0x26004300	D80 Image
> The first step is to determine the offset of the directory, which also
> contains some useful information like disk name and ID. Commodore drives,
> like most floppy drives, have relatively slow head movement. To speed up
> data access the directory is on the central track. This differs
> between the variants. For a D80 image there are 38 tracks with 29
> sectors of 256 bytes each before it, so the directory offset here is
> hexadecimal 44e00 (=282112=38*29*256). The first magic test line checks
> the first 4 bytes of the directory part for validity. It starts with the
> track value of the first BAM (Block Availability Map), followed by the
> sector of the BAM. This is followed by the DOS version byte and a
> "reserved" byte. The BAM is also located near the "middle", so in most
> cases we get the same values here. For D80 samples the BAM track/sector
> value is often 38/0 (2600 hexadecimal), but in a few samples (like
> test-mbox.d80) I got the value 39/1 (2701 hexadecimal). The DOS format
> version byte for D80 is 43h "C" and the version bytes are 3243h "2C". So
> unusual values for D80 are shown by additional lines like:
> 
> >>>&0		ubeshort	!0x2600		\b, first BAM
> >>>>&-2	ubyte		x		track %u
> >>>>&-1	ubyte		x		sector %u
> >>>&2		ubyte		!0x43		\b, version %#2.2x
> >>>&0x1B 	ubeshort	!0x3243		\b, type %#4.4x
> >>>>&-2	string		x		"%0.5s"
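
The directory offset arithmetic described above can be checked with a short sketch. The D80 figures (38 tracks of 29 sectors) come from the text; the D64 and D81 geometries (17 tracks of 21 sectors, 39 tracks of 40 sectors) are assumptions taken from the usual format descriptions, included only because they reproduce the offsets used in the existing magic lines.

```python
SECTOR_SIZE = 256  # bytes per sector on these Commodore formats


def directory_offset(tracks_before: int, sectors_per_track: int) -> int:
    """Byte offset of the directory track in a plain sector dump."""
    return tracks_before * sectors_per_track * SECTOR_SIZE


d80 = directory_offset(38, 29)  # from the text
d64 = directory_offset(17, 21)  # assumed geometry
d81 = directory_offset(39, 40)  # assumed geometry

print(hex(d80), hex(d64), hex(d81))  # 0x44e00 0x16500 0x61800
```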
> 
> So the first test lines must be changed. For D80 the test for
> track/sector must be more relaxed, so the first test line becomes:
> 0x44e00 belong&0xFEfeFFff 0x26004300
> For D64 the DOS format version is usually 41h "A", but in some samples
> (like sim&rct.d64) I got 42h here. According to the documentation, a
> value other than 41h or 0 is misused as "copy protection"; in other
> words, the disc becomes read-only. So for D64 images the first test line
> now becomes:
> 0x16500	belong&0xFFff00ff =0x12010000	D64 Image
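
The effect of the two masks can be seen with plain integer arithmetic. This is a minimal sketch; the helper name is hypothetical and the 32-bit values are the big-endian words quoted in the text.

```python
def masked_match(value: int, mask: int, expected: int) -> bool:
    """Mimic a magic test of the form 'belong&mask expected'."""
    return value & mask == expected


D80_MASK, D80_EXPECTED = 0xFEFEFFFF, 0x26004300
D64_MASK, D64_EXPECTED = 0xFFFF00FF, 0x12010000

# D80: BAM at 38/0 and at 39/1 both pass, because bit 0 of the
# track and sector bytes is ignored.
print(masked_match(0x26004300, D80_MASK, D80_EXPECTED))  # True
print(masked_match(0x27014300, D80_MASK, D80_EXPECTED))  # True

# D64: the DOS version byte (third byte) is ignored, so 41h and 42h pass.
print(masked_match(0x12014100, D64_MASK, D64_EXPECTED))  # True
print(masked_match(0x12014200, D64_MASK, D64_EXPECTED))  # True

# The old D71 value has 80h in the last byte, so this test rejects it.
print(masked_match(0x12014180, D64_MASK, D64_EXPECTED))  # False
```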
> 
> But then fewer than 32 bits are used for detection, which can lead to
> more collisions with other file formats. In the directory part there are
> some short areas described as unused or filled with 0xA0 according to the
> unofficial documentation. This seems to be true in most cases, but in
> some cases I found nil bytes instead. So unusual cases (not filled with
> 0xA0) are shown by additional lines like:
> 
> >>>&0x04	ubeshort	!0xa0a0		\b, at +0x4 %#x
> >>>&0x17	ubyte		!0xA0		\b, at +0x17 %#2.2x
> >>>&0x1a	ubyte		!0xA0		\b, at +0x1A %#2.2x
> >>>&0x1D	ubelong		!0xA0A0A0A0h	\b, at +0x1D %#8.8x
> 
> So I chose the 4-byte area (at relative offset +0x1D from the directory
> start and filled with 0xa0 or 0x00) as an additional second test. This
> starts with lines like:
> >&0x19		ubelong&0x5f5F5F5F	=0	Commodore
> !:mime	application/x-commodore-floppy-image
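
A short sketch of why the mask 0x5F5F5F5F accepts both fillers: 0xA0 is binary 1010 0000 and 0x5F is 0101 1111, so they share no set bits, and 0x00 trivially passes as well. The function name is illustrative only.

```python
MASK = 0x5F5F5F5F


def filler_ok(four_bytes: bytes) -> bool:
    """Mimic 'ubelong&0x5f5F5F5F =0' on a 4-byte area."""
    return int.from_bytes(four_bytes, "big") & MASK == 0


print(filler_ok(b"\xa0\xa0\xa0\xa0"))  # True, documented filler
print(filler_ok(b"\x00\x00\x00\x00"))  # True, nil bytes
print(filler_ok(b"\xa0\xa0\xa0\x00"))  # True, mixed filler
print(filler_ok(b"ABCD"))              # False, ordinary text
```

The mask is slightly more permissive than "0xA0 or 0x00": any byte built only from bits 0x80 and 0x20 passes, which includes 0x20 (space) and 0x80.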
> Instead of the generic application/octet-stream I show a user-defined one.
> Applying the currently used terminology, I would call such samples
> "D80 Image". But if you did not grow up in the era of playing games on
> the Commodore C64 computer, like me, then this description sounds like
> Egyptian hieroglyphs to normal users. The phrase D80 means that this
> 3-byte string is used as the file name suffix. This becomes clearer when
> putting this information inside the !:ext magic, so that it is
> shown with the --extension option. According to the documentation, "D80
> images" are "Commodore 8050 floppy disc images" and "D82 images" are
> "Commodore 8250 floppy disc images". The first floppy variant is
> single-sided with a file size of 533248 bytes and the second is
> double-sided with the doubled file size of 1066496. So I show the
> best-suited description and the correct associated suffix by the next
> lines, which look like:
> >>-0	offset		=533248		8050 floppy disc image
> !:ext	d80
> >>-0	offset		=1066496	8250 floppy disc image
> !:ext	d82
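
The size-based distinction can be sketched as a simple lookup. The table and function names are hypothetical; the sizes and descriptions are the ones given in the text, and the magic lines obtain the size by evaluating the `offset` type at position -0, that is, at the end of the file.

```python
# File size in bytes -> (description, file name suffix), as in the text.
VARIANTS = {
    533248: ("8050 floppy disc image", "d80"),
    1066496: ("8250 floppy disc image", "d82"),
}


def describe_by_size(size: int):
    """Return (description, suffix) for a known image size, else None."""
    return VARIANTS.get(size)


print(describe_by_size(533248))
print(describe_by_size(1066496))
print(describe_by_size(174848))  # not an 8050/8250 size
```

For a file on disk, `os.path.getsize(path)` would supply the size argument.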
> 
> Afterwards I jump back to the beginning, to the track/sector bytes, so
> that relative offsets can be used as in the documentation and maybe the
> whole part for all Commodore variants can be unified. This is done by
> the next line, which looks like:
> >>&-0x25	belong		x
> Now comes the more informative part, especially for normal users and for
> many floppy images like collections. At relative offsets 6 to 16h a disk
> name padded with A0h (240 octal) is stored. When I use the string
> directive I get not 100% correct and also ugly-looking output like
> 'DIRCBM DISK\240\'. I also tried a regular expression excluding A0h, but
> that does not work and I get error messages like:
> c64, 167: Warning: non-ascii characters in regex \0240 `[^\240]{1,16}'
> Maybe another person has more brain than me or more experience with
> regex and can change my lines. Looking at all my Commodore
> images, the name is often alphanumeric with spaces. But I also found
> names with plus, minus and equal signs. A few samples use parentheses
> for a copyright phrase like (c). So at the moment the disk name (like
> "DIRCBM DISK" or "CBMCOMMAND" for D80) is shown by lines like:
> >>>&06		regex		=[A-Z0-9.+-=!()*#\ ]{1,16} 	"%s"
> #>>&06		regex		=[^\240]{1,16} 			\b, "%s"
> #>>>&06	string		x		\b, DISK_NAME '%0.16s'
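
Outside of magic files, stripping the A0h padding is straightforward. The following is only a sketch for comparison: the function name is hypothetical, the 16-byte field at relative offset 6 follows the description above, and decoding as ASCII is a simplification because the names are really PETSCII.

```python
def disk_name(directory: bytes) -> str:
    """Extract the disk name from a D80-style directory header block."""
    field = directory[6 : 6 + 16]   # relative offsets 6 up to 16h
    trimmed = field.rstrip(b"\xa0")  # drop the A0h padding
    return trimmed.decode("ascii", errors="replace")


# Synthetic header: BAM track/sector, version, reserved, two filler bytes,
# then the padded name.
header = bytes([0x26, 0x00, 0x43, 0x00, 0xA0, 0xA0]) + b"CBMCOMMAND".ljust(
    16, b"\xa0"
)
print(disk_name(header))  # CBMCOMMAND
```

One detail of the regex line above, as far as I can tell: inside a bracket expression, `+-=` is read as a range from `+` to `=`, so the class also admits characters such as `,`, `/`, `:`, `;` and `<`.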
> 
> After applying the above-mentioned modifications with the four patches
> file-5.45-mmdf-d64.diff, file-5.45-images-tga.diff,
> file-5.45-pdp-d64.diff and file-5.45-c64-d64.diff, all of my inspected
> samples are now recognized. Most of the misidentifications have vanished
> and some more details are shown. With the -k option this now looks like:
> 
> 
> Cabal+2-MarioSoft(1).d64:       Commodore 1541 floppy disc image
> 				with errors bytes
> 				" CABAL 100%", ID 0x4d41
> 				, type 0x494f, at +0xA4 0x52
> DolphinDosErrorBytes.d64:       Commodore 1541 floppy disc image
> 				(40 tracks) with errors bytes
> 				"MyDolphinDOS+er", ID 0x4444
> DragonNinja+4DCS+ATC+Pearl.d64: Commodore 1541 floppy disc image
> 				"SPELLETJES", ID 0x363
> 				, at +0xA0 0x2ca0
> Kokotoni Wilf (ANTI-ROM).d64:   Commodore 1541 floppy disc image
> 	      			"2A", ID 0xa0a0
> M1571-D71.d71:                  Commodore 1571 floppy disc image
> 				"MY-DIRMASTER-MY", ID 0xa0a0
> My-8250-D82.d82:                Commodore 8250 floppy disc image
> 				"MY D82 NUMBER 1!", ID 0xa0a
> 				, at +0x1D 00000000
> The Great Gianna Sisters.d64:   Commodore 1541 floppy disc image
>    	  	 		"GIANA-GAME", ID 0x4747
> TheGreatGiannaSisters.d81:      Commodore 1581 floppy disc image
> 				"CBMCONVERT   2.0", ID 0x3938
> 				, at +0x1D 0xa0a0000000000000
> bmpsuite-216col1st-98h.tga:     Targa image data
> 				- Map (152-217) - RLE
> 				1024 x 768 x 8 +768 - top
> brucelee.d64:                   Commodore 1541 floppy disc image
> 				(175104 bytes)
> 				"BRUCE LEE", ID 0x2331
> 				VAX-order 68k Blit mpx/mux executable
> 				TTComp archive data
> 				, ASCII, 4K dictionary
> cbmcmd23.d80:                   Commodore 8050 floppy disc image
> 				"CBMCOMMAND", ID 0x4944
> 				, at +0x4 0
> elektrix.d64:                   Commodore 1541 floppy disc image
> 				"ELEKTRIX", ID 0x3030
> 				, at +0xA7 0xa0000000
> file-speeddos.d64:              Commodore 1541 floppy disc image
> 				(40 tracks)
> 				"My_SpeedDOS-456", ID 0x5350
> input.tga:                      Targa image data
> 				- RGB
> 				70 x 46 x 24
> lastninja12.d81:                Commodore 1581 floppy disc image
> 				"* FINNISH GOLD *", ID 0x323
> 				, at +0x18 0x31, type 0x3120
> ls209.d81:                      Commodore 1581 floppy disc image
> 				"LOADSTAR 209", ID 0x2020
> maillog.expected.2:             MMDF mailbox
> sim&rct.d64:                Commodore 1541 floppy disc image
> 				"SIM&RCT.D64", ID 0x3230
> 				, type 0x3120, version 0x42
> 				, at +0xA4 0x31
> test-mbox.d80:                  Commodore 8050 floppy disc image
> 				"TOTALCOMMANDER"
> 				, ID 0x51a0, first BAM track 39 sector 1
> 				, type 0x43a0 "C\240"
> 				, at +0x4 0, at +0x17 0x4a
> 				, at +0x1A 0x32
> 				, at +0x1D 0xa0a0a000
> test-mmdf.d71:                  Commodore 1571 floppy disc image
> 				with errors bytes
> 				"DIRCBM DISK-my", ID 0x3030
> test-pdp.bin:                   PDP-11 UNIX/RT ldp
> ucm8.tga:                       Targa image data
> 				- Map (256)
> 				128 x 128 x 8
> 				"Truevision(R) Sample Image"
> 				- author "Ricky True"
> 				- comment "Sample 8 bit uncompressed
> 				color mapped image" 24-2-1990 10:00:00
> 				- job "TGA Utilities"
> 				- TGAEdit 1.40
> uupc.input.1:                   MMDF mailbox
> 
> I hope my diff files can be applied in a future version of the file
> utility.
> 
> With best wishes,
> Jörg Jenderek
> Attachments: message part, file-5.45-pdp-d64.diff,
> file-5.45-mmdf-d64.diff, file-5.45-images-tga.diff,
> file-5.45-c64-d64.diff (each diff with a signature file),
> trid-v-d64.txt.gz
