[File] [PATCH] Magdir/c64, mmdf, images, pdp collision: D64 Image, MMDF mailbox, Targa image, PDP-11 UNIX/RT ldp
Christos Zoulas
christos at zoulas.com
Thu Feb 29 03:41:03 UTC 2024
Committed, thanks. Wow, nice analysis!
christos
> On Feb 18, 2024, at 8:26 PM, Jörg Jenderek (GMX) <joerg.jen.der.ek at gmx.net> wrote:
>
> Hello,
>
> Some days ago I read an article about emulating old Commodore
> computers. In that context, disc images with the D64 file name
> extension are mentioned. When running the file command version 5.45
> with the -k option on such examples and "related" files, I get output like:
>
> Cabal+2-MarioSoft(1).d64: Targa image data
> - Map (257-257)
> 257 x 257 x 1 +257 +257
> - 1-bit alpha "\001\001\001\00
> D64 Image
> DolphinDosErrorBytes.d64: D64 Image
> DragonNinja+4DCS+ATC+Pearl.d64: D71 Image
> Kokotoni Wilf (ANTI-ROM).d64: Targa image data
> - Map (257-257)
> 257 x 257 x 1 +257 +257
> - 1-bit alpha "\001\001\001\001
> D64 Image
> M1571-D71.d71: Targa image data
> - Map (257-257)
> 257 x 257 x 1 +257 +257
> - 1-bit alpha "\001\001\001\001
> D71 Image
> My-8250-D82.d82: data
> The Great Gianna Sisters.d64: Targa image data
> - Map (257-257)
> 257 x 257 x 1 +257 +257
> - 1-bit alpha "\001\001\001\001
> D64 Image
> MMDF mailbox
> PDP-11 UNIX/RT ldp
> TheGreatGiannaSisters.d81: D81 Image
> bmpsuite-216col1st-98h.tga: Targa image data
> - Map (152-217)
> - RLE
> 1024 x 768 x 8 +768 - top
> brucelee.d64: D64 Image
> VAX-order 68k Blit mpx/mux executable
> TTComp archive data
> , ASCII, 4K dictionary
> cbmcmd23.d80: data
> elektrix.d64: D64 Image
> file-speeddos.d64: D64 Image
> input.tga: Targa image data
> - RGB
> 70 x 46 x 24
> lastninja12.d81: D81 Image
> ls209.d81: D81 Image
> maillog.expected.2: MMDF mailbox
> PDP-11 UNIX/RT ldp
> sim&rct.d64: data
> test-mbox.d80: data
> test-mmdf.d71: D71 Image
> test-pdp.bin: PDP-11 UNIX/RT ldp
> ucm8.tga: Targa image data
> - Map (256)
> 128 x 128 x 8
> "Truevision(R) Sample Image"
> - author "Ricky True"
> - comment "Sample 8 bit uncompressed
> color mapped image" 24-2-1990 10:00:00
> - job "TGA Utilities"
> - TGAEdit 1.40
> uupc.input.1: Targa image data
> - Map (257-257)
> 257 x 257 x 1 +257 +257
> - 1-bit alpha "\001\001"
> MMDF mailbox
> PDP-11 UNIX/RT ldp
>
> Furthermore, for samples not described as Targa images, only the
> generic MIME type application/octet-stream is shown with the -i option.
> With the --extension option, only the three-character sequence ??? is
> shown for such samples.
>
> For comparison, I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). It describes the
> TGA graphic images as "Truevision TGA Bitmap", either version 1.0
> without a MIME type (PUID x-fmt/367) or version 2.0 (PUID fmt/402). The
> sample maillog.expected.2 is described as "MIME Email" version 1.0 with
> MIME type message/rfc822 (PUID fmt/950). Most of the Commodore disc
> images are described as "null bytes" by null_bytes.trid.xml, because
> the images often contain no content at the beginning (see the attached
> trid-v-d64.txt.gz).
>
> The mailbox description comes from Magdir/mmdf, via a line like:
> 0 string \001\001\001\001 MMDF mailbox
>
> Luckily I found a page about MMDF (Multi-channel Memorandum Distribution
> Facility) on Wikipedia. The file format is also described in section
> five of the manual pages {see mmdf(5)}. This manual page ships with some
> manual-page packages and with the mutt mail user agent or the tin news
> reader. This information is now expressed by additional comment lines
> inside Magdir/mmdf like:
> # URL: https://en.wikipedia.org/wiki/MMDF
> # Reference: https://docs.oracle.com/cd/E88353_01/html/E37852/mmdf-5.html
>
> According to the documentation, the MMDF (Multi-channel Memorandum
> Distribution Facility) mailbox format is a "legacy" variant of the mbox
> format. Each message is surrounded by lines containing four Control-A
> characters. Unfortunately the check for these four control characters
> (magic strength=70) is not unique enough. It also matches a few D64
> images like "The Great Gianna Sisters.d64" (strength=70) that are
> initialized with ^A and handled by Magdir/c64. The two leading Control-A
> characters can be read as hexadecimal 0101, which is octal 0401. The
> latter is interpreted as "PDP-11 UNIX/RT ldp" by Magdir/pdp (magic
> strength=50).
>
> I first tried to additionally check for a following valid line
> terminator (10=0Ah, line feed; 13=0Dh, carriage return), as described
> in the documentation. The MMDF format was rarely used, so I found only a
> few examples. Some samples come from the oldmailconvert software found
> on GitHub. This test works for maillog.expected.2 but not for the sample
> uupc.input.1, which has more Control-A characters at the beginning. I do
> not know whether this is a real, valid MMDF example or whether this byte
> sequence is an artefact of the transfer process. So I decided not to use
> this test and instead use a more relaxed method: looking for
> characteristic MBOX mail text afterwards. Apparently MMDF can be used as
> a file name suffix. But if I understand the documentation correctly, on
> systems using the MMDF format the default mailbox name is like
> /usr/spool/mail/username, that is, without a file name suffix. According
> to Magdir/mail.news, message/rfc822 is used as the MIME type for
> embedded MBOX. Unfortunately I found no MIME type for MMDF, so I chose a
> similar user-defined one. The magic lines now look like:
> 0 string \001\001\001\001
> >5 search/610/b From\ MMDF mailbox
> !:mime message/x-mmdf
> !:ext /mmdf
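To make the difference between the old and the relaxed MMDF rule concrete, here is a minimal Python sketch. It is not libmagic code: the helper names are hypothetical, and it assumes that `search/610` lets the pattern start at any of the 610 positions beginning at offset 5.

```python
# Hypothetical helpers modelling the Magdir/mmdf rules; not libmagic code.
MMDF_DELIM = b"\x01\x01\x01\x01"  # four Control-A characters
SEARCH_OFFSET = 5
SEARCH_RANGE = 610


def old_mmdf_rule(data: bytes) -> bool:
    """Original rule: 0 string \\001\\001\\001\\001"""
    return data.startswith(MMDF_DELIM)


def new_mmdf_rule(data: bytes) -> bool:
    """Patched rule: same prefix plus '>5 search/610/b From\\ '."""
    if not data.startswith(MMDF_DELIM):
        return False
    needle = b"From "
    # Assumption: a match may start at any of SEARCH_RANGE positions from offset 5.
    end = SEARCH_OFFSET + SEARCH_RANGE - 1 + len(needle)
    return data.find(needle, SEARCH_OFFSET, end) != -1


mailbox = b"\x01\x01\x01\x01\nFrom alice Thu Feb 29 03:41:03 2024\nSubject: hi\n\n"
disc_image = b"\x01" * 1024  # made-up D64-like data initialized with ^A

print(old_mmdf_rule(mailbox), new_mmdf_rule(mailbox))        # True True
print(old_mmdf_rule(disc_image), new_mmdf_rule(disc_image))  # True False
```

The extra "From " test is what lets an ^A-initialized disc image fall through to Magdir/c64 while a mailbox with text after the delimiter is still recognized.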
>
> The PDP description (with magic strength 50) comes from Magdir/pdp,
> via lines like:
>
> 0 leshort 0401
> >68 ulelong !0x00000058 PDP-11 UNIX/RT ldp
> The second test line, which I added in March 2013, already skips
> Windows precompiled setup information files (*.PNF) by checking for a
> WinDirPathOffset value of 58h. These tests match a few D64 images like
> "The Great Gianna Sisters.d64" (strength=70) that are initialized with
> ^A and handled by Magdir/c64. They also always match the MMDF
> (Multi-channel Memorandum Distribution Facility) mailbox format handled
> by Magdir/mmdf, because each message is surrounded by lines containing
> four Control-A characters. Unfortunately I am too young to know PDP
> machines, and I found no file format specification on the net. So I
> tried to construct more test lines by applying logic as Mister Spock
> would. Other related PDP-11 samples seem to be binary (being
> executables) rather than text files, so an MMDF mailbox (magic
> strength=70, handled by Magdir/mmdf) can be excluded by its
> characteristic text fragments. In other PDP samples a version byte is
> stored at offset 15, and its value seems to be often zero. At offset 8,
> 2 bytes of "debug" information (implied by the "not stripped" phrase)
> seem to be stored. So eight Control-A characters at these locations are
> very unlikely in real PDP examples, whereas this byte sequence can occur
> in a few Commodore disc images initialized with ^A at the beginning,
> like "The Great Gianna Sisters.d64", which are handled by Magdir/c64.
> With these additional tests the lines now look like:
> 0 leshort 0401
> >68 ulelong !0x00000058
> >>8 quad !0x0101010101010101
> >>>5 search/610/b From\
> >>>5 default x PDP-11 UNIX/RT ldp
> #!:mime application/octet-stream
> #!:ext foo
> I did not find a MIME type or file name suffix for such samples. So I
> would be pleased if a PDP veteran could supply this information, or send
> me a real example so I can cross-check my test lines.
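The patched PDP rule chain can be modelled in a few lines of Python. This is a hedged sketch, not libmagic: the function name is hypothetical, the 72-byte minimum reflects my assumption that a test at an offset beyond the end of the file simply fails, and the search window uses the same 610-start-position interpretation as for MMDF.

```python
import struct


# Hypothetical helper modelling the patched Magdir/pdp rule; not libmagic code.
def pdp_ldp_rule(data: bytes) -> bool:
    if len(data) < 72:
        return False  # assumption: the offset-68 test cannot be read, so no match
    # 0 leshort 0401  (octal 0401 == 0x0101 == 257, i.e. the bytes 01 01)
    if struct.unpack_from("<H", data, 0)[0] != 0o401:
        return False
    # >68 ulelong !0x00000058  (skip Windows *.PNF files)
    if struct.unpack_from("<I", data, 68)[0] == 0x58:
        return False
    # >>8 quad !0x0101010101010101  (eight ^A bytes: D64 image or mailbox)
    if data[8:16] == b"\x01" * 8:
        return False
    # >>>5 search/610/b From\  (mailbox text: leave it to Magdir/mmdf)
    needle = b"From "
    if data.find(needle, 5, 5 + 610 - 1 + len(needle)) != -1:
        return False
    # >>>5 default x  PDP-11 UNIX/RT ldp
    return True


pdp_like = b"\x01\x01" + bytes(100)  # made-up: magic 0401 followed by nil bytes
d64_like = b"\x01" * 256             # made-up: disc image initialized with ^A
print(pdp_ldp_rule(pdp_like))  # True
print(pdp_ldp_rule(d64_like))  # False
```

Because the quad at offset 8 compares eight identical bytes, its byte order does not matter, which is why a plain slice comparison is enough here.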
>
> The Targa image description (with magic strength=70=110-40) comes from
> Magdir/images. There is no simple unique pattern for such graphic
> bitmaps, so the display part is already done by the subroutine
> tga-image, and several different tests are done before calling it.
> Unfortunately, "one" branch is currently true for a few Commodore disc
> images (strength=70=70+0, handled by Magdir/c64) like
> "Cabal+2-MarioSoft(1).d64", "Kokotoni Wilf (ANTI-ROM).d64" and "The
> Great Gianna Sisters.d64". These few disc images are initialized with
> Control-A characters at the beginning instead of nil bytes. It is also
> true for a few MMDF mailboxes (strength=70, like uupc.input.1, handled
> by Magdir/mmdf) with some Control-A characters at the beginning. A
> Control-A character is read as byte value 1, and two such bytes are
> read as hexadecimal 0101 (=257 decimal or 0401 octal). So the
> misidentified non-graphic samples are described with dimensions
> 257x257, origin at coordinates (+257 +257) and color depth 1 (that is,
> black and white). For real TGA samples the dimensions are often a known
> monitor size (SVGA 1024 x 768, as in bmpsuite-216col1st-98h.tga) or a
> small even square size (128 x 128 for icons, as in ucm8.tga). Few TGA
> samples are black and white; most are really colored, which means a
> "high" color depth (8, that is 256 colors, as in ucm8.tga, or 24, about
> 16 million colors, as in input.tga, and so on). In most TGA samples the
> origin is the zero point (+0 +0, which the current display lines do not
> show). A few samples have a non-zero origin; for such examples often
> only one of the coordinates equals a dimension (the x or y value). So
> in the black&white (color depth 1) branch I skip a few Commodore D64
> disc images like "The Great Gianna Sisters.d64" and a few MMDF
> mailboxes like uupc.input.1, which have the unlikely dimensions 0101h x
> 0101h (257x257) and origin +0101h (+257 +257). This branch now becomes:
> >>>>>16 ubyte 1
> >>>>>>8 quad !0x0101010101010101
> >>>>>>>0 use tga-image
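The 257 x 257 x 1 +257 +257 output follows directly from the fixed 18-byte TGA header layout, in which all multi-byte fields are little-endian. The Python sketch below decodes a header made entirely of ^A bytes and applies only the patched black&white branch; the outer Magdir/images tests are not modelled, the helper names are hypothetical, and the 640x480 header is invented test data.

```python
import struct

# Standard 18-byte TGA header: id length, color map type, image type,
# color map first index, color map length, entry size, x/y origin,
# width, height, pixel depth, image descriptor (little-endian).
TGA_HEADER = struct.Struct("<BBBHHBHHHHBB")
FIELDS = ("id_length", "cmap_type", "image_type", "cmap_first", "cmap_length",
          "cmap_entry_size", "x_origin", "y_origin", "width", "height",
          "pixel_depth", "descriptor")


def tga_fields(data: bytes) -> dict:
    return dict(zip(FIELDS, TGA_HEADER.unpack_from(data, 0)))


def bw_branch_accepts(data: bytes) -> bool:
    # >>>>>16 ubyte 1  and  >>>>>>8 quad !0x0101010101010101
    return data[16] == 1 and data[8:16] != b"\x01" * 8


ctrl_a = b"\x01" * 18
f = tga_fields(ctrl_a)
print(f["width"], f["height"], f["pixel_depth"], f["x_origin"], f["y_origin"])
# 257 257 1 257 257
print(bw_branch_accepts(ctrl_a))  # False

# Made-up header: image type 3 (uncompressed black-and-white), 640x480, depth 1.
real_bw = TGA_HEADER.pack(0, 0, 3, 0, 0, 0, 0, 0, 640, 480, 1, 0)
print(bw_branch_accepts(real_bw))  # True
```

The same decoding also explains "Map (257-257)": the color map first index and length fields at offsets 3 and 5 are 0101h as well.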
>
> Luckily I found a page about D64 and its derivatives, with more links
> to file formats, on the Archive Team file formats web site. This
> information is now expressed by additional comment lines inside
> Magdir/c64 like:
> # URL: http://fileformats.archiveteam.org/wiki/D64
> # Reference: http://ist.uwaterloo.ca/~schepers/formats/D64.TXT
> # Reference: http://ist.uwaterloo.ca/~schepers/formats/D71.TXT
> # Reference: http://ist.uwaterloo.ca/~schepers/formats/D80-D82.TXT
> # Reference: http://ist.uwaterloo.ca/~schepers/formats/D81.TXT
> There you find download links for samples like the ones inspected above,
> as well as links to software like Deark, cbmconvert, DirMaster and Total
> Commander. With the help of these tools I could create my own samples. I
> could also read, convert and extract "unrecognized" samples with command
> lines like:
> deark -m d64 -l -d2 'sim&rct.d64'
> cbmconvert -v2 -P -d "The Great Gianna Sisters.d64"
>
> The description currently comes from lines inside Magdir/c64 like:
> 0x16500 belong 0x12014100 D64 Image
> 0x16500 belong 0x12014180 D71 Image
> 0x61800 belong 0x28034400 D81 Image
> So what is wrong? Some variants have no entry. For D80 images a similar
> entry would be a line like:
> 0x44e00 belong 0x26004300 D80 Image
> The first step is to determine the offset of the directory, which also
> contains some useful information like the disk name and ID. Commodore
> drives, like most floppy drives, have relatively slow head movement. To
> speed up data access, the directory is placed on a central track, which
> differs between the variants. For a D80 image the directory is preceded
> by 38 tracks of 29 sectors of 256 bytes each, so the directory offset
> is hexadecimal 44e00 (=282112=38*29*256). The first magic test line
> checks the first 4 bytes of the directory part for validity. They start
> with the track of the first BAM (Block Availability Map), followed by
> the sector of the BAM, the DOS version byte and a "reserved" byte. The
> BAM is also located near the "middle", so in most cases we get the same
> values here. For D80 samples the BAM track/sector is often 38/0 (2600
> hexadecimal), but in a few samples (like test-mbox.d80) I got 39/1
> (2701 hexadecimal). The DOS format version byte for D80 is 43h "C", and
> the version bytes are 3243h "2C". So I show unusual D80 values with
> additional lines like:
>
> >>>&0 ubeshort !0x2600 \b, first BAM
> >>>>&-2 ubyte x track %u
> >>>>&-1 ubyte x sector %u
> >>>&2 ubyte !0x43 \b, version %#2.2x
> >>>&0x1B ubeshort !0x3243 \b, type %#4.4x
> >>>>&-2 string x "%0.5s"
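The directory offsets used in Magdir/c64 are plain sector arithmetic. The Python sketch below checks them, assuming 256-byte sectors and, as I understand the referenced D64/D81 documents, 21 sectors on tracks 1-17 of a D64 and 40 sectors per track on a D81; the D80 figures are the ones given above. It also splits the first 4 directory bytes into the fields the first test line checks. The helper names and the minimal image are hypothetical.

```python
SECTOR_SIZE = 256


def directory_offset(tracks_before: int, sectors_per_track: int) -> int:
    """Byte offset of the directory when every preceding track has the
    same number of sectors (true for the three cases below)."""
    return tracks_before * sectors_per_track * SECTOR_SIZE


# D64: tracks 1-17 have 21 sectors each, directory on track 18.
assert directory_offset(17, 21) == 0x16500
# D80: 38 tracks of 29 sectors before the directory.
assert directory_offset(38, 29) == 0x44E00 == 282112
# D81: 39 tracks of 40 sectors, directory header on track 40.
assert directory_offset(39, 40) == 0x61800


def directory_prefix(image: bytes, offset: int) -> tuple:
    """First 4 directory bytes: BAM track, BAM sector, DOS version, reserved."""
    bam_track, bam_sector, dos_version, reserved = image[offset:offset + 4]
    return bam_track, bam_sector, dos_version, reserved


fake_d80 = bytes(0x44E00) + bytes([38, 0, 0x43, 0])  # made-up minimal image
print(directory_prefix(fake_d80, 0x44E00))  # (38, 0, 67, 0)
```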
>
> So the first test lines must be changed: for D80 the track/sector test
> must be more relaxed, and the first test line becomes:
> 0x44e00 belong&0xFEfeFFff 0x26004300
> For D64 the DOS format version is usually 41h "A", but in some samples
> (like sim&rct.d64) I got 42h here. According to the documentation, a
> value other than 41h or 0 is misused as "copy protection"; in other
> words, the disc becomes read-only. So for D64 images the first test
> line now becomes:
> 0x16500 belong&0xFFff00ff =0x12010000 D64 Image
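A `belong&MASK VALUE` test reads a big-endian 32-bit value, ANDs it with the mask and compares the result. This small Python model (hypothetical helper name) shows which directory prefixes the two relaxed lines accept.

```python
def masked_match(prefix: bytes, mask: int, value: int) -> bool:
    """Model of 'belong&MASK VALUE': big-endian 32-bit read, AND, compare."""
    return (int.from_bytes(prefix[:4], "big") & mask) == value


D80_MASK, D80_VALUE = 0xFEFEFFFF, 0x26004300
D64_MASK, D64_VALUE = 0xFFFF00FF, 0x12010000

# D80: BAM at 38/0 or 39/1 both pass (the mask also lets 38/1 and 39/0 through).
assert masked_match(bytes([38, 0, 0x43, 0]), D80_MASK, D80_VALUE)
assert masked_match(bytes([39, 1, 0x43, 0]), D80_MASK, D80_VALUE)
assert not masked_match(bytes([40, 0, 0x43, 0]), D80_MASK, D80_VALUE)

# D64: any DOS version byte passes, e.g. 41h "A" or 42h as in sim&rct.d64 ...
assert masked_match(bytes([18, 1, 0x41, 0]), D64_MASK, D64_VALUE)
assert masked_match(bytes([18, 1, 0x42, 0]), D64_MASK, D64_VALUE)
# ... but the D71 pattern 0x12014180 does not, because its last byte is 80h.
assert not masked_match(bytes([18, 1, 0x41, 0x80]), D64_MASK, D64_VALUE)
```

Keeping the last byte in the D64 mask is what still separates D64 from D71 images after the version byte is masked out.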
>
> But then fewer than 32 bits are used for detection, which can lead to
> more collisions with other file formats. According to the unofficial
> documentation, some short areas in the directory part are unused or
> filled with 0xA0. This seems to be true in most cases, but in some cases
> I found nil bytes instead. So I show the unusual (not 0xA0 filled) cases
> with additional lines like:
>
> >>>&0x04 ubeshort !0xa0a0 \b, at +0x4 %#x
> >>>&0x17 ubyte !0xA0 \b, at +0x17 %#2.2x
> >>>&0x1a ubyte !0xA0 \b, at +0x1A %#2.2x
> >>>&0x1D ubelong !0xA0A0A0A0h \b, at +0x1D %#8.8x
>
> So I chose the 4-byte area at relative offset +0x1D from the directory
> start (filled with 0xA0 and 0x00) as an additional second test. The
> entry now starts with lines like:
> >&0x19 ubelong&0x5f5F5F5F =0 Commodore
> !:mime application/x-commodore-floppy-image
> Instead of the generic application/octet-stream, I show a user-defined MIME type.
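The mask 0x5f5F5F5F clears bits 0x80 and 0x20 in each byte, so the comparison with 0 accepts exactly the bytes made only of those two bits. A short Python sketch (hypothetical helper name) shows that this admits 0x00 and 0xA0, as intended, and also 0x20 and 0x80.

```python
def filler_ok(four: bytes) -> bool:
    """Model of '>&0x19 ubelong&0x5f5F5F5F =0' on the 4 bytes at directory +0x1D."""
    return (int.from_bytes(four, "big") & 0x5F5F5F5F) == 0


# Which single byte values survive the 0x5F mask?
accepted = [b for b in range(256) if (b & 0x5F) == 0]
print([hex(b) for b in accepted])  # ['0x0', '0x20', '0x80', '0xa0']

print(filler_ok(b"\xa0\xa0\x00\x00"))  # True
print(filler_ok(b"\x41\xa0\xa0\xa0"))  # False
```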
> In the current terminology I would call such samples "D80 Image". But
> unless you grew up playing games on a Commodore C64 computer like me,
> this description sounds like Egyptian hieroglyphs to normal users. The
> phrase D80 means that this 3-character string is used as the file name
> suffix. This becomes clearer when putting this information into the
> !:ext magic, so that it is shown with the --extension option. According
> to the documentation, "D80 images" are "Commodore 8050 floppy disc
> images" and "D82 images" are "Commodore 8250 floppy disc images". The
> first floppy variant is single-sided with a file size of 533248 bytes,
> and the second is double-sided with double the file size, 1066496
> bytes. So the next lines show the best-suited description and the
> correct associated suffix:
> >>-0 offset =533248 8050 floppy disc image
> !:ext d80
> >>-0 offset =1066496 8250 floppy disc image
> !:ext d82
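As I understand magic(5), `-0 offset` yields the offset of the end of the file, that is, the file size, so the two lines above amount to a size lookup. A minimal Python sketch (hypothetical names) of the same mapping:

```python
# Hypothetical lookup mirroring the two '>>-0 offset =SIZE' lines.
CBM_BY_SIZE = {
    533248: ("Commodore 8050 floppy disc image", "d80"),   # single-sided
    1066496: ("Commodore 8250 floppy disc image", "d82"),  # double-sided
}


def describe_by_size(size: int):
    """Return (description, suffix), or None if the size is not listed."""
    return CBM_BY_SIZE.get(size)


assert 2 * 533248 == 1066496
assert 533248 // 256 == 2083 and 533248 % 256 == 0  # whole 256-byte sectors
print(describe_by_size(1066496))  # ('Commodore 8250 floppy disc image', 'd82')
```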
>
> Afterwards I jump back to the beginning, at the track/sector bytes, so
> that relative offsets can be used as in the documentation, and maybe the
> handling of all Commodore variants can be unified. This is done by the
> next line:
> >>&-0x25 belong x
> Now comes the more informative part, especially for normal users and
> for collections of many floppy images. At relative offset 6 to 16h, a
> disk name padded with A0h (240 octal) is stored. When I use the string
> directive, I get output that is not 100% correct and also ugly, like
> 'DIRCBM DISK\240\'. I also tried a regular expression excluding A0h, but
> that does not work and I get error messages like:
> c64, 167: Warning: non-ascii characters in regex \0240 `[^\240]{1,16}'
> Maybe someone with more brains than me, or more experience with regular
> expressions, can change my lines. Looking at all my Commodore images,
> the name is often alphanumeric with spaces, but I also found names with
> plus, minus and equal signs, and a few samples use parentheses for a
> copyright phrase like (c). So for now the disk name (like "DIRCBM DISK"
> or "CBMCOMMAND" for D80) is shown by lines like:
> >>>&06 regex =[A-Z0-9.+-=!()*#\ ]{1,16} "%s"
> #>>&06 regex =[^\240]{1,16} \b, "%s"
> #>>>&06 string x \b, DISK_NAME '%0.16s'
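Outside of magic, the intended extraction is easy to state: take at most 16 bytes and stop at the first 0xA0 pad byte. The Python sketch below is not a fix for the magic file; the helper names are hypothetical and decoding the name as ASCII is a simplification. It also illustrates a side note on the regex above: inside a bracket expression, `+-=` is a range from 0x2B to 0x3D (in the C locale), so it admits more than the three intended signs.

```python
import re

PAD = b"\xa0"  # padding byte of the disk name field


def disk_name(field: bytes) -> str:
    """Return the disk name from a 16-byte field padded with 0xA0."""
    name = field[:16].split(PAD, 1)[0]
    return name.decode("ascii", errors="replace")  # simplification: treat as ASCII


# Same idea as the rejected magic regex [^\240]{1,16}, on bytes in Python.
NAME_RE = re.compile(rb"[^\xa0]{1,16}")

field = b"DIRCBM DISK" + PAD * 5
print(disk_name(field))               # DIRCBM DISK
print(NAME_RE.match(field).group(0))  # b'DIRCBM DISK'

# Side note: '+-=' is the range 0x2B..0x3D, so it also admits
# ',', '/', ':', ';' and '<', not just '+', '-' and '='.
print(bool(re.fullmatch(r"[+-=]", ":")))  # True
```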
>
> After applying the above modifications with four patches
> (file-5.45-mmdf-d64.diff, file-5.45-images-tga.diff,
> file-5.45-pdp-d64.diff and file-5.45-c64-d64.diff), all of my inspected
> samples are now recognized. Most of the misidentifications have vanished
> and some more details are shown. With the -k option the output now looks
> like:
>
>
> Cabal+2-MarioSoft(1).d64: Commodore 1541 floppy disc image
> with errors bytes
> " CABAL 100%", ID 0x4d41
> , type 0x494f, at +0xA4 0x52
> DolphinDosErrorBytes.d64: Commodore 1541 floppy disc image
> (40 tracks) with errors bytes
> "MyDolphinDOS+er", ID 0x4444
> DragonNinja+4DCS+ATC+Pearl.d64: Commodore 1541 floppy disc image
> "SPELLETJES", ID 0x363
> , at +0xA0 0x2ca0
> Kokotoni Wilf (ANTI-ROM).d64: Commodore 1541 floppy disc image
> "2A", ID 0xa0a0
> M1571-D71.d71: Commodore 1571 floppy disc image
> "MY-DIRMASTER-MY", ID 0xa0a0
> My-8250-D82.d82: Commodore 8250 floppy disc image
> "MY D82 NUMBER 1!", ID 0xa0a
> , at +0x1D 00000000
> The Great Gianna Sisters.d64: Commodore 1541 floppy disc image
> "GIANA-GAME", ID 0x4747
> TheGreatGiannaSisters.d81: Commodore 1581 floppy disc image
> "CBMCONVERT 2.0", ID 0x3938
> , at +0x1D 0xa0a0000000000000
> bmpsuite-216col1st-98h.tga: Targa image data
> - Map (152-217) - RLE
> 1024 x 768 x 8 +768 - top
> brucelee.d64: Commodore 1541 floppy disc image
> (175104 bytes)
> "BRUCE LEE", ID 0x2331
> VAX-order 68k Blit mpx/mux executable
> TTComp archive data
> , ASCII, 4K dictionary
> cbmcmd23.d80: Commodore 8050 floppy disc image
> "CBMCOMMAND", ID 0x4944
> , at +0x4 0
> elektrix.d64: Commodore 1541 floppy disc image
> "ELEKTRIX", ID 0x3030
> , at +0xA7 0xa0000000
> file-speeddos.d64: Commodore 1541 floppy disc image
> (40 tracks)
> "My_SpeedDOS-456", ID 0x5350
> input.tga: Targa image data
> - RGB
> 70 x 46 x 24
> lastninja12.d81: Commodore 1581 floppy disc image
> "* FINNISH GOLD *", ID 0x323
> , at +0x18 0x31, type 0x3120
> ls209.d81: Commodore 1581 floppy disc image
> "LOADSTAR 209", ID 0x2020
> maillog.expected.2: MMDF mailbox
> sim&rct.d64: Commodore 1541 floppy disc image
> "SIM&RCT.D64", ID 0x3230
> , type 0x3120, version 0x42
> , at +0xA4 0x31
> test-mbox.d80: Commodore 8050 floppy disc image
> "TOTALCOMMANDER"
> , ID 0x51a0, first BAM track 39 sector 1
> , type 0x43a0 "C\240"
> , at +0x4 0, at +0x17 0x4a
> , at +0x1A 0x32
> , at +0x1D 0xa0a0a000
> test-mmdf.d71: Commodore 1571 floppy disc image
> with errors bytes
> "DIRCBM DISK-my", ID 0x3030
> test-pdp.bin: PDP-11 UNIX/RT ldp
> ucm8.tga: Targa image data
> - Map (256)
> 128 x 128 x 8
> "Truevision(R) Sample Image"
> - author "Ricky True"
> - comment "Sample 8 bit uncompressed
> color mapped image" 24-2-1990 10:00:00
> - job "TGA Utilities"
> - TGAEdit 1.40
> uupc.input.1: MMDF mailbox
>
> I hope my diff files can be applied in a future version of the file
> utility.
>
> With best wishes,
> Jörg Jenderek
> --
> Jörg Jenderek
> [Attachments: message part as attachment; file-5.45-pdp-d64.diff and
> signature; file-5.45-mmdf-d64.diff and signature;
> file-5.45-images-tga.diff and signature; trid-v-d64.txt.gz;
> file-5.45-c64-d64.diff and signature]
> --
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file