[File] [PATCH] Magdir/c64, archive, ibm6000, terminfo Commodore BASIC/program missed or misidentified
Jörg Jenderek
joerg.jen.der.ek at gmx.net
Sat Nov 12 02:07:54 UTC 2022
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello,
some weeks ago I sent a patch for Novell LANalyzer capture files
with extension TR1. Unfortunately some Commodore BASIC programs
are also described as such captures. These Commodore examples
often have the file name extension PRG.
When running file command version 5.43 with the -k option on such
"compiled" Commodore BASIC programs (tokenized from pure BASIC source
like *.BAS), Commodore binary executables and related misidentified
samples, I get output like:
C64 Sprite Demo.prg: CBM BASIC, SYS 2078 COMPRESSED BY \245S
FlappyBird.prg: CBM BASIC, SYS 2061
Mastermind.prg: shared library
TTComp archive data, ASCII, 1K dictionary
Microzodiac.bas: ASCII text
Microzodiac.prg: Novell LANalyzer capture file
Minefield.prg: Novell LANalyzer capture file
Monopoly.bas: ASCII text
Monopoly.prg: data
SVr3cursesTest.bin: SVr3 curses screen image, big-endian
SYNTHE.PRG: , SYS ** SYNTHESIZER BY RICOCHET **
SYNTHE.bas: ASCII text
Vic-tac-toe.bas: ASCII text
Vic-tac-toe.prg: Novell LANalyzer capture file
XLINK.PRG: SVr3 curses screen image, big-endian
XLINK.bas: ASCII text
breakvic_joy.bas: ASCII text, with CRLF line terminators
breakvic_joy.prg: Novell LANalyzer capture file
derby.log: ASCII text, with CRLF, LF line terminators
gunzip111.c64.prg: PUCrunch archive data
CBM BASIC, SYS 2061
hello-c128.prg: SVr3 curses screen image, big-endian
hello-c64.prg: CBM BASIC, SYS 2061
hello-pet.prg: shared library
TTComp archive data, ASCII, 1K dictionary
hello.c: C source text
helloWorld.bas: ASCII text
helloWorld.prg: , SYS "Hello world"
novell-lanalyzer.tr1: Novell LANalyzer capture file
pods.bas: , SYS \307(142): D$\262"\035\035\035"
saveroms: , SYS ********************************
saveroms.bas: ASCII text
sheridan-c16.pck: Novell LANalyzer capture file
sheridan.bin: CBM BASIC, SYS 2064
sheridan.pck: PUCrunch archive data
CBM BASIC, SYS 2061
ttcomp-ascii-1k.bin: shared library
TTComp archive data, ASCII, 1K dictionary
unzp6420.prg: , SYS "\021\005 \325NZIP64V2.00
\320UBLIC \304ISTRIBUTION"
unzp6420.txt: ASCII text
victracker.prg: data
Furthermore, with the -i option only the generic
application/octet-stream is shown. With the --extension option only
the 3 byte sequence ??? is shown.
For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html).
Real TTComp archive data like the example ttcomp-ascii-1k.bin, which
is wrongly described by the file command as shared library, is
described correctly as "TTComp archive compressed (ASCII-1K)" by
ark-ttcomp-ascii-1k.trid.xml. Such examples are also described with
the same rate as "Commodore PET BASIC 4.0 program" by
prg-pet.trid.xml. But that description is only correct for Commodore
PET BASIC/programs like Mastermind.prg, which are misidentified as
TTComp archive.
Real Novell LANalyzer capture files like novell-lanalyzer.tr1 are
wrongly described as "Commodore Plus/4 BASIC V3.5 program" by
prg-plus4.trid.xml and with the same rate as "Commodore VIC-20 BASIC
V2 program" by prg-vic20.trid.xml, because these files start with the
2 byte sequence 0110h. But that description is only correct for
Commodore C16/VIC-20/Plus4 BASIC/programs (like Microzodiac.prg
Minefield.prg Vic-tac-toe.prg breakvic_joy.prg). That these samples
are not Novell LANalyzer captures but Commodore BASIC programs can be
seen from the fact that a corresponding BASIC source file like
breakvic_joy.bas exists, or that the source can be regenerated by
de-tokenizing, for example via the VICE emulator tool petcat with a
command like:
petcat -2 -o Vic-tac-toe.bas -- Vic-tac-toe.prg
Samples which are described as "SVr3 curses screen image", like the
real SVr3cursesTest.bin or Commodore C128 BASIC/programs, are
described as "Commodore 128 BASIC V7.0 program" by prg-c128.trid.xml.
The samples described by file command as "CBM BASIC" or with ", SYS"
phrase are described as "Commodore 64 BASIC V2 program" by
prg-c64.trid.xml.
Some examples like victracker.prg are described by file command only
as data. These are described as "Commodore VIC-20 BASIC V2 program
(8K RAM expansion)" by prg-vic20-8k.trid.xml (See appended
trid-v-prg.txt.gz).
So with the help of the TrID output I found web pages with the needed
information. This is now expressed by lines like:
# URL: http://fileformats.archiveteam.org/wiki/Commodore_BASIC_tokenized_file
# Reference: https://www.c64-wiki.com/wiki/BASIC_token
# https://github.com/thezerobit/bastext/blob/master/bastext.doc
# http://mark0.net/download/triddefs_xml.7z/defs/p/prg-c64.trid.xml
The current description happens inside Magdir/c64 by lines like:
0 leshort 0x0801
>2 leshort 0x080b
>6 string \x9e CBM BASIC
>7 string >\0 \b, SYS %s
First, it uses only 16 bits for recognition. So the magic is too weak.
This sometimes leads to misidentifications like:
Novell LANalyzer capture file
TTComp archive data
shared library
The current lines consider only the Commodore C64 variant (starting
at memory address 0x0801). So other Commodore variants (like
C16/VIC-20/Plus4) are missed.
Then we must or should distinguish 2 variants. One is "compiled" BASIC
programs. These are just tokenized from BASIC sources (pure text
files, often with BAS extension). So these are "slower", but they are
"more human readable" and the source can be regenerated by
de-tokenizing (for example with the VICE emulator tool petcat). The
other is binary executables (often indicated by the SYS token with
value 9e). These are "faster" in execution, but the source can not be
regenerated so easily. Such examples are created for example by
compiling C sources with the cc65 tool. In the current magic these two
cases are swapped.
After comparing the describing text with TrID I changed "CBM BASIC" to
"Commodore C64 BASIC program". For other Commodore systems I
replace the phrase "C64" by the corresponding phrase. Furthermore I
add the phrase "program" to distinguish it from pure BASIC text
source (often with BAS suffix). For the pure binary executables I
chose a describing text like "Commodore C64 program".
Because the magics are weak and differ between Commodore systems, I
put the displaying parts in sub routines. Common to all is a
"tokenized" BASIC line. That is described by a sub routine like:
0 name basic-line
>0 uleshort x \b, offset %#4.4x
>2 uleshort x \b, line %u
>4 ubyte x \b, token (%#x)
>4 string \x8f REM
>>5 string >\0 %s
>4 string \x99 PRINT
>>5 string x %s
>4 string \x97 POKE
>>5 regex \^[0-9,\040]+ %s
>4 string \x9e SYS
>>5 regex \^[0-9]{1,5} %s
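For readers who prefer code to magic syntax, the fields shown by
basic-line can be sketched in Python as below. This is only an
illustration and not part of the patch. The function name
parse_basic_line and the sample bytes are assumptions; the sample is
modelled on the values reported for hello-c64.prg (offset 0x080b,
line 531, SYS 2061).

```python
import struct

# Token names copied from the basic-line magic above; the table is incomplete.
TOKEN_NAMES = {0x8F: "REM", 0x97: "POKE", 0x99: "PRINT", 0x9E: "SYS"}


def parse_basic_line(data: bytes, pos: int) -> dict:
    """Decode one tokenized BASIC line header starting at file offset pos."""
    # 2 byte little endian pointer to the next line, then 2 byte line number
    next_addr, line_no = struct.unpack_from("<HH", data, pos)
    first = data[pos + 4]  # first content byte, usually a token
    return {
        "offset": next_addr,
        "line": line_no,
        "token": first,
        "name": TOKEN_NAMES.get(first, ""),
    }


# Hypothetical stub: load address 0801h, then the single line "531 SYS 2061"
sample = bytes.fromhex("01080b0813029e32303631000000")
info = parse_basic_line(sample, 2)
print(f"offset {info['offset']:#06x}, line {info['line']}, "
      f"token ({info['token']:#x}) {info['name']}")
```

The position 2 is passed because the first two bytes of a PRG file hold
the load address and are not part of the first BASIC line.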
The displayed offset is the pointer to the memory address of the
beginning of the next BASIC line. When we know the address of the
current BASIC line (ADR), then for the worst "big" case of tokenized
BASIC this value is ADR+100h, because the maximal total BASIC line
length is 256 (hexadecimal 100). The worst "small" case is ADR+6
{2 (offset) + 2 (line number) + 1 (minimal line content assuming 1
token) + 1 (end of line terminator with value 0x0)}. So we can use
this as a test. The Commodore C16/VIC-20/Plus4 BASIC programs start at
memory address 0x1001. That is stored as 2 byte little endian in the
first bytes. Then I can test the offset to the second line by lines
like:
>>2 uleshort >0x1006 OFFSET_NOT_TOO_LOW
>>>2 uleshort <0x1102 OFFSET_NOT_TOO_HIGH
So with the first additional test I can, for example, skip
misidentified regular Novell LANalyzer captures (like novell-2.tr1
novell-win10.tr1 handled by Magdir/sniffer) with "invalid low" second
line offset 4Ch. But such tests are dangerous, because there are
subtle traps. These lines are only always TRUE for "tokenized"
Commodore BASIC programs. For the binary executable Commodore program
Minefield.prg I get here in the first line fragment the offset value
0x123b.
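The range test can be written down as a small Python sketch. The
function name link_pointer_plausible is an assumption for illustration;
the bounds ADR+6 and ADR+100h are the ones derived above, and the
sample values are the offsets reported for breakvic_joy.prg, the Novell
captures and Minefield.prg.

```python
def link_pointer_plausible(line_addr: int, next_addr: int) -> bool:
    """True if next_addr can point to the line following line_addr.

    Smallest line: 2 (pointer) + 2 (line number) + 1 (token) + 1 (nil) = 6.
    Largest line: 256 bytes in total.
    """
    return line_addr + 6 <= next_addr <= line_addr + 0x100


ADR = 0x1001  # start address of C16/VIC-20/Plus4 BASIC programs
for name, ptr in [("breakvic_joy.prg", 0x104E),
                  ("Novell capture", 0x004C),
                  ("Minefield.prg", 0x123B)]:
    print(name, hex(ptr), link_pointer_plausible(ADR, ptr))
```

Only the tokenized BASIC program passes. The binary executable
Minefield.prg fails the upper bound, which is why the two magic lines
are kept only as commented fragments for this start address.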
Also the offset value 0x0000 can occur. It took me a day to understand
it, because it is not explicitly written. This occurs on the last
BASIC line. It is the marker for the BASIC interpreter that the end of
the program is reached. Often these are also the last bytes of the
stored tokenized BASIC program. So for control reasons I show non nil
values by lines at the end of the sub routine like:
>-3 ubyte !0 \b, 3 last bytes %#2.2x
>>-2 ubeshort x \b%4.4x
When the offset 0 occurs already in the second line fragment, the
program has only one BASIC line. For a real "tokenized" BASIC program
that is very unlikely and occurs only in "artificial" examples like
the tutorial example helloWorld.prg. On the other hand, for binary
Commodore executables this is often true. I explain later why.
The shown BASIC line number is in the range from 0 to 65520, but it
is common practice to increment numbers by some value (like 5, 10 or
100). So in "well behaved" examples (like breakvic_joy.prg) I get the
line sequence 10 20 30 40 50 ... (see also source breakvic_joy.bas).
For real Commodore binary executables I get more "bad looking"
sequences like 20, 7840 in FlappyBird.prg or the most "bad looking"
1989, 47736 in "C64 Sprite Demo.prg". And of course in misidentified
non Commodore examples I often get "bad" line numbers like
1281 in a regular Novell LANalyzer capture (novell-lanalyzer.tr1).
After the line number comes the BASIC line content. I show the content
of the first byte as a hexadecimal value. Often, and especially for
the first line, this is a tokenized BASIC command. The "high-bit"
bytes from 128 to 254 stand for the various BASIC commands and
mathematical operators. So I can use this feature as an additional
test criterion. So I skip regular Novell LANalyzer captures
(novell-2.tr1 novell-lanalyzer.tr1 novell-win10.tr1) with "invalid
low" token value 54h by a line like:
>6 ubyte >0x7F TOKEN_VALUE_NOT_TOO_LOW
The hexadecimal value 9e means the SYS command. That tells the
processor to execute the machine language subroutine at a specific
address. The <Address> parameter is an unsigned integer, i.e. an
integer in the range 0 through 65535. I use a regular expression for
catching and displaying at most 5 digits. If I use just a string, then
everything until the end of line terminator is displayed. In some
examples this address is followed by spaces, "control-characters"
(which I do not understand) or a colon (:) followed by the next
commands. Then the output columns would sometimes get very wide and I
gain no additional information. This is now done by a fragment like:
>4 string \x9e SYS
>>5 regex \^[0-9]{1,5} %s
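The effect of limiting the match to at most 5 digits can be tried out
with Python's re module. This is a sketch only; the function name
sys_address and the sample byte strings are assumptions, with the
second sample modelled on the ":\2362141" tail seen in unzp6420.prg.

```python
import re

SYS_TOKEN = 0x9E  # token value of the SYS command


def sys_address(content: bytes):
    """Return the SYS target as int if content starts with SYS, else None."""
    if not content or content[0] != SYS_TOKEN:
        return None
    # re.match anchors at the start, like the leading ^ in the magic regex
    m = re.match(rb"[0-9]{1,5}", content[1:])
    return int(m.group()) if m else None


print(sys_address(b"\x9e2061\x00"))      # digits followed by end of line
print(sys_address(b"\x9e2141:\x99\x00")) # digits followed by colon and token
print(sys_address(b"\x99\"HELLO\"\x00")) # PRINT, not SYS
```

Everything after the digits is ignored, so trailing spaces, control
characters or further commands do not widen the output.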
Now comes an interesting part. I needed some days to understand it.
In the intro of the cl65 compiler suite it is written that
it prepends a header or stub which corresponds to a PRG-format BASIC
program, consisting of a single line, similar to this:
20 sys2061
I verified this by creating a binary from C source, for example by a
command line like:
cl65 -v -t c64 -o hello-c64.prg hello.c text.s
That explains why a "pure" binary Commodore executable at first glance
looks like a tokenized BASIC program. So I use this as a criterion to
distinguish between a "tokenized" BASIC program and a pure binary
executable. If the first token is SYS this implies a binary
executable, if not then it is a BASIC program. For BASIC programs for
the Commodore VIC-20 computer with 8K RAM expansion (start address is
1201h) this looks like:
0 leshort 0x1201
>6 ubyte !0x9e
>>0 use vic-prg
>6 ubyte =0x9e
>>0 use vic-exe
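The same first-cut decision can be sketched in Python for all start
addresses named in this mail. The function name classify and the sample
bytes are assumptions; the labels follow the proposed description
texts. Note that this simple rule does not handle stubs whose first
token is not SYS.

```python
import struct

# Start addresses named in this mail, with the phrase used in the output
START_ADDRESSES = {
    0x0401: "PET",
    0x0801: "C64",
    0x1001: "C16/VIC-20/Plus4",
    0x1201: "VIC-20 +8K",
    0x1C01: "C128",
}
SYS_TOKEN = 0x9E


def classify(data: bytes):
    """Return a description, or None if the start address is unknown."""
    if len(data) < 7:
        return None
    (start,) = struct.unpack_from("<H", data, 0)
    machine = START_ADDRESSES.get(start)
    if machine is None:
        return None
    # Byte 6 is the first content byte of the first BASIC line
    kind = "program" if data[6] == SYS_TOKEN else "BASIC program"
    return f"Commodore {machine} {kind}"


print(classify(bytes.fromhex("01080b0813029e32303631000000")))
print(classify(b"\x01\x12\x3e\x12\x0a\x00\x8f MONOPOLY\x00"))
```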
But things get complicated, because this consideration is not
always true. It applies to the cl65 compiler suite. But obviously
there exist other compilers or handmade stubs. This took another day.
The example unzp6420.prg looks at first glance like a tokenized BASIC
program. But in reality the BASIC stub contains just 2 BASIC lines
with PRINT directives. And to complicate matters, the SYS directive
follows not on a new BASIC line but after the command separator
colon (:). The example is a self extracting ZIP program (visible from
the PRINT directive). So I insert for Commodore C64 computers a
"manual exception branch". So here the starting lines look like:
0 leshort 0x0801
>6 ubyte !0x9e
>>23 search/30 \323ELF-E\330TRACTING-\332IP
>>>0 use c64-exe
>>23 default x
>>>0 use c64-prg
>6 ubyte =0x9e
>>0 use c64-exe
For the tokens 8fh (that is the REM directive) and 99h (that is the
PRINT directive) I show the explicit token name and also the following
content as a string. Often this contains useful meta information like
program name, version or author, like "SYNTHESIZER BY RICOCHET" in
SYNTHE.PRG or \325NZIP64V2.00 in unzp6420.prg.
So the whole subroutine for tokenized C64 BASIC program looks like:
0 name c64-prg
>0 uleshort x Commodore C64 BASIC program
!:mime application/x-commodore-basic
!:ext prg/bas/
>0 uleshort !0x0801 \b, start address %#4.4x
>2 use basic-line
>(2.s-0x0800) ubyte x
>>&-1 ubyte !0 \b, no EOL=%#x
>>&0 use basic-line
>-3 ubyte !0 \b, 3 last bytes %#2.2x
>>-2 ubeshort x \b%4.4x
Instead of the generic application/octet-stream I show a user defined
one. Tokenized BASIC programs were stored by Commodore as file type
program "PRG" in a separate field in the directory structures. So a
file name can have no suffix, like in the example saveroms. When
transferred to other platforms, they are often saved with the .prg
extension. The BAS suffix is typically used for the BASIC source but
is also found in the program pods.bas. The BASIC lines are terminated
by a nil-byte. For control reasons the unexpected case is displayed by
the line with the phrase "no EOL". This can only occur in binaries.
Afterwards the second BASIC line is displayed after a jump by a second
call of the sub routine basic-line. This part is nonsense for pure
binaries, which are described by a sub routine like c64-exe. Another
difference is that there another mime type is used. And there I also
found no BAS name suffix. The sub routines for other machines look
similar. There the difference is that the address part 080? is
replaced by the suited address and the phrase "C64" in the describing
text is also changed. Maybe it is possible to unify all these
subroutines. But after spending 2 weeks on Commodore stuff I am too
tired to try to do this.
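The indirect offset (2.s-0x0800) may look odd, so here is the
arithmetic as a Python sketch. The function names and the sample bytes
are assumptions; the sample is a hand-built one-line program
10 PRINT "HELLO WORLD" for load address 0801h. A memory address maps to
file offset address - load address + 2, because the 2 byte load address
at the start of the file is not part of the memory image. For 0801h
this gives pointer - 07FFh for the next line, and pointer - 0800h for
the byte before it, which should be the nil end of line terminator.

```python
import struct


def second_line_offset(data: bytes) -> int:
    """File offset of the second BASIC line, from the first link pointer."""
    load_addr, next_addr = struct.unpack_from("<HH", data, 0)
    return next_addr - load_addr + 2


def first_line_terminated(data: bytes) -> bool:
    """True if the byte before the second line is the nil terminator."""
    off = second_line_offset(data)
    return 1 <= off <= len(data) and data[off - 1] == 0


# Hypothetical sample: 0801h, pointer 0815h, line 10, PRINT token, text
sample = b'\x01\x08\x15\x08\x0a\x00\x99 "HELLO WORLD"\x00\x00\x00'
off = second_line_offset(sample)
print(off, first_line_terminated(sample), sample[off:off + 2])
```

In this sample the two bytes at the second line position are both nil,
which is the end of program marker.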
Some PRG examples like gunzip111.c64.prg are also described as
"PUCrunch archive data". The description happens inside
Magdir/archive by line like:
0 string \x01\x08\x0b\x08\xef\x00\x9e\x32\x30\x36\x31 PUCrunch
When using the updated Magdir/c64 we see it is a Commodore C64 program
(starting address 0801 hexadecimal) with next offset 080b
hexadecimal. The first BASIC line number is 239 (00EF hexadecimal).
The first BASIC instruction is SYS 2061 (that is the SYS token 9E
followed by the 4 digit characters 32 30 36 31 in hexadecimal format).
I verified that this is true by running a command like:
pucrunch sheridan.bin sheridan.pck
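The decoding of the 11 byte PUCrunch magic string can also be checked
with a few lines of Python. This is a sketch; the names
PUCRUNCH_C64_STUB and decode_stub are assumptions, the bytes are copied
from the magic line in Magdir/archive.

```python
import struct

# Byte string copied from the PUCrunch line in Magdir/archive
PUCRUNCH_C64_STUB = b"\x01\x08\x0b\x08\xef\x00\x9e\x32\x30\x36\x31"


def decode_stub(stub: bytes):
    """Split the stub into load address, next offset, line, token, digits."""
    load_addr, next_addr, line_no = struct.unpack_from("<HHH", stub, 0)
    token = stub[6]
    digits = stub[7:11].decode("ascii")
    return load_addr, next_addr, line_no, token, digits


load_addr, next_addr, line_no, token, digits = decode_stub(PUCRUNCH_C64_STUB)
print(f"{load_addr:#06x} {next_addr:#06x} {line_no} {token:#x} {digits}")
```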
So I moved the "PUCrunch archive data" entry from ./archive and merged
it with the sub routine c64-exe (other suffix PCK instead of PRG). But
here again things are complicated. First of all I can also create
such archives for other systems with other magics by a command like:
pucrunch-c16 sheridan.bin sheridan-c16.pck
Then there exist samples like gunzip111.c64.prg, which look like a
PUCrunch archive. But when I try to unpack such samples by a command
like:
pucrunch -u gunzip111.c64.prg
I get an error message like:
Not C64 short (251 > 28)
Detected C64 (19 <= 31)
Error: Broken archive, LZ copy position underrun at 828 (10438).
lzLen 3.
So I believe that the combination of SYS 2061 and line 239 is probably
also used by other Commodore programs. So the description as
PUCrunch archive is not reliable, and the found SYS and line
combination is only a hint for a PUCrunch archive.
All Commodore PET BASIC/program (like mastermind.prg with start
address 0401h) are also described as "TTComp archive data, ASCII, 1K
dictionary" with strength 48 (=50-2). The description happens inside
Magdir/archive by line like:
0 string \1\4
>0 use ttcomp
Luckily the displaying part is done by a sub routine. So only suitable
lines must be added before calling the sub routine ttcomp. According
to jsummers, for real TTComp the last 3 bytes of a file should match
one of 8 patterns. These are non nil. For tokenized Commodore
PET BASIC, when after the end of line separator (that is \0) the next
offset value is 0000h, this terminates the whole BASIC
program. In most cases this byte sequence is also the last bytes
of the whole BASIC file. So this now becomes:
0 string \1\4
>-4 ubelong&0x00FFffFF !0
>>0 use ttcomp
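The masked test reads the last 4 bytes as a big endian value and
ignores the first of them, so it looks only at the last 3 bytes. A
Python sketch of the same test follows; the function name
last_three_bytes_nonzero and the sample byte strings are assumptions
for illustration.

```python
def last_three_bytes_nonzero(data: bytes) -> bool:
    """Mirror of '>-4 ubelong&0x00FFffFF !0' from the magic above."""
    if len(data) < 4:
        return False
    value = int.from_bytes(data[-4:], "big")
    # The mask drops the first of the 4 bytes and keeps the last 3
    return (value & 0x00FFFFFF) != 0


# Hypothetical PET BASIC tail: some byte, end of line, pointer 0000h
print(last_three_bytes_nonzero(b"\x01\x04...\x22\x00\x00\x00"))
# Hypothetical file with the same start but a non nil tail
print(last_three_bytes_nonzero(b"\x01\x04...\x00\x00\x01\xc0"))
```

A file for which this returns False is skipped by the ttcomp branch,
so tokenized PET BASIC programs that end with the end of program
marker are no longer described as TTComp archive.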
All Commodore PET BASIC/program (like mastermind.prg with start
address 0401h) are also described as "shared library" with strength
50. The description happens inside Magdir/ibm6000 by line like:
0 beshort 0x0104 shared library
Unfortunately I have no knowledge about IBM RS/6000 machines. So I was
not able to improve the above weak magic. So I only mention in a
comment line that this collides with Commodore PET BASIC/programs.
All Commodore C128 BASIC 7.0/programs (like XLINK.PRG hello-c128.prg
with start address 1C01h) are also described as "SVr3 curses screen
image, big-endian" with strength 50. The description happens inside
Magdir/terminfo by line like:
0 beshort 0434 SVr3 curses screen image, big-endian
Unfortunately I have no knowledge about terminfo and no access to big
endian machines. So I was not able to improve the above weak magic.
So I only mention in a comment line that this collides with Commodore
C128 BASIC/programs. But when looking at a little endian curses sample
(mentioned on the scr_dump(5) man page), the value at offset 2 seems
to be "high". So by checking that the second offset value is not too
high for C128 samples I probably skip such curses samples. For
completeness I also check that the value is not too low. So the
starting lines for C128 BASIC 7.0 samples look like:
0 leshort 0x1C01
!:strength +1
>2 uleshort <0x1D02
>>2 uleshort >0x1C06
>>>6 ubyte !0x9e
>>>>0 use c128-prg
>>>6 ubyte =0x9e
>>>>0 use c128-exe
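Why the big endian magics collide with Commodore start addresses can be
shown with a short Python calculation. This is a sketch; the function
name as_little_endian_start is an assumption. The octal value 0434 from
Magdir/terminfo and the value 0x0104 from Magdir/ibm6000 are written as
2 bytes in big endian order and then read back as little endian, the
way the Commodore start address is stored.

```python
import struct


def as_little_endian_start(be_magic: int) -> int:
    """Read the 2 bytes of a big endian magic as a little endian address."""
    raw = struct.pack(">H", be_magic)
    return struct.unpack("<H", raw)[0]


# 0434 is octal in the terminfo magic line
print(hex(as_little_endian_start(0o434)))
# 0x0104 is the "shared library" magic
print(hex(as_little_endian_start(0x0104)))
```

The results are the C128 start address 1C01h and the PET start address
0401h, so both collisions follow directly from the 2 byte magics.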
After applying the above mentioned modifications by the patches
file-5.43-c64-prg.diff file-5.43-archive-prg.diff
file-5.43-ibm6000-prg.diff file-5.43-terminfo-prg.diff
and the previous file-5.43-sniffer-novell.diff, more Commodore
BASIC/programs are described (also with more details) and more
misidentifications vanish. With the -k option this now looks like:
C64 Sprite Demo.prg: Commodore C64 program,
offset 0x081c, line 1989, token (0x9e)
SYS 2078, 3 last bytes 0x2e4160
FlappyBird.prg: Commodore C64 program,
offset 0x080b, line 20, token (0x9e)
SYS 2061, 3 last bytes 0x83a8ac
Mastermind.prg: Commodore PET BASIC program,
offset 0x042e, line 1000, token (0x97)
POKE 36879,25,
offset 0x046e, line 1003, token (0x99)
PRINT "\021\021\021\021STOP IS EINDE SPEL"
:\231"\021HELP IS LIST SPEL"
:\201I\2621\2443000:\202\012-
shared library
Microzodiac.bas: ASCII text
Microzodiac.prg: Commodore C16/VIC-20/Plus4 BASIC program,
offset 0x100a, line 100, token (0x99)
PRINT "\223",
offset 0x1013, line 105, token (0x8d)
GOSUB 555
Minefield.prg: Commodore C16/VIC-20/Plus4 program,
offset 0x123b, line 1, token (0x97)
POKE 36879,125, no EOL=0x36
Monopoly.bas: ASCII text
Monopoly.prg: Commodore VIC-20 +8K BASIC program,
offset 0x123e, line 10, token (0x8f)
REM MONOPOLY BY P.WEPS 8500 NBG
, KILIANSTR.97 TEL.34 32 55,
offset 0x1256, line 20, token (0x8f)
REM PROG.-ID.MONO82-3
SVr3cursesTest.bin: SVr3 curses screen image, big-endian
SYNTHE.PRG: Commodore C64 BASIC program,
offset 0x0825, line 10, token (0x8f)
REM ** SYNTHESIZER BY RICOCHET **,
offset 0x083d, line 20, token (0x97)
POKE 53280,15, 3 last bytes 0x202020
SYNTHE.bas: ASCII text
Vic-tac-toe.bas: ASCII text
Vic-tac-toe.prg: Commodore C16/VIC-20/Plus4 BASIC program,
offset 0x1024, line 0, token (0x8f)
REM "\024\024\024\024\024\024
*** BY CRAIG BRUCE ***,
offset 0x106a, line 1, token (0x4d)
XLINK.PRG: Commodore C128 BASIC program,
offset 0x1c3f, line 10, token (0x8b),
offset 0x1c78, line 20, token (0xde)
SVr3 curses screen image, big-endian
XLINK.bas: ASCII text
breakvic_joy.bas: ASCII text, with CRLF line terminators
breakvic_joy.prg: Commodore C16/VIC-20/Plus4 BASIC program,
offset 0x104e, line 10, token (0x97)
POKE 36878,15,
offset 0x1091, line 20, token (0x54)
gunzip111.c64.prg: Commodore C64 program
, probably PUCrunch archive data,
offset 0x080b, line 239, token (0x9e)
SYS 2061, 3 last bytes 0x829ff8
hello-c128.prg: Commodore C128 program,
offset 0x1c0b, line 531, token (0x9e)
SYS 7181
SVr3 curses screen image, big-endian
hello-c64.prg: Commodore C64 program,
offset 0x080b, line 531, token (0x9e)
SYS 2061, 3 last bytes 0xff5d12
hello-pet.prg: Commodore PET program,
offset 0x040b, line 531, token (0x9e)
SYS 1037
shared library
hello.c: ASCII text
helloWorld.bas: ASCII text
helloWorld.prg: Commodore C64 BASIC program,
offset 0x0815, line 10, token (0x99)
PRINT "Hello world",
offset 0000, line 0, token (0)
novell-lanalyzer.tr1: Novell LANalyzer capture file, version 1.5,
record length 0x4c, 2nd record length 0x80,
names Channel1 Channel2 ...
pods.bas: Commodore C64 BASIC program,
offset 0x0819, line 4, token (0x99)
PRINT \307(142): D$\262"\035\035\035",
offset 0x0825, line 5, token (0x81)
saveroms: Commodore C64 BASIC program,
offset 0x0828, line 10, token (0x8f)
REM ********************************,
offset 0x084f, line 12, token (0x8f)
REM * SAVEROMS *
saveroms.bas: ASCII text
sheridan-c16.pck: Commodore C16/VIC-20/Plus4 program,
offset 0x100b, line 239, token (0x9e)
SYS 4109, 3 last bytes 0xb3fff0
sheridan.bin: Commodore C64 program,
offset 0x080b, line 0, token (0x9e)
SYS 2064, 3 last bytes 0xf9f9fa
sheridan.pck: Commodore C64 program
, probably PUCrunch archive data,
offset 0x080b, line 239, token (0x9e)
SYS 2061, 3 last bytes 0xb3fff0
ttcomp-ascii-1k.bin: shared library
TTComp archive data dictionary
unzp6420.prg: Commodore C64 program,
offset 0x082c, line 1998, token (0x99)
PRINT "\021\005 \325NZIP64V2.00
\320UBLIC \304ISTRIBUTION",
offset 0x085b, line 1999, token (0x99)
PRINT "\021 \323ELF-E\330TRACTING-\332IP
(64 ONLY)\016\231":\2362141
unzp6420.txt: ASCII text
victracker.prg: Commodore VIC-20 +8K program,
offset 0x1223, line 2004, token (0x9e)
SYS, no EOL=0x50, 3 last bytes 0x201102
I hope my diff files can be applied in a future version of the file
utility.
Unfortunately there exist some more Commodore BASIC variants, but I
myself found no examples. So I add magic lines for such systems
only as commented fragments. Maybe other BASIC samples not for
Commodore are also described by my magics.
With best wishes,
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY27/+QAKCRCv8rHJQhrU
1iP2AKCUYEI3SKwcrG3dJKtDebEDwKA6awCgwboLaFSvRBKos+Y+a67EE4JdIxQ=
=sJZ5
-----END PGP SIGNATURE-----
-------------- next part --------------
--- file-5.43/magic/Magdir/ibm6000.old 2021-07-05 11:33:09.000000000 +0200
+++ file-5.43/magic/Magdir/ibm6000 2022-11-09 00:11:21.522922700 +0100
@@ -3,4 +3,5 @@
# $File: ibm6000,v 1.15 2021/07/03 14:01:46 christos Exp $
# ibm6000: file(1) magic for RS/6000 and the RT PC.
+# https://en.wikipedia.org/wiki/IBM_RS/6000
#
0 beshort 0x01df executable (RISC System/6000 V3.1) or obj module
@@ -12,4 +13,5 @@
#>6 beshort >0 - version %ld
# GRR: line below is too general as it matches also TTComp archive, ASCII, 1K handled by ./archive
+# and Commodore PET BASIC program (Mastermind.prg with start address 401h) and strength (51=50+1)
0 beshort 0x0104 shared library
# GRR: line below is too general as it matches also TTComp archive, ASCII, 2K handled by ./archive
-------------- next part --------------
--- file-5.43/magic/Magdir/archive.old 2022-09-13 20:05:39.000000000 +0200
+++ file-5.43/magic/Magdir/archive 2022-11-09 00:18:39.096366400 +0100
@@ -507,9 +507,10 @@
0 string \1\4
# TODO:
-# skip Commodore PET BASIC 4.0 program *.prg
-# variant ASCII, 1K dictionary (strength=48=50-2). With strength=49 wrong order! WHY?
# skip shared library (strength=50) handled by ./ibm6000
!:strength -2
->0 use ttcomp
+# skip Commodore PET BASIC programs (Mastermind.prg) with last 3 nil bytes (\0~end of line followed by 0000h line offset)
+#>-4 ubelong x LAST_BYTES=%8.8x
+>-4 ubelong&0x00FFffFF !0
+>>0 use ttcomp
# display information of TTComp archive
0 name ttcomp
@@ -792,6 +793,4 @@
# Terse
0 string \5\1\1\0 Terse archive data
-# PUCrunch
-0 string \x01\x08\x0b\x08\xef\x00\x9e\x32\x30\x36\x31 PUCrunch archive data
# UHarc
0 string UHA UHarc archive data
-------------- next part --------------
--- file-5.43/magic/Magdir/terminfo.old 2021-02-23 01:51:10.000000000 +0100
+++ file-5.43/magic/Magdir/terminfo 2022-11-10 16:41:06.298555800 +0100
@@ -37,6 +37,7 @@
# AIX and HPUX use the SVr4 big-endian format
# Solaris uses the SVr3 formats (sparc and x86 differ endian-ness)
0 beshort 0433 SVr2 curses screen image, big-endian
+# GRR: line below too general as it catches Commodore C128 program (crc32.prg XLINK.PRG) with start address 1C01h handled by ./c64
0 beshort 0434 SVr3 curses screen image, big-endian
0 beshort 0435 SVr4 curses screen image, big-endian
#
-------------- next part --------------
--- file-5.43/magic/Magdir/c64.old 2022-05-14 22:03:39.000000000 +0200
+++ file-5.43/magic/Magdir/c64 2022-11-12 02:44:28.599089600 +0100
@@ -194,7 +194,338 @@
>100 byte >0 \b, %u subsong(s)
# CBM BASIC (cc65 compiled)
+# Summary: binary executable or Basic program for Commodore C64 computers
+# Update: Joerg Jenderek
+# URL: http://fileformats.archiveteam.org/wiki/Commodore_BASIC_tokenized_file
+# Reference: https://www.c64-wiki.com/wiki/BASIC_token
+# https://github.com/thezerobit/bastext/blob/master/bastext.doc
+# http://mark0.net/download/triddefs_xml.7z/defs/p/prg-c64.trid.xml
+# TODO: unify Commodore BASIC/program sub routines
+# Note: "PUCrunch archive data" moved from ./archive and merged with c64-exe
0 leshort 0x0801
->2 leshort 0x080b
->6 string \x9e CBM BASIC
->7 string >\0 \b, SYS %s
+# if first token is not SYS this implies BASIC program in most cases
+>6 ubyte !0x9e
+# but sELF-ExTRACTING-zIP executable unzp6420.prg contains SYS token at end of second BASIC line (at 0x35)
+>>23 search/30 \323ELF-E\330TRACTING-\332IP
+>>>0 use c64-exe
+>>23 default x
+>>>0 use c64-prg
+# if first token is SYS this implies binary executable
+>6 ubyte =0x9e
+>>0 use c64-exe
+# display information about C64 binary executable (memory address, line number, token)
+0 name c64-exe
+>0 uleshort x Commodore C64
+# http://a1bert.kapsi.fi/Dev/pucrunch/
+# start address 0801h; next offset 080bh; BASIC line number is 239=00EFh; BASIC instruction is SYS 2061
+# the above combination appartly also occur for other Commodore programs like: gunzip111.c64.prg
+# and there exist PUCrunch archive for other machines like C16 with other magics
+>0 string \x01\x08\x0b\x08\xef\x00\x9e\x32\x30\x36\x31 program, probably PUCrunch archive data
+!:mime application/x-compress-pucrunch
+!:ext prg/pck
+>0 string !\x01\x08\x0b\x08\xef\x00\x9e\x32\x30\x36\x31 program
+!:mime application/x-commodore-exec
+!:ext prg/
+# start address like: 801h
+>0 uleshort !0x0801 \b, start address %#4.4x
+# 1st BASIC fragment
+>2 use basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x800) ubyte x
+>>&-1 ubyte !0 \b, no EOL=%#x
+# valid 2nd BASIC fragment found only in sELF-ExTRACTING-zIP executable unzp6420.prg
+>>23 search/30 \323ELF-E\330TRACTING-\332IP
+# jump again from beginning
+>>>(2.s-0x800) ubyte x
+>>>>&0 use basic-line
+# Zero-byte marking the end of the BASIC line
+>-3 ubyte !0 \b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicates the end of the program
+>>-2 ubeshort x \b%4.4x
+# display information about tokenized C64 BASIC program (memory address, line number, token)
+0 name c64-prg
+>0 uleshort x Commodore C64 BASIC program
+!:mime application/x-commodore-basic
+# Tokenized BASIC programs were stored by Commodore as file type program "PRG" in separate field in directory structures.
+# So file name can have no suffix like in saveroms; When transferring to other platforms, they are often saved with .prg extensions.
+# BAS suffix is typically used for the BASIC source but also found in program pods.bas
+!:ext prg/bas/
+# start address like: 801h
+>0 uleshort !0x0801 \b, start address %#4.4x
+# 1st BASIC fragment
+>2 use basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x0800) ubyte x
+>>&-1 ubyte !0 \b, no EOL=%#x
+# 2nd BASIC fragment
+>>&0 use basic-line
+# zero-byte marking the end of the BASIC line
+>-3 ubyte !0 \b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicates the end of the program
+>>-2 ubeshort x \b%4.4x
+# Summary: binary executable or Basic program for Commodore C128 computers
+# URL: https://en.wikipedia.org/wiki/Commodore_128
+# Reference: http://mark0.net/download/triddefs_xml.7z/defs/p/prg-c128.trid.xml
+# From: Joerg Jenderek
+# Note: Commodore 128 BASIC 7.0 variant; there exist varaints with different start addresses
+0 leshort 0x1C01
+!:strength +1
+# GRR: line above with strength 51 (50+1) is too generic because it matches SVr3 curses screen image, big-endian with strength (50) handled by ./terminfo
+# probably skip SVr3 curses images with "invalid high" second line offset
+>2 uleshort <0x1D02
+# skip foo with "invalid low" second line offset
+>>2 uleshort >0x1C06
+# if first token is not SYS this implies BASIC program
+>>>6 ubyte !0x9e
+>>>>0 use c128-prg
+# if first token is SYS this implies binary executable
+>>>6 ubyte =0x9e
+>>>>0 use c128-exe
+# Summary: binary executable or Basic program for Commodore C128 computers
+# Note: Commodore 128 BASIC 7.1 extension by Rick Simon
+# start adress 132Dh
+#0 leshort 0x132D THIS_IS_C128_7.1
+#>0 use c128-prg
+# Summary: binary executable or Basic program for Commodore C128 computers
+# Note: Commodore 128 BASIC 7.0 saved with graphics mode enabled
+# start adress 4001h
+#0 leshort 0x4001 THIS_IS_C128_GRAPHIC
+#>0 use c128-prg
+# display information about tokenized C128 BASIC program (memory address, line number, token)
+0 name c128-prg
+>0 uleshort x Commodore C128 BASIC program
+!:mime application/x-commodore-basic
+!:ext prg
+# start address like: 1C01h
+>0 uleshort !0x1C01 \b, start address %#4.4x
+# 1st BASIC fragment
+>2 use basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x1C00) ubyte x
+>>&-1 ubyte !0 \b, no EOL=%#x
+# 2nd BASIC fragment
+>>&0 use basic-line
+# Zero-byte marking the end of the BASIC line
+>-3 ubyte !0 \b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicates the end of the program
+>>-2 ubeshort x \b%4.4x
+# display information about C128 program (memory address, line number, token)
+0 name c128-exe
+>0 uleshort x Commodore C128 program
+!:mime application/x-commodore-exec
+!:ext prg/
+# start address like: 1C01h
+>0 uleshort !0x1C01 \b, start address %#4.4x
+# 1st BASIC fragment
+>2 use basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x1C00) ubyte x
+>>&-1 ubyte !0 \b, no EOL=%#x
+# no valid 2nd BASIC fragment in Commodore executables
+#>>&0 use basic-line
+# Zero-byte marking the end of the BASIC line
+>-3 ubyte !0 \b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicates the end of the program
+>>-2 ubeshort x \b%4.4x
+# Summary: binary executable or Basic program for Commodore C16/VIC-20/Plus4 computers
+# URL: https://en.wikipedia.org/wiki/Commodore_Plus/4
+# Reference: http://mark0.net/download/triddefs_xml.7z/defs/p/prg-vic20.trid.xml
+# defs/p/prg-plus4.trid.xml
+# From: Joerg Jenderek
+# Note: there exist VIC-20 variants with different start address
+# GRR: line below is too generic because it matches Novell LANalyzer capture
+# with regular trace header record handled by ./sniffer
+0 leshort 0x1001
+# skip regular Novell LANalyzer capture (novell-2.tr1 novell-lanalyzer.tr1 novell-win10.tr1) with "invalid low" token value 54h
+>6 ubyte >0x7F
+# skip regular Novell LANalyzer capture (novell-2.tr1 novell-lanalyzer.tr1 novell-win10.tr1) with "invalid low" second line offset 4Ch
+#>>2 uleshort >0x1006 OFFSET_NOT_TOO_LOW
+# skip foo with "invalid high" second line offset but not for 0x123b (Minefield.prg)
+#>>>2 uleshort <0x1102 OFFSET_NOT_TOO_HIGH
+# if first token is not SYS this implies BASIC program
+>>6 ubyte !0x9e
+# valid second end of line separator implies BASIC program
+>>>(2.s-0x1000) ubyte =0
+>>>>0 use c16-prg
+# invalid second end of line separator !=0 implies binary executable like: Minefield.prg
+>>>(2.s-0x1000) ubyte !0
+>>>>0 use c16-exe
+# if first token is SYS this implies binary executable
+>>6 ubyte =0x9e
+>>>0 use c16-exe
+# display information about C16 program (memory address, line number, token)
+0 name c16-exe
+>0 uleshort x Commodore C16/VIC-20/Plus4 program
+!:mime application/x-commodore-exec
+!:ext prg/
+# start address like: 1001h
+>0 uleshort !0x1001 \b, start address %#4.4x
+# 1st BASIC fragment
+>2 use basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x1000) ubyte x
+>>&-1 ubyte !0 \b, no EOL=%#x
+# no valid 2nd BASIC fragment in executables
+#>>&0 use basic-line
+# Zero-byte marking the end of the BASIC line
+>-3 ubyte !0 \b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicate the end of the program
+>>-2 ubeshort x \b%4.4x
+# display information about tokenized C16 BASIC program (memory address, line number, token)
+0 name c16-prg
+>0 uleshort x Commodore C16/VIC-20/Plus4 BASIC program
+!:mime application/x-commodore-basic
+!:ext prg
+# start address like: 1001h
+>0 uleshort !0x1001 \b, start address %#4.4x
+# 1st BASIC fragment
+>2 use basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x1000) ubyte x
+>>&-1 ubyte !0 \b, no EOL=%#x
+# 2nd BASIC fragment
+>>&0 use basic-line
+# Zero-byte marking the end of the BASIC line
+>-3 ubyte !0 \b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicate the end of the program
+>>-2 ubeshort x \b%4.4x
+# Summary: binary executable or Basic program for Commodore VIC-20 computer with 8K RAM expansion
+# URL: https://en.wikipedia.org/wiki/VIC-20
+# Reference: http://mark0.net/download/triddefs_xml.7z/defs/p/prg-vic20-8k.trid.xml
+# From: Joerg Jenderek
+# Note: Basic v2.0 with Basic v4.0 extension (VIC20); there exist VIC-20 variants with different start addresses
+# start address 1201h
+0 leshort 0x1201
+# if first token is not SYS this implies BASIC program
+>6 ubyte !0x9e
+>>0 use vic-prg
+# if first token is SYS this implies binary executable
+>6 ubyte =0x9e
+>>0 use vic-exe
+# display information about Commodore VIC-20 BASIC+8K program (memory address, line number, token)
+0 name vic-prg
+>0 uleshort x Commodore VIC-20 +8K BASIC program
+!:mime application/x-commodore-basic
+!:ext prg
+# start address like: 1201h
+>0 uleshort !0x1201 \b, start address %#4.4x
+# 1st BASIC fragment
+>2 use basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x1200) ubyte x
+>>&-1 ubyte !0 \b, no EOL=%#x
+# 2nd BASIC fragment
+>>&0 use basic-line
+# Zero-byte marking the end of the BASIC line
+>-3 ubyte !0 \b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicate the end of the program
+>>-2 ubeshort x \b%4.4x
+# display information about Commodore VIC-20 +8K program (memory address, line number, token)
+0 name vic-exe
+>0 uleshort x Commodore VIC-20 +8K program
+!:mime application/x-commodore-exec
+!:ext prg/
+# start address like: 1201h
+>0 uleshort !0x1201 \b, start address %#4.4x
+# 1st BASIC fragment
+>2 use basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x1200) ubyte x
+>>&-1 ubyte !0 \b, no EOL=%#x
+# no valid 2nd BASIC fragment in executables
+#>>&0 use basic-line
+# Zero-byte marking the end of the BASIC line
+>-3 ubyte !0 \b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicate the end of the program
+>>-2 ubeshort x \b%4.4x
+# Summary: binary executable or Basic program for Commodore PET computers
+# URL: https://en.wikipedia.org/wiki/Commodore_PET
+# Reference: http://mark0.net/download/triddefs_xml.7z/defs/p/prg-pet.trid.xml
+# From: Joerg Jenderek
+# start address 0401h
+0 leshort 0x0401
+!:strength +1
+# GRR: line above with strength 51 (50+1) is too generic because it matches TTComp archive data, ASCII, 1K dictionary
+# (strength=48=50-2) handled by ./archive and shared library (strength=50) handled by ./ibm6000
+# skip TTComp archive data, ASCII, 1K dictionary ttcomp-ascii-1k.bin with "invalid high" second line offset 4162h
+>2 uleshort <0x0502
+# skip foo with "invalid low" second line offset
+#>>2 uleshort >0x0406 OFFSET_NOT_TOO_LOW
+# skip bar with "invalid end of line"
+#>>>(2.s-0x0400) ubyte =0 END_OF_LINE_OK
+# if first token is not SYS this implies BASIC program
+>>6 ubyte !0x9e
+>>>0 use pet-prg
+# if first token is SYS this implies binary executable
+>>6 ubyte =0x9e
+>>>0 use pet-exe
+# display information about Commodore PET BASIC program (memory address, line number, token)
+0 name pet-prg
+>0 uleshort x Commodore PET BASIC program
+!:mime application/x-commodore-basic
+!:ext prg
+# start address like: 0401h
+>0 uleshort !0x0401 \b, start address %#4.4x
+# 1st BASIC fragment
+>2 use basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x0400) ubyte x
+# 2nd BASIC fragment
+>>&0 use basic-line
+# zero-byte marking the end of the BASIC line
+>-3 ubyte !0 \b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicate the end of the program
+>>-2 ubeshort x \b%4.4x
+# display information about Commodore PET program (memory address, line number, token)
+0 name pet-exe
+>0 uleshort x Commodore PET program
+!:mime application/x-commodore-exec
+!:ext prg/
+# start address like: 0401h
+>0 uleshort !0x0401 \b, start address %#4.4x
+# 1st BASIC fragment
+>2 use basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x0400) ubyte x
+>>&-1 ubyte !0 \b, no EOL=%#x
+# no valid 2nd BASIC fragment in executables
+#>>&0 use basic-line
+# Zero-byte marking the end of the BASIC line
+>-3 ubyte !0 \b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicate the end of the program
+>>-2 ubeshort x \b%4.4x
+# display information about tokenized BASIC line (memory address, line number, Token)
+0 name basic-line
+# pointer to memory address of beginning of "next" BASIC line
+# greater than the previous offset, but by at most 100h
+>0 uleshort x \b, offset %#4.4x
+# BASIC line number in the range 0 to 65520; in practice numbers are incremented by a fixed step (5, 10 or 100)
+>2 uleshort x \b, line %u
+# https://www.c64-wiki.com/wiki/BASIC_token
+# The "high-bit" bytes from #128-#254 stood for the various BASIC commands and mathematical operators
+>4 ubyte x \b, token (%#x)
+# https://www.c64-wiki.com/wiki/REM
+>4 string \x8f REM
+# remark string like: ** SYNTHESIZER BY RICOCHET **
+>>5 string >\0 %s
+#>>>&1 uleshort x \b, NEXT OFFSET %#4.4x
+# https://www.c64-wiki.com/wiki/PRINT
+>4 string \x99 PRINT
+# string like: "Hello world" "\021 \323ELF-E\330TRACTING-\332IP (64 ONLY)\016\231":\2362141
+>>5 string x %s
+#>>>&0 ubequad x AFTER_PRINT=%#16.16llx
+# https://www.c64-wiki.com/wiki/POKE
+>4 string \x97 POKE
+# <Memory address>,<number>
+>>5 regex \^[0-9,\040]+ %s
+# https://www.c64-wiki.com/wiki/SYS 0x9e=\236
+>4 string \x9e SYS
+# SYS <Address> parameter is a 16-bit unsigned integer; in the range 0 - 65535
+>>5 regex \^[0-9]{1,5} %s
+# may be followed by spaces, "control-characters" or a colon (:) followed by further commands, or as in victracker.prg
+# (\302(43)\252256\254\302(44)\25236) /T.L.R/
+#>>5 string x SYS_STRING="%s"
+# https://www.c64-wiki.com/wiki/GOSUB
+>4 string \x8d GOSUB
+# <line>
+>>5 string >\0 %s
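The offset arithmetic used by the rules above can be sketched outside of magic(5) as a short Python function. This is only an illustration and not part of the patch: the name describe_prg and the returned dictionary are made up for this example. It reads the load address, the pointer to the next BASIC line, the line number and the first token, then checks the byte just before the second line the way ">(2.s-0x1000) ubyte" does for start address 1001h. The classification mirrors the C16 branch (SYS token or missing end-of-line byte means executable); the VIC-20 +8K and PET branches test the token only.

```python
import struct

SYS_TOKEN = 0x9E  # tokenized SYS, the value tested at offset 6 in the rules


def describe_prg(data: bytes) -> dict:
    """Decode the PRG header fields that the magic rules look at (sketch)."""
    if len(data) < 7:
        raise ValueError("too short for a load address and one line header")
    # offset 0: load address, offset 2: pointer to next line,
    # offset 4: line number (all little-endian), offset 6: first token
    load_addr, next_ptr, line_no = struct.unpack_from("<HHH", data, 0)
    token = data[6]
    # Memory address A sits at file offset A - load_addr + 2, so the byte
    # before the second line is at next_ptr - (load_addr - 1); for load
    # address 0x1001 this is the (2.s-0x1000) expression of the rules.
    eol_offset = next_ptr - (load_addr - 1)
    eol_ok = 0 <= eol_offset < len(data) and data[eol_offset] == 0
    if token == SYS_TOKEN or not eol_ok:
        kind = "program"
    else:
        kind = "BASIC program"
    return {
        "load_addr": load_addr,
        "next_ptr": next_ptr,
        "line": line_no,
        "token": token,
        "eol_ok": eol_ok,
        "kind": kind,
    }


if __name__ == "__main__":
    # Hand-built image: load address 1001h, line 10, PRINT "HI", end marker.
    image = bytes([0x01, 0x10, 0x0B, 0x10, 0x0A, 0x00, 0x99]) + b'"HI"' + bytes(3)
    print(describe_prg(image))
```

The hand-built image is not one of the sample files named in this mail; it only exercises the arithmetic.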
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.43-c64-prg.diff.sig
Type: application/octet-stream
Size: 3586 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221112/1e120e7d/attachment-0007.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-prg.txt.gz
Type: application/x-gzip
Size: 995 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221112/1e120e7d/attachment-0001.bin>