[File] [PATCH] Magdir/c64, archive, ibm6000, terminfo Commodore BASIC/program missed or misidentified

Jörg Jenderek joerg.jen.der.ek at gmx.net
Sat Nov 12 02:07:54 UTC 2022


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

some weeks ago I sent a patch for Novell LANalyzer capture files
with extension TR1. Unfortunately some Commodore BASIC programs
were also described as such captures. These Commodore examples
often have the file name extension PRG.

When running file command version 5.43 with the -k option on such
"compiled" Commodore BASIC programs (tokenized from pure BASIC source
like *.BAS), Commodore binary executables and related misidentified
files, I get output like:

C64 Sprite Demo.prg:  CBM BASIC, SYS 2078 COMPRESSED BY \245S
FlappyBird.prg:       CBM BASIC, SYS 2061
Mastermind.prg:       shared library
		      TTComp archive data, ASCII, 1K dictionary
Microzodiac.bas:      ASCII text
Microzodiac.prg:      Novell LANalyzer capture file
Minefield.prg:        Novell LANalyzer capture file
Monopoly.bas:         ASCII text
Monopoly.prg:         data
SVr3cursesTest.bin:   SVr3 curses screen image, big-endian
SYNTHE.PRG:           , SYS  ** SYNTHESIZER BY RICOCHET **
SYNTHE.bas:           ASCII text
Vic-tac-toe.bas:      ASCII text
Vic-tac-toe.prg:      Novell LANalyzer capture file
XLINK.PRG:            SVr3 curses screen image, big-endian
XLINK.bas:            ASCII text
breakvic_joy.bas:     ASCII text, with CRLF line terminators
breakvic_joy.prg:     Novell LANalyzer capture file
derby.log:            ASCII text, with CRLF, LF line terminators
gunzip111.c64.prg:    PUCrunch archive data
		      CBM BASIC, SYS 2061
hello-c128.prg:       SVr3 curses screen image, big-endian
hello-c64.prg:        CBM BASIC, SYS 2061
hello-pet.prg:        shared library
		      TTComp archive data, ASCII, 1K dictionary
hello.c:              C source text
helloWorld.bas:       ASCII text
helloWorld.prg:       , SYS  "Hello world"
novell-lanalyzer.tr1: Novell LANalyzer capture file
pods.bas:             , SYS \307(142):   D$\262"\035\035\035"
saveroms:             , SYS  ********************************
saveroms.bas:         ASCII text
sheridan-c16.pck:     Novell LANalyzer capture file
sheridan.bin:         CBM BASIC, SYS 2064
sheridan.pck:         PUCrunch archive data
		      CBM BASIC, SYS 2061
ttcomp-ascii-1k.bin:  shared library
		      TTComp archive data, ASCII, 1K dictionary
unzp6420.prg:         , SYS "\021\005 \325NZIP64V2.00
		      \320UBLIC \304ISTRIBUTION"
unzp6420.txt:         ASCII text
victracker.prg:       data

Furthermore with the -i option only the generic
application/octet-stream is shown. With option --extension only the
3 byte sequence ??? is shown.

For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html).

Real TTComp archive data like example ttcomp-ascii-1k.bin, which the
file command also wrongly describes as shared library, is described
correctly as "TTComp archive compressed (ASCII-1K)" by
ark-ttcomp-ascii-1k.trid.xml. Such examples are also described with
the same rate as "Commodore PET BASIC 4.0 program" by
prg-pet.trid.xml. But this description is only correct for Commodore
PET BASIC/programs like Mastermind.prg, which are misidentified as
TTComp archive.
Real Novell LANalyzer capture files like novell-lanalyzer.tr1 are
wrongly described as "Commodore Plus/4 BASIC V3.5 program" by
prg-plus4.trid.xml and with the same rate as "Commodore VIC-20 BASIC
V2 program" by prg-vic20.trid.xml, because these files start with the
2 byte sequence 0110h. But this description is only correct for
Commodore C16/VIC-20/Plus4 BASIC/programs (like Microzodiac.prg
Minefield.prg Vic-tac-toe.prg breakvic_joy.prg). That these samples
are not Novell LANalyzer captures but Commodore BASIC programs can be
seen from the fact that a corresponding BASIC source file like
breakvic_joy.bas exists, or that the source can be regenerated by
de-tokenizing, for example via the VICE emulator tool petcat by a
command like:
	petcat -2 -o  Vic-tac-toe.bas -- Vic-tac-toe.prg
Samples which are described as "SVr3 curses screen image", like the
real SVr3cursesTest.bin or Commodore C128 BASIC/programs, are
described as "Commodore 128 BASIC V7.0 program" by prg-c128.trid.xml.
The samples described by the file command as "CBM BASIC" or with the
", SYS" phrase are described as "Commodore 64 BASIC V2 program" by
prg-c64.trid.xml.
Some examples like victracker.prg are described by the file command
only as data. These are described as "Commodore VIC-20 BASIC V2
program (8K RAM expansion)" by prg-vic20-8k.trid.xml (see appended
trid-v-prg.txt.gz).

So with the help of the TrID output I found web pages with the needed
information. These are now referenced by lines like:

# URL:		http://fileformats.archiveteam.org/
#		wiki/Commodore_BASIC_tokenized_file
# Reference:	https://www.c64-wiki.com/wiki/BASIC_token
#		https://github.com/thezerobit/bastext/blob/master/
#		bastext.doc
#		http://mark0.net/download/triddefs_xml.7z
#		defs/p/prg-c64.trid.xml

The current description is done inside Magdir/c64 by lines like:
 0	leshort		0x0801
 >2	leshort		0x080b
 >6	string		\x9e		CBM BASIC
 >7	string		>\0		\b, SYS %s

First, only 16 bits are used for recognition. So the magic is too
weak. This sometimes leads to misidentifications like:
	Novell LANalyzer capture file
	TTComp archive data
	shared library
The current lines consider only the Commodore C64 variant (starting
with memory address 0x0801). So other Commodore variants (like
C16/VIC-20/Plus4) are missed.

Then one must or should distinguish 2 variants. One is "compiled"
BASIC programs. These are just tokenized from BASIC sources (pure text
files, often with BAS extension). So these are "slower", but they are
"more human readable" and the source can be regenerated by
de-tokenizing (for example by the VICE emulator tool petcat). The
other is binary executables (often indicated by the SYS token with
value 9e). These are "faster" in execution, but the source can not be
regenerated so easily. Such examples are created, for example, by
compiling C sources via the cc65 tool. In the current magic these two
cases are swapped.
After comparing the describing text with TrID I changed "CBM BASIC" to
"Commodore C64 BASIC program". For other Commodore systems I replace
the phrase "C64" by the corresponding phrase. Furthermore I add the
phrase "program" to distinguish it from pure BASIC text source (often
with BAS suffix). For the pure binary executables I chose a describing
text like "Commodore C64 program".

Because the magics are weak and differ between Commodore systems, I
put the displaying parts in sub routines. Common to all is a
"tokenized" BASIC line. That is described by a sub routine like:
 0	name	basic-line
 >0		uleshort	x	\b, offset %#4.4x
 >2		uleshort	x	\b, line %u
 >4		ubyte		x	\b, token (%#x)
 >4		string		\x8f	REM
 >>5		string		>\0	%s
 >4		string		\x99	PRINT
 >>5		string		x	%s
 >4		string		\x97	POKE
 >>5		regex		\^[0-9,\040]+	%s
 >4		string		\x9e	SYS
 >>5		regex		\^[0-9]{1,5}	%s

The displayed offset is the pointer to the memory address of the
beginning of the next BASIC line. When we know the address of the
current BASIC line (ADR), then for the worst "big" case of tokenized
BASIC this value is ADR+100h, because the maximal total BASIC line
length is 256 (hexadecimal 100). The worst "small" case is ADR+6
{ 2 (offset) + 2 (line number) + 1 (minimal line content assuming 1
token) + 1 (end of line terminator with value 0x0) }. So we can use
this as a test. The Commodore C16/VIC-20/Plus4 BASIC programs start
at memory address 0x1001. That is stored as 2 byte little endian in
the first bytes. Then I can check the offset to the second line by
tests like:
 >>2		uleshort	>0x1006	OFFSET_NOT_TOO_LOW
 >>>2		uleshort	<0x1102	OFFSET_NOT_TOO_HIGH
So with the first additional test I can, for example, skip
misidentified regular Novell LANalyzer captures (like novell-2.tr1
novell-win10.tr1 handled by Magdir/sniffer) with the "invalid low"
second line offset 4Ch. But such tests are dangerous, because there
are subtle traps. These tests are only always TRUE for "tokenized"
Commodore BASIC programs. For the binary executable Commodore program
Minefield.prg I get here in the first line fragment the offset value
0x123b.
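
The range test on the second line offset can be sketched outside of
magic syntax as follows. This is only an illustration in Python, not
part of the patch; the function name is made up, and the sample bytes
are built from the values quoted above, not read from the real files.

```python
import struct

MIN_LINE_LEN = 6      # 2 (offset) + 2 (line number) + 1 (token) + 1 (NUL terminator)
MAX_LINE_LEN = 0x100  # worst "big" case named in the text above


def next_line_pointer_plausible(data: bytes) -> bool:
    """Check that the first next-line pointer lies in [ADR+6, ADR+0x100].

    ADR is the start address stored little endian in the first 2 bytes.
    """
    if len(data) < 4:
        return False
    load_addr, next_ptr = struct.unpack_from("<HH", data, 0)
    return load_addr + MIN_LINE_LEN <= next_ptr <= load_addr + MAX_LINE_LEN


# Hypothetical 4 byte prefixes built from values quoted in this mail
for name, prefix in [
    ("breakvic_joy.prg (offset 0x104e)", bytes.fromhex("01104e10")),
    ("LANalyzer capture (offset 0x004c)", bytes.fromhex("01104c00")),
    ("Minefield.prg (offset 0x123b)", bytes.fromhex("01103b12")),
]:
    print(name, next_line_pointer_plausible(prefix))
```

For start address 0x1001 the accepted range is 0x1007 to 0x1101, which
corresponds to the two magic tests >0x1006 and <0x1102. The Minefield
prefix fails the check, which is exactly the trap described above.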

Also the offset value 0x0000 can occur. It took me a day to
understand it, because it is not explicitly documented. It occurs on
the last BASIC line. This is the marker for the BASIC interpreter
that the end of the program is reached. Often these are also the last
bytes of the stored tokenized BASIC program. So as a check I show
non-nil values by lines at the end of the sub routine like:
 >-3		ubyte		!0	\b, 3 last bytes %#2.2x
 >>-2		ubeshort	x	\b%4.4x
When a 0 offset occurs in the second line fragment of a real
"tokenized" BASIC program, the program consists of a single line.
That is very unlikely and occurs only in "artificial" examples like
the tutorial example helloWorld.prg. On the other hand, for binary
Commodore executables this is often the case. I explain later why.

The shown BASIC line number is in the range from 0 to 65520, but it
is common practice to increment numbers by some value (like 5, 10 or
100). So in "well behaved" examples (like breakvic_joy.prg) I get the
line sequence 10 20 30 40 50 ... (see also source breakvic_joy.bas).
For real Commodore binary executables I get more "bad looking"
sequences like 20, 7840 in FlappyBird.prg or the most "bad looking"
1989, 47736 in "C64 Sprite Demo.prg". And of course in misidentified
non Commodore examples I often get "bad" line numbers here, like
1281 in a regular Novell LANalyzer capture (novell-lanalyzer.tr1).
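
How such line number sequences are obtained can be sketched by
following the chain of next-line pointers. This is a rough Python
illustration under the assumption that memory address A is found at
file offset A - start address + 2; the function name and the sample
bytes are made up for the example.

```python
import struct


def basic_line_numbers(data: bytes, limit: int = 16) -> list:
    """Follow the next-line pointers and collect the stored line numbers."""
    load_addr = struct.unpack_from("<H", data, 0)[0]
    pos = 2  # file offset of the first line (pointer + line number)
    numbers = []
    # Stop when fewer than 4 bytes remain or `limit` lines were read
    while len(numbers) < limit and pos + 4 <= len(data):
        next_ptr, line_no = struct.unpack_from("<HH", data, pos)
        if next_ptr == 0:  # 0000h marks the end of the program
            break
        numbers.append(line_no)
        new_pos = next_ptr - load_addr + 2
        if new_pos <= pos:  # pointer does not move forward: no sane chain
            break
        pos = new_pos
    return numbers


# Hand-made two line program at start address 0801h (lines 10 and 20)
tiny = bytes.fromhex("0108" "0b080a00992248492200" "110814008000" "0000")
print(basic_line_numbers(tiny))

# Prefix built from the LANalyzer values quoted in this mail
capture = bytes.fromhex("01104c000105") + bytes(64)
print(basic_line_numbers(capture))
```

The second call stops after the first "line", because the pointer 4Ch
lies below the start address, and it reports the "bad" number 1281.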

After the line number comes the BASIC line content. I show the first
byte of the content as a hexadecimal value. Often, and especially for
the first line, this is a tokenized BASIC command. The "high-bit"
bytes from 128 to 254 stand for the various BASIC commands and
mathematical operators. So I can use this feature as an additional
test criterion. So I skip regular Novell LANalyzer captures
(novell-2.tr1 novell-lanalyzer.tr1 novell-win10.tr1) with the
"invalid low" token value 54h by a line like:
 >6		ubyte		>0x7F	TOKEN_VALUE_NOT_TOO_LOW

The hexadecimal value 9e means the SYS command. That tells the
processor to execute the machine language subroutine at a specific
address. The <Address> parameter is an unsigned integer, i.e. an
integer in the range 0 through 65535. I use a regular expression for
catching and displaying at most 5 digits. If I use just a string,
then everything until the end of line terminator is displayed. In
some examples this address is followed by spaces,
"control-characters" (which I do not understand) or a colon (:)
followed by further commands. Then the output columns would sometimes
get very wide without any gain in information. This is now done by a
fragment like:
 >4		string		\x9e	SYS
 >>5		regex		\^[0-9]{1,5}	%s
Now comes an interesting part. I needed some days to understand it.
The intro of the cl65 compiler suite says that it prepends a header
or stub which corresponds to a PRG-format BASIC program, consisting
of a single line, similar to this:
	20 sys2061
I verified this by creating a binary from C source, for example by a
command line like:
	cl65  -v -t c64 -o hello-c64.prg  hello.c text.s
That explains why a "pure" binary Commodore executable at first
glance looks like a tokenized BASIC program. So I use this as the
criterion to distinguish between a "tokenized" BASIC program and a
pure binary executable. If the first token is SYS, this implies a
binary executable; if not, then it is a BASIC program. For a BASIC
program for the Commodore VIC-20 computer with 8K RAM expansion
(start address is 1201h) this looks like:
 0		leshort		0x1201
 >6		ubyte		!0x9e
 >>0		use		vic-prg
 >6		ubyte		=0x9e
 >>0		use		vic-exe
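
The same decision, together with the at most 5 digit SYS address, can
be sketched in Python. This is only an illustration of the rule "first
token SYS implies executable"; the function name is made up and the
sample bytes are constructed by hand.

```python
import re

SYS_TOKEN = 0x9E
# Same idea as the magic regex ^[0-9]{1,5}: at most 5 leading digits
SYS_TARGET = re.compile(rb"[0-9]{1,5}")


def describe_stub(data: bytes):
    """Return (kind, sys_address) judged only by the first token at offset 6."""
    if len(data) < 7:
        return ("unknown", None)
    if data[6] != SYS_TOKEN:
        return ("basic", None)
    match = SYS_TARGET.match(data, 7)  # digits must start directly after SYS
    return ("executable", int(match.group()) if match else None)


# Stub like the one described above: start 0801h, next 080bh, SYS 2061
stub = bytes.fromhex("01080b08ef009e32303631000000")
print(describe_stub(stub))

# First token PRINT (99h): judged as BASIC program by this rule alone
listing = bytes.fromhex("0108150814009922484922000000")
print(describe_stub(listing))
```

The sketch deliberately looks only at the first token, so it shares
the weakness of this simple rule for stubs where SYS appears later.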
But things get complicated, because this consideration is not always
true. It applies to the cl65 compiler suite. But obviously there
exist other compilers or handmade stubs. This took another day. The
example unzp6420.prg looks at first glance like a tokenized BASIC
program. But in reality the BASIC stub contains just 2 BASIC lines
with PRINT directives. To complicate matters further, the SYS
directive does follow, not on a new BASIC line but after the command
separator colon (:). The example is a self extracting ZIP program
(visible from the PRINT directive). So I insert a "manual exception
branch" for Commodore C64 computers. So here the starting lines look
like:
 0	leshort		0x0801
 >6		ubyte		!0x9e
 >>23			search/30	\323ELF-E\330TRACTING-\332IP
 >>>0				use		c64-exe
 >>23			default		x
 >>>0				use	c64-prg
 >6		ubyte		=0x9e
 >>0			use		c64-exe

For the tokens 8fh (that is the REM directive) and 99h (that is the
PRINT directive) I show the explicit token name and also the
following content as a string. Often this contains useful meta
information like program name, version or author, like "SYNTHESIZER
BY RICOCHET" in SYNTHE.PRG or \325NZIP64V2.00 in unzp6420.prg.

So the whole subroutine for tokenized C64 BASIC program looks like:

 0	name	c64-prg
 >0		uleshort	x	Commodore C64 BASIC program
 !:mime	application/x-commodore-basic
 !:ext	prg/bas/
 >0		uleshort	!0x0801	\b, start address %#4.4x
 >2		use		basic-line
 >(2.s-0x0800)	ubyte		x
 >>&-1		ubyte		!0	\b, no EOL=%#x
 >>&0		use		basic-line
 >-3		ubyte		!0	\b, 3 last bytes %#2.2x
 >>-2		ubeshort	x	\b%4.4x
Instead of the generic application/octet-stream I show a user defined
one. Tokenized BASIC programs were stored by Commodore as file type
program "PRG" in a separate field in the directory structures. So a
file name can have no suffix, as in example saveroms. When
transferring to other platforms, they are often saved with the .prg
extension. The BAS suffix is typically used for the BASIC source but
is also found in program pods.bas. The BASIC lines are terminated by
a nil-byte. As a check, the unexpected case is displayed by the line
with phrase "no EOL". This can only occur in binaries. Afterwards the
second BASIC line is displayed, after a jump, by a second call of sub
routine basic-line. This part is nonsense for pure binaries, which
are described by a sub routine like c64-exe. Another difference is
that there another mime type is used. And there I also found no BAS
name suffix. The sub routines for other machines look similar. There
the difference is that the address part 080? is replaced by the
suitable address and the phrase "C64" in the describing text is also
changed. Maybe it is possible to unify all these subroutines. But
after spending 2 weeks on Commodore stuff I am too tired to try to do
this.

Some PRG examples like gunzip111.c64.prg are also described as
"PUCrunch archive data". The description is done inside
Magdir/archive by a line like:
 0 string \x01\x08\x0b\x08\xef\x00\x9e\x32\x30\x36\x31 PUCrunch
When using the updated Magdir/c64 we see it is a Commodore C64 program
(starting address 0801 hexadecimal) with next offset 080b
hexadecimal. The first BASIC line number is 239 (00EF hexadecimal).
The first BASIC instruction is SYS 2061 (that is the SYS token 9E
followed by the 4 digit characters 32 30 36 31 in hexadecimal
format). I verified that this is true by running a command like:
	pucrunch sheridan.bin sheridan.pck
So I moved the "PUCrunch archive data" entry from ./archive and merged
it with sub routine c64-exe (other suffix PCK instead of PRG). But
again things are complicated here. First of all I can also create
such archives for other systems with other magics by a command like:
	pucrunch-c16 sheridan.bin  sheridan-c16.pck
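
The meaning of the 11 byte string from Magdir/archive can be checked
by decoding it field by field. A small Python sketch, used here only
as a calculator (the variable names are mine):

```python
import struct

# The 11 bytes tested by the old PUCrunch line in Magdir/archive
SIGNATURE = bytes.fromhex("01080b08ef009e32303631")

# Three little endian 16 bit values: start address, next offset, line number
load_addr, next_ptr, line_no = struct.unpack_from("<HHH", SIGNATURE, 0)
token = SIGNATURE[6]                    # first token of the BASIC line
sys_arg = SIGNATURE[7:].decode("ascii")  # digits following the token

print(f"start {load_addr:#06x}, next {next_ptr:#06x}, line {line_no}, "
      f"token {token:#x}, argument {sys_arg}")
# prints: start 0x0801, next 0x080b, line 239, token 0x9e, argument 2061
```

So the old PUCrunch magic is nothing more than a C64 start address
followed by one BASIC line "239 SYS2061".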

Then there exist samples like gunzip111.c64.prg, which look like a
PUCrunch archive. But when I try to unpack such samples by a command
like:
	pucrunch -u gunzip111.c64.prg
I get an error message like:
Not C64 short (251 > 28)
Detected C64 (19 <= 31)
Error: Broken archive, LZ copy position underrun at 828 (10438).
lzLen 3.
So I believe that the SYS 2061 and line 239 combination is probably
also used by other Commodore programs. So the description as
PUCrunch archive is not reliable, and the found SYS and line
combination is only a hint for a PUCrunch archive.

All Commodore PET BASIC/programs (like mastermind.prg with start
address 0401h) are also described as "TTComp archive data, ASCII, 1K
dictionary" with strength 48 (=50-2). The description is done inside
Magdir/archive by lines like:
 0	string	\1\4
 >0	use	ttcomp
Luckily the displaying part is done by a sub routine. So only suitable
lines must be added before calling sub routine ttcomp. According to
jsummers, for real TTComp the last 3 bytes of a file should match
one of 8 patterns. These are non nil. For tokenized Commodore
PET BASIC, when the next offset value after the end of line
separator (that is \0) is 0000h, this terminates the whole BASIC
program. In most cases this byte sequence is also the last bytes
of the whole BASIC file. So this now becomes:
 0	string	\1\4
 >-4	ubelong&0x00FFffFF	!0
 >>0	use	ttcomp
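
What the added test line does can be sketched in Python as follows.
This is only an illustration; the function name is made up and the
sample data is constructed by hand, not taken from real files.

```python
def tail_is_nonzero(data: bytes) -> bool:
    """Mirror `>-4 ubelong&0x00FFffFF !0`: last 3 bytes are not all nil."""
    if len(data) < 4:
        return False
    last4 = int.from_bytes(data[-4:], "big")  # ubelong is big endian
    return (last4 & 0x00FFFFFF) != 0          # mask away the 4th last byte


# Hand-made tail of a tokenized BASIC file: NUL terminator + 0000h offset
basic_tail = b"\x01\x04" + b"\x99\x22HI\x22" + b"\x00\x00\x00"
# Hand-made file with the same \1\4 start but a non nil tail
other_tail = b"\x01\x04" + b"\xaa\xbb\xcc\xdd"

print(tail_is_nonzero(basic_tail))  # False: ttcomp is not called
print(tail_is_nonzero(other_tail))  # True: ttcomp is called
```

The mask is needed because the test reads 4 bytes at offset -4, but
only the last 3 bytes are of interest.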

All Commodore PET BASIC/programs (like mastermind.prg with start
address 0401h) are also described as "shared library" with strength
50. The description is done inside Magdir/ibm6000 by a line like:
 0	beshort		0x0104		shared library
Unfortunately I have no knowledge about IBM RS/6000 machines. So I
was not able to improve the above weak magic. So I only mention in a
comment line that this collides with Commodore PET BASIC/programs.

All Commodore C128 BASIC 7.0/programs (like XLINK.PRG hello-c128.prg
with start address 1C01h) are also described as "SVr3 curses screen
image, big-endian" with strength 50. The description is done inside
Magdir/terminfo by a line like:
 0	beshort		0434	SVr3 curses screen image, big-endian

Unfortunately I have no knowledge about terminfo and no access to big
endian machines. So I was not able to improve the above weak magic.
So I only mention in a comment line that this collides with Commodore
C128 BASIC/programs. But when looking at a little endian curses
sample (mentioned on the scr_dump(5) man page), the values at offset
2 seem to be "high". So by checking that the second offset value is
not too high for C128 samples I probably skip such curses samples.
For completeness I also check that the value is not too low. So the
starting lines for C128 BASIC 7.0 samples look like:
 0		leshort		0x1C01
 !:strength	+1
 >2		uleshort	<0x1D02
 >>2		uleshort	>0x1C06
 >>>6		ubyte		!0x9e
 >>>>0			use	c128-prg
 >>>6		ubyte		=0x9e
 >>>>0			use	c128-exe

After applying the above mentioned modifications by patches
file-5.43-c64-prg.diff file-5.43-archive-prg.diff
file-5.43-ibm6000-prg.diff file-5.43-terminfo-prg.diff
and the previous file-5.43-sniffer-novell.diff, more Commodore
BASIC/programs are described (also with more details) and more
misidentifications vanish. With the -k option this now looks like:


C64 Sprite Demo.prg:  Commodore C64 program,
    	   	      offset 0x081c, line 1989, token (0x9e)
		      SYS 2078, 3 last bytes 0x2e4160
FlappyBird.prg:       Commodore C64 program,
		      offset 0x080b, line 20, token (0x9e)
		      SYS 2061, 3 last bytes 0x83a8ac
Mastermind.prg:       Commodore PET BASIC program,
		      offset 0x042e, line 1000, token (0x97)
		      POKE 36879,25,
		      offset 0x046e, line 1003, token (0x99)
		      PRINT "\021\021\021\021STOP IS EINDE SPEL"
		      :\231"\021HELP IS LIST SPEL"
		      :\201I\2621\2443000:\202\012-
                      shared library
Microzodiac.bas:      ASCII text
Microzodiac.prg:      Commodore C16/VIC-20/Plus4 BASIC program,
		      offset 0x100a, line 100, token (0x99)
		      PRINT "\223",
		      offset 0x1013, line 105, token (0x8d)
		      GOSUB 555
Minefield.prg:        Commodore C16/VIC-20/Plus4 program,
		      offset 0x123b, line 1, token (0x97)
		      POKE  36879,125, no EOL=0x36
Monopoly.bas:         ASCII text
Monopoly.prg:         Commodore VIC-20 +8K BASIC program,
		      offset 0x123e, line 10, token (0x8f)
		      REM  MONOPOLY BY P.WEPS 8500 NBG
		      , KILIANSTR.97 TEL.34 32 55,
		      offset 0x1256, line 20, token (0x8f)
		      REM  PROG.-ID.MONO82-3
SVr3cursesTest.bin:   SVr3 curses screen image, big-endian
SYNTHE.PRG:           Commodore C64 BASIC program,
		      offset 0x0825, line 10, token (0x8f)
		      REM  ** SYNTHESIZER BY RICOCHET **,
		      offset 0x083d, line 20, token (0x97)
		      POKE 53280,15, 3 last bytes 0x202020
SYNTHE.bas:           ASCII text
Vic-tac-toe.bas:      ASCII text
Vic-tac-toe.prg:      Commodore C16/VIC-20/Plus4 BASIC program,
		      offset 0x1024, line 0, token (0x8f)
		      REM "\024\024\024\024\024\024
		      *** BY CRAIG BRUCE ***,
		      offset 0x106a, line 1, token (0x4d)
XLINK.PRG:            Commodore C128 BASIC program,
		      offset 0x1c3f, line 10, token (0x8b),
		      offset 0x1c78, line 20, token (0xde)
                      SVr3 curses screen image, big-endian
XLINK.bas:            ASCII text
breakvic_joy.bas:     ASCII text, with CRLF line terminators
breakvic_joy.prg:     Commodore C16/VIC-20/Plus4 BASIC program,
		      offset 0x104e, line 10, token (0x97)
		      POKE 36878,15,
		      offset 0x1091, line 20, token (0x54)
gunzip111.c64.prg:    Commodore C64 program
		      , probably PUCrunch archive data,
		      offset 0x080b, line 239, token (0x9e)
		      SYS 2061, 3 last bytes 0x829ff8
hello-c128.prg:       Commodore C128 program,
		      offset 0x1c0b, line 531, token (0x9e)
		      SYS 7181
                      SVr3 curses screen image, big-endian
hello-c64.prg:        Commodore C64 program,
		      offset 0x080b, line 531, token (0x9e)
		      SYS 2061, 3 last bytes 0xff5d12
hello-pet.prg:        Commodore PET program,
		      offset 0x040b, line 531, token (0x9e)
		      SYS 1037
                      shared library
hello.c:              ASCII text
helloWorld.bas:       ASCII text
helloWorld.prg:       Commodore C64 BASIC program,
		      offset 0x0815, line 10, token (0x99)
		      PRINT  "Hello world",
		      offset 0000, line 0, token (0)
novell-lanalyzer.tr1: Novell LANalyzer capture file, version 1.5,
		      record length 0x4c, 2nd record length 0x80,
		      names Channel1 Channel2 ...
pods.bas:             Commodore C64 BASIC program,
		      offset 0x0819, line 4, token (0x99)
		      PRINT \307(142):   D$\262"\035\035\035",
		      offset 0x0825, line 5, token (0x81)
saveroms:             Commodore C64 BASIC program,
		      offset 0x0828, line 10, token (0x8f)
		      REM  ********************************,
		      offset 0x084f, line 12, token (0x8f)
		      REM  * SAVEROMS                     *
saveroms.bas:         ASCII text
sheridan-c16.pck:     Commodore C16/VIC-20/Plus4 program,
		      offset 0x100b, line 239, token (0x9e)
		      SYS 4109, 3 last bytes 0xb3fff0
sheridan.bin:         Commodore C64 program,
		      offset 0x080b, line 0, token (0x9e)
		      SYS 2064, 3 last bytes 0xf9f9fa
sheridan.pck:         Commodore C64 program
		      , probably PUCrunch archive data,
		      offset 0x080b, line 239, token (0x9e)
		      SYS 2061, 3 last bytes 0xb3fff0
ttcomp-ascii-1k.bin:  shared library
                      TTComp archive data dictionary
unzp6420.prg:         Commodore C64 program,
		      offset 0x082c, line 1998, token (0x99)
		      PRINT "\021\005 \325NZIP64V2.00
		      \320UBLIC \304ISTRIBUTION",
		      offset 0x085b, line 1999, token (0x99)
		      PRINT "\021 \323ELF-E\330TRACTING-\332IP
		      (64 ONLY)\016\231":\2362141
unzp6420.txt:         ASCII text
victracker.prg:       Commodore VIC-20 +8K program,
		      offset 0x1223, line 2004, token (0x9e)
		      SYS, no EOL=0x50, 3 last bytes 0x201102

I hope my diff files can be applied in a future version of the file
utility.

Unfortunately there exist some more Commodore BASIC variants, but I
myself found no examples. So for such systems I add magic lines
only as commented fragments. Maybe BASIC samples for other,
non-Commodore systems are also described by my magics.

With best wishes,
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY27/+QAKCRCv8rHJQhrU
1iP2AKCUYEI3SKwcrG3dJKtDebEDwKA6awCgwboLaFSvRBKos+Y+a67EE4JdIxQ=
=sJZ5
-----END PGP SIGNATURE-----
-------------- next part --------------
--- file-5.43/magic/Magdir/ibm6000.old	2021-07-05 11:33:09.000000000 +0200
+++ file-5.43/magic/Magdir/ibm6000	2022-11-09 00:11:21.522922700 +0100
@@ -3,4 +3,5 @@
 # $File: ibm6000,v 1.15 2021/07/03 14:01:46 christos Exp $
 # ibm6000:  file(1) magic for RS/6000 and the RT PC.
+# https://en.wikipedia.org/wiki/IBM_RS/6000
 #
 0	beshort		0x01df		executable (RISC System/6000 V3.1) or obj module
@@ -12,4 +13,5 @@
 #>6	beshort		>0		- version %ld
 # GRR: line below is too general as it matches also TTComp archive, ASCII, 1K handled by ./archive
+# and Commodore PET BASIC program (Mastermind.prg with start address 401h) and strength (51=50+1)
 0	beshort		0x0104		shared library
 # GRR: line below is too general as it matches also TTComp archive, ASCII, 2K handled by ./archive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.43-ibm6000-prg.diff.sig
Type: application/octet-stream
Size: 626 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221112/1e120e7d/attachment-0004.obj>
-------------- next part --------------
--- file-5.43/magic/Magdir/archive.old	2022-09-13 20:05:39.000000000 +0200
+++ file-5.43/magic/Magdir/archive	2022-11-09 00:18:39.096366400 +0100
@@ -507,9 +507,10 @@
 0	string	\1\4
 # TODO:
-# skip Commodore PET BASIC 4.0 program *.prg
-# variant ASCII, 1K dictionary (strength=48=50-2). With strength=49 wrong order! WHY?
 # skip shared library (strength=50) handled by ./ibm6000
 !:strength	-2
->0	use	ttcomp
+# skip Commodore PET BASIC programs (Mastermind.prg) with last 3 nil bytes (\0~end of line followed by 0000h line offset)
+#>-4	ubelong		x	LAST_BYTES=%8.8x
+>-4	ubelong&0x00FFffFF	!0
+>>0	use	ttcomp
 #	display information of TTComp archive
 0	name	ttcomp
@@ -792,6 +793,4 @@
 # Terse
 0	string	\5\1\1\0 Terse archive data
-# PUCrunch
-0	string	\x01\x08\x0b\x08\xef\x00\x9e\x32\x30\x36\x31 PUCrunch archive data
 # UHarc
 0	string	UHA UHarc archive data
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.43-archive-prg.diff.sig
Type: application/octet-stream
Size: 694 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221112/1e120e7d/attachment-0005.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.43-terminfo-prg.diff.sig
Type: application/octet-stream
Size: 507 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221112/1e120e7d/attachment-0006.obj>
-------------- next part --------------
--- file-5.43/magic/Magdir/terminfo.old	2021-02-23 01:51:10.000000000 +0100
+++ file-5.43/magic/Magdir/terminfo	2022-11-10 16:41:06.298555800 +0100
@@ -37,6 +37,7 @@
 # AIX and HPUX use the SVr4 big-endian format
 # Solaris uses the SVr3 formats (sparc and x86 differ endian-ness)
 0	beshort		0433 		SVr2 curses screen image, big-endian
+# GRR: line below too general as it catches Commodore C128 program (crc32.prg XLINK.PRG) with start address 1C01h handled by ./c64
 0	beshort		0434		SVr3 curses screen image, big-endian
 0	beshort		0435		SVr4 curses screen image, big-endian
 #
-------------- next part --------------
--- file-5.43/magic/Magdir/c64.old	2022-05-14 22:03:39.000000000 +0200
+++ file-5.43/magic/Magdir/c64	2022-11-12 02:44:28.599089600 +0100
@@ -194,7 +194,338 @@
 >100	byte		>0		\b, %u subsong(s)
 
 # CBM BASIC (cc65 compiled)
+# Summary:	binary executable or Basic program for Commodore C64 computers
+# Update:	Joerg Jenderek
+# URL:		http://fileformats.archiveteam.org/wiki/Commodore_BASIC_tokenized_file
+# Reference:	https://www.c64-wiki.com/wiki/BASIC_token
+#		https://github.com/thezerobit/bastext/blob/master/bastext.doc
+#		http://mark0.net/download/triddefs_xml.7z/defs/p/prg-c64.trid.xml
+# TODO:		unify Commodore BASIC/program sub routines
+# Note:		"PUCrunch archive data" moved from ./archive and merged with c64-exe
 0	leshort		0x0801
->2	leshort		0x080b
->6	string		\x9e		CBM BASIC
->7	string		>\0		\b, SYS %s
+# if first token is not SYS this implies BASIC program in most cases
+>6		ubyte		!0x9e
+# but sELF-ExTRACTING-zIP executable unzp6420.prg contains SYS token at end of second BASIC line (at 0x35)
+>>23		search/30	\323ELF-E\330TRACTING-\332IP
+>>>0		use		c64-exe
+>>23		default		x
+>>>0		use		c64-prg
+# if first token is SYS this implies binary executable
+>6		ubyte		=0x9e
+>>0		use		c64-exe
+# display information about C64 binary executable (memory address, line number, token)
+0	name	c64-exe
+>0		uleshort	x	Commodore C64
+# http://a1bert.kapsi.fi/Dev/pucrunch/
+# start address 0801h; next offset 080bh; BASIC line number is 239=00EFh; BASIC instruction is SYS 2061
+# the above combination appartly also occur for other Commodore programs like: gunzip111.c64.prg
+# and there exist PUCrunch archive for other machines like C16 with other magics
+>0		string	\x01\x08\x0b\x08\xef\x00\x9e\x32\x30\x36\x31	program, probably PUCrunch archive data
+!:mime	application/x-compress-pucrunch
+!:ext	prg/pck
+>0		string	!\x01\x08\x0b\x08\xef\x00\x9e\x32\x30\x36\x31	program
+!:mime	application/x-commodore-exec
+!:ext	prg/
+# start address like: 801h
+>0		uleshort	!0x0801	\b, start address %#4.4x
+# 1st BASIC fragment
+>2		use		basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x800)	ubyte		x
+>>&-1		ubyte		!0	\b, no EOL=%#x
+# valid 2nd BASIC fragment found only in sELF-ExTRACTING-zIP executable unzp6420.prg
+>>23		search/30	\323ELF-E\330TRACTING-\332IP
+# jump again from beginning
+>>>(2.s-0x800)	ubyte		x
+>>>>&0		use		basic-line
+# Zero-byte marking the end of the BASIC line
+>-3		ubyte		!0	\b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicates the end of the program
+>>-2		ubeshort	x	\b%4.4x
+# display information about tokenized C64 BASIC program (memory address, line number, token)
+0	name	c64-prg
+>0		uleshort	x	Commodore C64 BASIC program
+!:mime	application/x-commodore-basic
+# Tokenized BASIC programs were stored by Commodore as file type program "PRG" in separate field in directory structures.
+# So file name can have no suffix like in saveroms; When transferring to other platforms, they are often saved with .prg extensions.
+# BAS suffix is typically used for the BASIC source but also found in program pods.bas
+!:ext	prg/bas/
+# start address like: 801h
+>0		uleshort	!0x0801	\b, start address %#4.4x
+# 1st BASIC fragment
+>2		use		basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x0800)	ubyte		x	
+>>&-1		ubyte		!0	\b, no EOL=%#x
+# 2nd BASIC fragment
+>>&0		use		basic-line
+# zero-byte marking the end of the BASIC line
+>-3		ubyte		!0	\b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicates the end of the program
+>>-2		ubeshort	x	\b%4.4x
+# Summary:	binary executable or Basic program for Commodore C128 computers
+# URL:		https://en.wikipedia.org/wiki/Commodore_128
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/p/prg-c128.trid.xml
+# From:		Joerg Jenderek
+# Note:		Commodore 128 BASIC 7.0 variant; there exist varaints with different start addresses
+0		leshort		0x1C01
+!:strength	+1
+# GRR: line above with strength 51 (50+1) is too generic because it matches SVr3 curses screen image, big-endian with strength (50) handled by ./terminfo
+# probably skip SVr3 curses images with "invalid high" second line offset 
+>2		uleshort	<0x1D02
+# skip foo with "invalid low" second line offset
+>>2		uleshort	>0x1C06
+# if first token is not SYS this implies BASIC program
+>>>6		ubyte		!0x9e
+>>>>0			use	c128-prg
+# if first token is SYS this implies binary executable
+>>>6		ubyte		=0x9e
+>>>>0		use		c128-exe
+# Summary:	binary executable or Basic program for Commodore C128 computers
+# Note:		Commodore 128 BASIC 7.1 extension by Rick Simon
+# start adress 132Dh
+#0		leshort		0x132D	THIS_IS_C128_7.1
+#>0			use	c128-prg
+# Summary:	binary executable or Basic program for Commodore C128 computers
+# Note:		Commodore 128 BASIC 7.0 saved with graphics mode enabled
+# start address 4001h
+#0		leshort		0x4001	THIS_IS_C128_GRAPHIC
+#>0			use	c128-prg
+# display information about tokenized C128 BASIC program (memory address, line number, token)
+0	name	c128-prg
+>0		uleshort	x	Commodore C128 BASIC program
+!:mime	application/x-commodore-basic
+!:ext	prg
+# start address like: 1C01h
+>0		uleshort	!0x1C01	\b, start address %#4.4x
+# 1st BASIC fragment
+>2		use		basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x1C00)	ubyte		x
+>>&-1		ubyte		!0	\b, no EOL=%#x
+# 2nd BASIC fragment
+>>&0		use		basic-line
+# Zero-byte marking the end of the BASIC line
+>-3		ubyte		!0	\b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicates the end of the program
+>>-2		ubeshort	x	\b%4.4x
+# display information about C128 program (memory address, line number, token)
+0	name	c128-exe
+>0		uleshort	x	Commodore C128 program
+!:mime	application/x-commodore-exec
+!:ext	prg/
+# start address like: 1C01h
+>0		uleshort	!0x1C01	\b, start address %#4.4x
+# 1st BASIC fragment
+>2		use		basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x1C00)	ubyte		x
+>>&-1		ubyte		!0	\b, no EOL=%#x
+# no valid 2nd BASIC fragment in Commodore executables
+#>>&0		use		basic-line
+# Zero-byte marking the end of the BASIC line
+>-3		ubyte		!0	\b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicates the end of the program
+>>-2		ubeshort	x	\b%4.4x
+# Summary:	binary executable or Basic program for Commodore C16/VIC-20/Plus4 computers
+# URL:		https://en.wikipedia.org/wiki/Commodore_Plus/4
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/p/prg-vic20.trid.xml
+#		defs/p/prg-plus4.trid.xml
+# From:		Joerg Jenderek
+# Note:		there exist VIC-20 variants with different start address
+# GRR: line below is too generic because it matches Novell LANalyzer capture
+# with regular trace header record handled by ./sniffer
+0		leshort		0x1001
+# skip regular Novell LANalyzer capture (novell-2.tr1 novell-lanalyzer.tr1 novell-win10.tr1) with "invalid low" token value 54h
+>6		ubyte		>0x7F
+# skip regular Novell LANalyzer capture (novell-2.tr1 novell-lanalyzer.tr1 novell-win10.tr1) with "invalid low" second line offset 4Ch
+#>>2		uleshort	>0x1006	OFFSET_NOT_TOO_LOW
+# skip foo with "invalid high" second line offset but not for 0x123b (Minefield.prg)
+#>>>2		uleshort	<0x1102	OFFSET_NOT_TOO_HIGH
+# if first token is not SYS this implies BASIC program
+>>6		ubyte		!0x9e
+# valid second end of line separator implies BASIC program
+>>>(2.s-0x1000)		ubyte	=0
+>>>>0			use	c16-prg
+# invalid second end of line separator !=0 implies binary executable like: Minefield.prg
+>>>(2.s-0x1000)		ubyte	!0
+>>>>0			use	c16-exe
+# if first token is SYS this implies binary executable
+>>6		ubyte		=0x9e
+>>>0		use		c16-exe
+# display information about C16 program (memory address, line number, token)
+0	name	c16-exe
+>0		uleshort	x	Commodore C16/VIC-20/Plus4 program
+!:mime	application/x-commodore-exec
+!:ext	prg/
+# start address like: 1001h
+>0		uleshort	!0x1001	\b, start address %#4.4x
+# 1st BASIC fragment
+>2		use		basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x1000)	ubyte		x
+>>&-1		ubyte		!0	\b, no EOL=%#x
+# no valid 2nd BASIC fragment in executables
+#>>&0		use		basic-line
+# Zero-byte marking the end of the BASIC line
+>-3		ubyte		!0	\b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicates the end of the program
+>>-2		ubeshort	x	\b%4.4x
+# display information about tokenized C16 BASIC program (memory address, line number, token)
+0	name	c16-prg
+>0		uleshort	x	Commodore C16/VIC-20/Plus4 BASIC program
+!:mime	application/x-commodore-basic
+!:ext	prg
+# start address like: 1001h
+>0		uleshort	!0x1001	\b, start address %#4.4x
+# 1st BASIC fragment
+>2		use		basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x1000)	ubyte		x
+>>&-1		ubyte		!0	\b, no EOL=%#x
+# 2nd BASIC fragment
+>>&0		use		basic-line
+# Zero-byte marking the end of the BASIC line
+>-3		ubyte		!0	\b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicates the end of the program
+>>-2		ubeshort	x	\b%4.4x
+# Summary:	binary executable or Basic program for Commodore VIC-20 computer with 8K RAM expansion
+# URL:		https://en.wikipedia.org/wiki/VIC-20
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/p/prg-vic20-8k.trid.xml
+# From:		Joerg Jenderek
+# Note:		Basic v2.0 with Basic v4.0 extension (VIC20); there exist VIC-20 variants with different start addresses
+# start address 1201h
+0		leshort		0x1201
+# if first token is not SYS this implies BASIC program
+>6		ubyte		!0x9e
+>>0		use		vic-prg
+# if first token is SYS this implies binary executable
+>6		ubyte		=0x9e
+>>0		use		vic-exe
+# display information about Commodore VIC-20 BASIC+8K program (memory address, line number, token)
+0	name	vic-prg
+>0		uleshort	x	Commodore VIC-20 +8K BASIC program
+!:mime	application/x-commodore-basic
+!:ext	prg
+# start address like: 1201h
+>0		uleshort	!0x1201	\b, start address %#4.4x
+# 1st BASIC fragment
+>2		use		basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x1200)	ubyte		x
+>>&-1		ubyte		!0	\b, no EOL=%#x
+# 2nd BASIC fragment
+>>&0		use		basic-line
+# Zero-byte marking the end of the BASIC line
+>-3		ubyte		!0	\b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicates the end of the program
+>>-2		ubeshort	x	\b%4.4x
+# display information about Commodore VIC-20 +8K program (memory address, line number, token)
+0	name	vic-exe
+>0		uleshort	x	Commodore VIC-20 +8K program
+!:mime	application/x-commodore-exec
+!:ext	prg/
+# start address like: 1201h
+>0		uleshort	!0x1201	\b, start address %#4.4x
+# 1st BASIC fragment
+>2		use		basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x1200)	ubyte		x
+>>&-1		ubyte		!0	\b, no EOL=%#x
+# no valid 2nd BASIC fragment in executables
+#>>&0		use		basic-line
+# Zero-byte marking the end of the BASIC line
+>-3		ubyte		!0	\b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicates the end of the program
+>>-2		ubeshort	x	\b%4.4x
+# Summary:	binary executable or Basic program for Commodore PET computers
+# URL:		https://en.wikipedia.org/wiki/Commodore_PET
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/p/prg-pet.trid.xml
+# From:		Joerg Jenderek
+# start address 0401h
+0		leshort		0x0401
+!:strength	+1
+# GRR: line above with strength 51 (50+1) is too generic because it matches TTComp archive data, ASCII, 1K dictionary
+# (strength=48=50-2) handled by ./archive and shared library (strength=50) handled by ./ibm6000
+# skip TTComp archive data, ASCII, 1K dictionary ttcomp-ascii-1k.bin with "invalid high" second line offset 4162h
+>2		uleshort	<0x0502
+# skip foo with "invalid low" second line offset
+#>>2		uleshort	>0x0406	OFFSET_NOT_TOO_LOW
+# skip bar with "invalid end of line" 
+#>>>(2.s-0x0400)	ubyte		=0	END_OF_LINE_OK
+# if first token is not SYS this implies BASIC program
+>>6		ubyte		!0x9e
+>>>0		use		pet-prg
+# if first token is SYS this implies binary executable
+>>6		ubyte		=0x9e
+>>>0		use		pet-exe
+# display information about Commodore PET BASIC program (memory address, line number, token)
+0	name	pet-prg
+>0		uleshort	x	Commodore PET BASIC program
+!:mime	application/x-commodore-basic
+!:ext	prg
+# start address like: 0401h
+>0		uleshort	!0x0401	\b, start address %#4.4x
+# 1st BASIC fragment
+>2		use		basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x0400)	ubyte		x
+# 2nd BASIC fragment
+>>&0		use		basic-line
+# zero-byte marking the end of the BASIC line
+>-3		ubyte		!0	\b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicates the end of the program
+>>-2		ubeshort	x	\b%4.4x
+# display information about Commodore PET program (memory address, line number, token)
+0	name	pet-exe
+>0		uleshort	x	Commodore PET program
+!:mime	application/x-commodore-exec
+!:ext	prg/
+# start address like: 0401h
+>0		uleshort	!0x0401	\b, start address %#4.4x
+# 1st BASIC fragment
+>2		use		basic-line
+# jump to 1 byte before next BASIC fragment; this must be zero-byte marking the end of line
+>(2.s-0x0400)	ubyte		x
+>>&-1		ubyte		!0	\b, no EOL=%#x
+# no valid 2nd BASIC fragment in executables
+#>>&0		use		basic-line
+# Zero-byte marking the end of the BASIC line
+>-3		ubyte		!0	\b, 3 last bytes %#2.2x
+# Two zero-bytes in place of the pointer to next BASIC line indicates the end of the program
+>>-2		ubeshort	x	\b%4.4x
+# display information about tokenized BASIC line (memory address, line number, Token)
+0	name	basic-line
+# pointer to memory address of beginning of "next" BASIC line
+# greater than the previous offset, but by at most 100h
+>0		uleshort	x	\b, offset %#4.4x
+# BASIC line number in the range 0 to 65520; it is common practice to increment numbers by some value (5, 10 or 100)
+>2		uleshort	x	\b, line %u
+# https://www.c64-wiki.com/wiki/BASIC_token
+# The "high-bit" bytes from #128-#254 stood for the various BASIC commands and mathematical operators
+>4		ubyte		x	\b, token (%#x)
+# https://www.c64-wiki.com/wiki/REM
+>4		string		\x8f	REM
+# remark string like: ** SYNTHESIZER BY RICOCHET **
+>>5		string		>\0	%s
+#>>>&1		uleshort	x	\b, NEXT OFFSET %#4.4x
+# https://www.c64-wiki.com/wiki/PRINT
+>4		string		\x99	PRINT
+# string like: "Hello world" "\021 \323ELF-E\330TRACTING-\332IP (64 ONLY)\016\231":\2362141
+>>5		string		x	%s
+#>>>&0		ubequad		x	AFTER_PRINT=%#16.16llx
+# https://www.c64-wiki.com/wiki/POKE
+>4		string		\x97	POKE
+# <Memory address>,<number>
+>>5		regex		\^[0-9,\040]+	%s
+# https://www.c64-wiki.com/wiki/SYS	0x9e=\236
+>4		string		\x9e	SYS
+# SYS <Address> parameter is a 16-bit unsigned integer; in the range 0 - 65535
+>>5		regex		\^[0-9]{1,5}	%s
+# maybe followed by spaces, "control-characters" or colon (:) followed by next commands or in victracker.prg
+# (\302(43)\252256\254\302(44)\25236) /T.L.R/
+#>>5		string		x	SYS_STRING="%s"
+# https://www.c64-wiki.com/wiki/GOSUB
+>4		string		\x8d	GOSUB
+# <line>
+>>5		string		>\0	%s
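
For readers who do not read magic(5) syntax fluently, the checks made by
the entries above can be approximated outside of file(1). The following
is only a rough sketch in Python, not part of the patch: the function
name describe_prg and the returned dictionary keys are made up for
illustration, and the SYS/non-SYS split is simplified (the C16 entry
additionally treats a missing end-of-line byte as a sign of an
executable).

```python
import struct

# Load addresses as used by the magic entries in the patch above.
LOAD_ADDRESSES = {
    0x0401: "PET",
    0x0801: "C64",
    0x1001: "C16/VIC-20/Plus4",
    0x1201: "VIC-20 +8K",
    0x1C01: "C128",
}
SYS_TOKEN = 0x9E  # first token SYS hints at a binary executable with a BASIC stub


def describe_prg(data: bytes) -> dict:
    """Decode the load address and the first tokenized BASIC line of a PRG."""
    if len(data) < 7:
        raise ValueError("too short for a PRG with one BASIC line")
    # Bytes 0-1: load address, 2-3: pointer to next line, 4-5: line number
    # (all little-endian), byte 6: first token of the line.
    start, next_ptr, line_no = struct.unpack_from("<HHH", data, 0)
    token = data[6]
    # File offset of the byte just before the next BASIC line. This is what
    # the magic writes as (2.s-0x0800) for start address 0x0801, and so on.
    eol_off = next_ptr - (start - 1)
    eol_ok = 0 <= eol_off < len(data) and data[eol_off] == 0
    return {
        "platform": LOAD_ADDRESSES.get(start, "unknown"),
        "start": start,
        "next_line": next_ptr,
        "line": line_no,
        "token": token,
        "eol_ok": eol_ok,
        "kind": "executable" if token == SYS_TOKEN else "BASIC program",
    }


if __name__ == "__main__":
    # Hand-built C64 image of the single line: 10 SYS2061
    sample = bytes.fromhex("0108" "0b08" "0a00" "9e" "32303631" "00" "0000")
    print(describe_prg(sample))
```

The subtraction of (start - 1) is the point worth noticing: the stored
pointer is a memory address, the file carries a 2 byte load address in
front, and the byte to test is the one before the next line, so the
three corrections collapse into one constant per platform.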
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.43-c64-prg.diff.sig
Type: application/octet-stream
Size: 3586 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221112/1e120e7d/attachment-0007.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-prg.txt.gz
Type: application/x-gzip
Size: 995 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20221112/1e120e7d/attachment-0001.bin>

