[File] [PATCH] of Magdir/archive,console,c64 ; *.LNX; LyNX archive,Lynx cartridge

Jörg Jenderek joerg.jen.der.ek at gmx.net
Fri Jun 2 15:38:20 UTC 2023


Hello,
some days ago I wanted to handle some Linux kernel images. I remembered
that the lnx suffix is sometimes used for them, so I searched my system
for files with that suffix.

When running file command version 5.44 with the -k option on such LNX
samples and related files, I get output like:

Berania-install.lnx:		LyNX archive
				Commodore C64 BASIC program
				, offset 0x085b, line 10,
				token (0x97) POKE 53280,0
				, offset 0000, line 8205,
				token (0x32)
Darkon.lnx:			LyNX archive
				Commodore C64 BASIC program
				, offset 0x085b, line 10,
				token (0x97) POKE 53280,0
				, offset 0000, line 8205,
				token (0x35)
				, 3 last bytes 0x4c4c29
Hockey (NA).lnx:		Lynx cartridge, bank 0 256k,
				"hockey.lyx", "Atari"
Jimmy Conners Tennis (NA).lnx:	Lynx cartridge, bank 0 512k,
				"jconnort.lyx", "Atari"
Splat_and_Shout.lnx:		LyNX archive
				Commodore C64 BASIC program
				, offset 0x085b, line 10,
				token (0x97) POKE 53280,0
				, offset 0000, line 8205,
				token (0x31)
				, 3 last bytes 0xb1fc60
Warlords.lnx:			LyNX archive
				Commodore C64 BASIC program
				, offset 0x085b, line 10,
				token (0x97) POKE 53280,0
				, offset 0000, line 8205,
				token (0x31)
atari-lynx-chips-challenge.lnx: Lynx cartridge, bank 0 128k,
				"Atari"
helloWorld.prg:			Commodore C64 BASIC program
				, offset 0x0815, line 10,
				token (0x99) PRINT  "Hello world"
				, offset 0000, line 0, token (0)
phassine.lnx:			LyNX archive
				Commodore C64 BASIC program
				, offset 0x085b, line 10,
				token (0x97) POKE 53280,0
				, offset 0000, line 8205,
				token (0x33)
saveroms:			Commodore C64 BASIC program
				, offset 0x0828, line 10,
				token (0x8f) REM
				********************************
				, offset 0x084f, line 12,
				token (0x8f) REM
				* SAVEROMS		       *

Furthermore, with the -i option only the generic mime type
application/octet-stream is shown, or application/x-atari-lynx-rom for
Lynx cartridges. With option --extension only the 3 byte sequence ???
is shown.

For comparison I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/).
None of the samples are recognized.

For comparison I also ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html).
Samples like atari-lynx-chips-challenge.lnx are described with highest
priority as "Atari Lynx ROM" with LNX name suffix by lnx.trid.xml.
The other LNX samples are all described correctly with low priority as
"Commodore 64 BASIC V2 program" with suffix PRG by prg-c64.trid.xml.
These LNX variants are described with highest priority as "Lynx archive"
with correct suffix LNX by ark-lnx.trid.xml (see appended
trid-v-lnx.txt.gz).

TrID lists the file name extensions used and, with the -v option, often
also the related URL pointing to file format information.

With the help of these tools I added more lines. So this is now
expressed inside Magdir/console by additional comment lines like:
# Reference:	http://mark0.net/download/triddefs_xml.7z
#		defs/l/lnx.trid.xml
This starts with lines like:
        0	string		LYNX		Lynx cartridge
        !:mime	application/x-atari-lynx-rom
Now, following TrID, I show the correct suffix with a line like:
        !:ext	lnx
Afterwards the page sizes of the banks are shown. For the first bank I
get values like 128, 256 or 512, and for the second bank I get no
values. This is done by lines like:
        >4	leshort/4	>0		\b, bank 0 %dk
        >6	leshort/4	>0		\b, bank 1 %dk
Afterwards, if available, the 32-byte cart name like "jconnort.lyx",
"viking~1.lyx", "Eye of the Beholder" or "C:\EMU\LYNX\ROMS\ULTCHESS.LYX"
is shown by a line like:
        >10	string		>\0		\b, "%.32s"
Afterwards, if available, the 16-byte manufacturer name like "Atari",
"NuFX Inc." or "Matthias Domin" is shown by a line like:
      >42	string		>\0		\b, "%.16s"

Only the 4-byte ASCII-like start magic is checked. In case more is
needed, I looked at further header fields according to the source
exehdr.s. The version number seems to be always 1 and the last bytes of
the header are apparently nil. So these facts are handled by
commented-out lines like:
      #>8	leshort		!1		\b, version number %u
      #>59	lelong		!0		\b, spare %#x

Most examples are not rotated, but example "Lexis (NA).lnx" is left
rotated and example "Centipede (Prototype).lnx" is right rotated. So I
also show this information with a line like:
        >58	ubyte		>0		\b, rotation %u
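
The header fields checked by the magic lines above can also be read
outside of file(1). The following Python lines are only a minimal
sketch under stated assumptions: the offsets, sizes and the division by
4 are copied from the magic lines, the rotation values 1=left and
2=right are taken from the comment in the attached console patch, and
the function name and dictionary keys are my own labels, not part of
any existing tool.

```python
import struct

# Offsets 0, 4, 6, 8, 10, 42 and 58 as used by the magic lines above.
# Field names are illustrative labels only.
LYNX_HEADER = struct.Struct("<4sHHH32s16sB")

ROTATION = {0: "none", 1: "left", 2: "right"}


def _text(raw: bytes) -> str:
    """Cut a fixed-size field at the first nil byte and decode it."""
    return raw.split(b"\0", 1)[0].decode("ascii", "replace")


def parse_lynx_header(data: bytes):
    """Return the header fields of a Lynx cartridge, or None if no match."""
    if len(data) < LYNX_HEADER.size or data[:4] != b"LYNX":
        return None
    _, page0, page1, version, name, maker, rotation = LYNX_HEADER.unpack_from(data)
    return {
        "bank0_k": page0 // 4,   # same arithmetic as "leshort/4"
        "bank1_k": page1 // 4,
        "version": version,
        "cart_name": _text(name),
        "manufacturer": _text(maker),
        "rotation": ROTATION.get(rotation, str(rotation)),
    }
```

Empty cart or manufacturer fields simply come back as empty strings,
which corresponds to the ">\0" tests in the magic lines.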


With the help of the TrID tools I found a page about LNX (LyNX
containers). So this is now expressed inside Magdir/archive by
additional comment lines like:
# URL:		http://fileformats.archiveteam.org/wiki/Lynx_archive
# Reference:	http://ist.uwaterloo.ca/~schepers/formats/LNX.TXT
#		http://mark0.net/download/triddefs_xml.7z
#		defs/a/ark-lnx.trid.xml
According to that documentation the archive starts with a small BASIC
program which, when loaded and run, displays the message "Use LYNX to
dissolve this file". That was used by the file command as test
criterion.

According to the documentation this BASIC program looks like:
	10 POKE53280,0:
	POKE53281,0:
	POKE646,PEEK(162):
	PRINT"<CLS><DOWN><DOWN><DOWN><DOWN><DOWN><DOWN><DOWN><DOWN>":
	PRINT"     USE LYNX TO DISSOLVE THIS FILE":
	GOTO10

But in the current version the output (via Magdir/c64) shows only the
starting phrase "POKE 53280,0", because in sub routine basic-line the
tokenized POKE command (97h=0227) is interpreted by lines like:
      >4		string		\x97	POKE
      >>5		regex		\^[0-9,\040]+	%s
The above regular expression only grabs and shows the POKE arguments
<Memory address>,<number>, but several BASIC commands can be put on the
same line, separated by the colon character (:=3Ah=072).
So I show the rest by additional lines afterwards like:
      >>>>&-2		ubyte		=0x3A		with
      >>>>>&0		string		x		"%s"
This can produce many columns of output, because a whole BASIC line is
limited to about 256 bytes and is terminated by a \0 character. So for
the LNX archives I now get a string like:
\22753281,0:
\227646,\302(162):
\231"\223\021\021\021\021\021\021\021\021":
\231"     USE LYNX TO DISSOLVE THIS FILE":
\21110
Now we see that this is the mentioned BASIC program in tokenized form.
That means the BASIC keywords are replaced by 1 or more bytes according
to a table that is specific to each BASIC variant. The TrID definition
also checks for these BASIC byte sequences. According to the
documentation there may exist samples which contain, for example, a
different number of spaces, but I myself found no such examples. If
they turn up, the magic lines in Magdir/archive may have to be adapted.
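
To make the tokenized form above easier to read, here is a small Python
sketch that turns such a line back into text. It is only an
illustration under assumptions: the token table contains just the few
tokens that appear in this mail and in Magdir/c64 (GOTO, GOSUB, REM,
POKE, PRINT, SYS, PEEK), the two control codes inside quotes are named
as in the documented listing (<CLS>, <DOWN>), and everything else
unprintable is shown as an octal escape like file does. The function
name is my own.

```python
# Only the tokens mentioned in this mail; a real table is much longer.
TOKENS = {
    0x89: "GOTO", 0x8D: "GOSUB", 0x8F: "REM", 0x97: "POKE",
    0x99: "PRINT", 0x9E: "SYS", 0xC2: "PEEK",
}
# Control codes that occur inside the quoted PRINT string of the stub.
CONTROLS = {0x93: "<CLS>", 0x11: "<DOWN>"}


def detokenize(line: bytes) -> str:
    """Expand the content of one tokenized BASIC line (without the \\0)."""
    out = []
    in_quotes = False
    for byte in line:
        if byte == 0x22:                      # double quote toggles string mode
            in_quotes = not in_quotes
            out.append('"')
        elif in_quotes and byte in CONTROLS:  # inside strings: no tokens
            out.append(CONTROLS[byte])
        elif not in_quotes and byte in TOKENS:
            out.append(TOKENS[byte])
        elif 0x20 <= byte < 0x7F:
            out.append(chr(byte))
        else:
            out.append(f"\\{byte:03o}")       # octal escape for the rest
    return "".join(out)


stub = (b'\x9753280,0:\x9753281,0:\x97646,\xc2(162):\x99"\x93'
        + b"\x11" * 8
        + b'":\x99"     USE LYNX TO DISSOLVE THIS FILE":\x8910')
print(detokenize(stub))
```

The quote handling matters here because byte 0x93 would otherwise be
looked up as a BASIC token although it is a control character inside
the PRINT string.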

Inside Magdir/c64 the samples are described as "Commodore C64 BASIC
program" by c64-prg. This interprets the first 2 BASIC fragments via
sub routine basic-line, which first displays the offset to the next
fragment, then the BASIC line number and then the BASIC line content,
typically starting with a token.

An offset value 0000 indicates the end of the program, as in sample
helloWorld.prg, where the end of the program is also the end of the
file. Interpreting the bytes afterwards is not useful and gives
garbage. So normally for real BASIC programs the last 3 bytes of the
file are a nil character as BASIC line terminator and 2 bytes 0000 as
offset. This is checked and displayed by magic lines like:
     >-3		ubyte		!0	\b, 3 last bytes %#2.2x
     >>-2		ubeshort	x	\b%4.4x
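
The same check on the last 3 bytes can be written in a few lines of
Python. This is only a sketch that mirrors the two magic lines above;
the function name is hypothetical. Like the magic lines, it only tests
the third-last byte against nil and then prints the following two bytes
unconditionally.

```python
def tail_report(data: bytes) -> str:
    """Mimic '>-3 ubyte !0' followed by '>>-2 ubeshort x'."""
    if len(data) < 3 or data[-3] == 0:
        return ""
    last_two = int.from_bytes(data[-2:], "big")
    return f", 3 last bytes {data[-3]:#04x}{last_two:04x}"


print(tail_report(b"...\x4c\x4c\x29"))
```

A file ending in a nil line terminator produces no output, even if the
final two bytes are not 0000, because only the first of the three bytes
is tested.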

So for real BASIC examples like helloWorld.prg I get after the offset
part a useless phrase like "line 0, token (0)". For LNX samples,
appended data follows the BASIC program, so in many cases the last 3
bytes are not nil: for sample Darkon.lnx I get 0x4c4c29 and for
Splat_and_Shout.lnx I get 0xb1fc60. Here after the offset part I get a
garbage/misleading phrase like "line 8205, token (0x35)" or "line 8205,
token (0x31)" (8205 = 0x200d is just the data bytes 0x0d 0x20 read as a
little endian line number). So I must insert an additional line that
checks for a non-nil offset and only then continues as before. Sub
routine basic-line now becomes:
0	name	basic-line
   >0		uleshort	x	\b, offset %#4.4x
   >0		uleshort	>0
   >>2		uleshort	x	\b, line %u
   >>4		ubyte		x	\b, token (%#x)
...

For LNX, the data comes at this point (not a line number, not a token,
...). So I show it in hexadecimal form. Just for interest or debugging
purposes you can also show it as a string. So samples with an appended
data part are now handled by an additional branch which looks like:

   >0		uleshort	0
   >>2		ubeshort	x	\b, data %#4.4x
   >>4		ubeshort	x	\b%4.4x
   >>6		ubequad		x 	\b%16.16llx
   #>>3		string		x 	"%-0.30s"

When interpreting the data bytes as a string I get something like:
	Berania-install.lnx:            " 2  *LYNX BY CBMCONVERT 2.0*"
	Darkon.lnx:                     " 5   LYNX IX  BY WILL CORLEY"
	Splat_and_Shout.lnx:            " 1  *LYNX XII BY WILL CORLEY"
	Warlords.lnx:                   " 1  *STAR LYNX 0.72  JOE/STA"
	phassine.lnx:                   " 3  *LYNX BY CBMCONVERT 2.0*"
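
The split between BASIC program and appended data described above can
be sketched in Python as follows. It is a simplified illustration, not
what file(1) does internally: it follows the chain of "next line"
pointers from the load address stored in the first 2 bytes until the
0000 offset, and returns whatever comes afterwards. The function name
and the defensive checks for truncated or looping pointers are my own
additions.

```python
import struct


def split_basic_program(prg: bytes):
    """Return ([(line_number, content), ...], appended_data)."""
    if len(prg) < 2:
        return [], prg
    (load_addr,) = struct.unpack_from("<H", prg, 0)   # e.g. 0x0801
    pos = 2
    lines = []
    while pos + 2 <= len(prg):
        (next_addr,) = struct.unpack_from("<H", prg, pos)
        if next_addr == 0:            # offset 0000: end of BASIC program
            pos += 2
            break
        if pos + 4 > len(prg):        # truncated line header
            break
        (line_no,) = struct.unpack_from("<H", prg, pos + 2)
        end = prg.find(b"\0", pos + 4)
        if end < 0:
            end = len(prg)
        lines.append((line_no, prg[pos + 4:end]))
        new_pos = next_addr - load_addr + 2   # memory address -> file offset
        if new_pos <= pos:            # pointer does not move forward
            break
        pos = new_pos
    return lines, prg[pos:]
```

For a real BASIC program like helloWorld.prg the second return value
would be empty; for the LNX samples it would start with the Carriage
Return and the directory block count shown in the strings above.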

Now we can check the recognition as "LyNX archive". This is done with
strength 330 inside Magdir/archive by a line like:
56 string USE\040LYNX\040TO\040DISSOLVE\040THIS\040FILE  LyNX archive
So this part of the text is the one shown by BASIC PRINT. I am not sure
that this always holds, because according to the documentation there
may exist variants where a different number of spaces is used, so this
magic string may occur at other offsets in unusual cases. At the moment
I just keep this test line. Afterwards I now show the file name suffix
and a user defined mime type instead of the generic
application/octet-stream. This is done by additional lines like:
!:mime	application/x-commodore-lnx
!:ext	lnx
When handling dozens of such LNX samples I want some more information
in order to distinguish the LNX files. So afterwards I look for the
tokenized BASIC GOTO (89h) 10, line terminator \0, end of program tag
\0\0 and Carriage Return by magic lines like:
   >86		search/10	\x8910\0\0\0\r	\b,
   #>>&0		string		x	STRING="%s"
For debugging purposes the commented-out line shows the data bytes after
the BASIC program as a string, as done in Magdir/c64. According to the
documentation, next comes the number of directory blocks (with values
like 1 2 3 5) in ASCII with spaces on both sides, which I show by the
next additional line:
   >>&0		regex		[0-9]{1,5}	%s directory blocks
Afterwards I show the signature like "*LYNX XII BY WILL CORLEY",
" LYNX IX  BY WILL CORLEY" or "*LYNX BY CBMCONVERT 2.0*" by a line like:
   >>>&2		regex		[^\r]{1,24}	\b, signature "%s"

Afterwards I show the number of files (with values like 2 3 6 13 69;
144 may be the maximum) in ASCII, surrounded by spaces and delimited by
Carriage Return, via a line like:
   >>>>&1		regex		[0-9]{1,3}	\b, %s files
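
The chain of magic lines for Magdir/archive can be approximated by one
regular expression in Python. This is a sketch under assumptions: it
expects exactly the layout seen in my samples (Carriage Return, block
count, two spaces, a signature of up to 24 characters, Carriage Return,
file count surrounded by spaces, Carriage Return) and is therefore
stricter than the magic lines, which search step by step. The names
MESSAGE, DIRECTORY and describe_lnx are hypothetical.

```python
import re

MESSAGE = b"USE LYNX TO DISSOLVE THIS FILE"
# GOTO token (0x89) "10", line terminator, end of program tag, CR,
# then block count, signature and file count as seen in the samples.
DIRECTORY = re.compile(
    rb"\x8910\x00\x00\x00\r *(\d{1,5}) {2}([^\r]{1,24})\r *(\d{1,3}) *\r"
)


def describe_lnx(data: bytes):
    """Return None for non-LNX data, else a dict with the parsed fields."""
    if data[56:56 + len(MESSAGE)] != MESSAGE:
        return None
    match = DIRECTORY.search(data, 86)
    if match is None:
        return {}                      # LyNX archive, but unusual layout
    return {
        "directory_blocks": int(match.group(1)),
        "signature": match.group(2).decode("ascii", "replace"),
        "files": int(match.group(3)),
    }
```

An empty dictionary means that the "USE LYNX TO DISSOLVE THIS FILE"
test matched but the directory header did not follow the expected
layout, which may be the case for the variants with a different number
of spaces mentioned in the documentation.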

After applying the above mentioned modifications with the patches
file-5.44-console-lnx.diff, file-5.44-c64-lnx.diff and
file-5.44-archive-lnx.diff some more details are shown, and with the -k
option I now get output like:

Berania-install.lnx:            LyNX archive, 2 directory blocks,
				signature "*LYNX BY CBMCONVERT 2.0*",
				6 files
				Commodore C64 BASIC program
				, offset 0x085b, line 10,
				token (0x97) POKE 53280,0
	"\22753281,0:
	\227646,\302(162):
	\231"\223\021\021\021\021\021\021\021\021":
	\231"     USE LYNX TO DISSOLVE THIS FILE":
	\21110"
	, offset 0000, data
	0x0d203220202a4c594e582042592043424d434f4e...
Darkon.lnx:                     LyNX archive, 5 directory blocks,
				signature " LYNX IX  BY WILL CORLEY",
				69 files
				Commodore C64 BASIC program
				, offset 0x085b, line 10,
				token (0x97) POKE 53280,0
	"\22753281,0:
	\227646,\302(162):
	\231"\223\021\021\021\021\021\021\021\021":
	\231"     USE LYNX TO DISSOLVE THIS FILE":
	\21110"
	, offset 0000, data
	0x0d20352020204c594e5820495820204259205749...
				, 3 last bytes 0x4c4c29
Hockey (NA).lnx:                Lynx cartridge, bank 0 256k
         				, "hockey.lyx", "Atari"
Jimmy Conners Tennis (NA).lnx:  Lynx cartridge, bank 0 512k
        	      	     		, "jconnort.lyx", "Atari"
Splat_and_Shout.lnx:            LyNX archive, 1 directory blocks,
				signature "*LYNX XII BY WILL CORLEY",
				2 files
				Commodore C64 BASIC program
				, offset 0x085b, line 10,
				token (0x97) POKE 53280,0
	"\22753281,0:
	\227646,\302(162):
	\231"\223\021\021\021\021\021\021\021\021":
	\231"     USE LYNX TO DISSOLVE THIS FILE":
	\21110"
	, offset 0000, data
	0x0d203120202a4c594e5820584949204259205749...
				, 3 last bytes 0xb1fc60
Warlords.lnx:                   LyNX archive, 1 directory blocks,
				signature "*STAR LYNX 0.72  JOE/STA",
				3 files
				Commodore C64 BASIC program
				, offset 0x085b, line 10,
				token (0x97) POKE 53280,0
	"\22753281,0:
	\227646,\302(162):
	\231"\223\021\021\021\021\021\021\021\021":
	\231"     USE LYNX TO DISSOLVE THIS FILE":
	\21110"
	, offset 0000, data
	0x0d203120202a53544152204c594e5820302e3732...
atari-lynx-chips-challenge.lnx: Lynx cartridge, bank 0 128k
				, "Atari"
helloWorld.prg:                 Commodore C64 BASIC program
				, offset 0x0815, line 10,
				token (0x99) PRINT  "Hello world"
				, offset 0000, data
	0000000000000000000000000000000000000000...
phassine.lnx:                   LyNX archive, 3 directory blocks,
				signature "*LYNX BY CBMCONVERT 2.0*",
				13 files
				Commodore C64 BASIC program
				, offset 0x085b, line 10,
				token (0x97) POKE 53280,0
	"\22753281,0:
	\227646,\302(162):
	\231"\223\021\021\021\021\021\021\021\021":
	\231"     USE LYNX TO DISSOLVE THIS FILE":
	\21110"
	, offset 0000, data
	0x0d203320202a4c594e582042592043424d434f4e...
saveroms:                       Commodore C64 BASIC program
				, offset 0x0828, line 10,
				token (0x8f) REM
				********************************
				, offset 0x084f, line 12,
				token (0x8f) REM
				* SAVEROMS                     *

I hope my diff files can be applied in a future version of the
file utility. Maybe "LyNX archive" in Magdir/archive and "Commodore C64
BASIC program" in Magdir/c64 should be merged. Furthermore, besides some
Linux kernels I found more samples with the LNX file name suffix but in
other file formats.

With best wishes
Jörg Jenderek
--
Jörg Jenderek
-------------- next part --------------
--- file-5.44\magic\Magdir\console.old	Mon Dec 26 19:00:47 2022
+++ file-5.44\magic\Magdir\console	Mon May 29 21:36:56 2023
@@ -697,12 +697,25 @@
 >6	string		BS93		Lynx homebrew cartridge
 !:mime	application/x-atari-lynx-rom
 >>2	beshort		x		\b, RAM start $%04x
+# Update:	Joerg Jenderek
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/l/lnx.trid.xml
+# Note:		called "Atari Lynx ROM" by TrID
 0	string		LYNX		Lynx cartridge
 !:mime	application/x-atari-lynx-rom
+!:ext	lnx
+# bank 0 page size like: 128 256 512
 >4	leshort/4	>0		\b, bank 0 %dk
 >6	leshort/4	>0		\b, bank 1 %dk
+# 32 bytes cart name like: "jconnort.lyx" "viking~1.lyx" "Eye of the Beholder" "C:\EMU\LYNX\ROMS\ULTCHESS.LYX"
 >10	string		>\0		\b, "%.32s"
+# 16 bytes manufacturer like: "Atari" "NuFX Inc." "Matthias Domin"
 >42	string		>\0		\b, "%.16s"
+# version number
+#>8	leshort		!1		\b, version number %u
+# rotation: 1~left Lexis (NA).lnx 2~right Centipede (Prototype).lnx
+>58	ubyte		>0		\b, rotation %u
+# spare
+#>59	lelong		!0		\b, spare %#x
 
 # Opera file system that is used on the 3DO console
 # From: Serge van den Boom <svdb at stack.nl>
-------------- next part --------------
--- file-5.44\magic\Magdir\archive.old	Mon Dec 26 19:00:47 2022
+++ file-5.44\magic\Magdir\archive	Fri Jun 02 00:18:55 2023
@@ -2124,7 +2124,28 @@
 >3	byte	x	version %d
 
 # LyNX archive
+# Update:	Joerg Jenderek
+# URL:		http://fileformats.archiveteam.org/wiki/Lynx_archive
+# Reference:	http://ist.uwaterloo.ca/~schepers/formats/LNX.TXT
+#		http://mark0.net/download/triddefs_xml.7z/defs/a/ark-lnx.trid.xml
+# Note:		called "Lynx archive" by TrID and "Commodore C64 BASIC program" with "POKE 53280" by ./c64
+# TODO:		merge and unify with Commodore C64 BASIC program
 56	string	USE\040LYNX\040TO\040DISSOLVE\040THIS\040FILE	 LyNX archive
+# display "Lynx archive" (strength=330) before Commodore C64 BASIC program (strength=50) handled by ./c64
+#!:strength +0
+#!:mime	application/octet-stream
+!:mime	application/x-commodore-lnx
+!:ext	lnx
+# afterwards look for BASIC tokenized GOTO (89h) 10, line terminator \0, end of programm tag \0\0 and CarriageReturn
+>86		search/10	\x8910\0\0\0\r	\b,
+# for DEBUGGING
+#>>&0		string		x	STRING="%s"
+# number in ASCII of directory blocks with spaces on both sides like: 1 2 3 5
+>>&0		regex		[0-9]{1,5}	%s directory blocks
+# signature like: "*LYNX XII BY WILL CORLEY" " LYNX IX  BY WILL CORLEY" "*LYNX BY CBMCONVERT 2.0*"
+>>>&2		regex		[^\r]{1,24}	\b, signature "%s"
+# number of files in ASCII surrounded by spaces and delimited by CR like: 2 3 6 13 69 144 (maximum?)
+>>>>&1		regex		[0-9]{1,3}	\b, %s files
 
 # From: Joerg Jenderek
 # URL: https://www.acronis.com/
-------------- next part --------------
--- file-5.44\magic\Magdir\c64.old	Wed Nov 30 00:04:06 2022
+++ file-5.44\magic\Magdir\c64	Fri Jun 02 00:46:17 2023
@@ -203,6 +203,8 @@
 # TODO:		unify Commodore BASIC/program sub routines
 # Note:		"PUCrunch archive data" moved from ./archive and merged with c64-exe
 0	leshort		0x0801
+# display Commodore C64 BASIC program (strength=50) after "Lynx archive" (strength=330) handled by ./archive
+#!:strength +0
 # if first token is not SYS this implies BASIC program in most cases
 >6		ubyte		!0x9e
 # but sELF-ExTRACTING-zIP executable unzp6420.prg contains SYS token at end of second BASIC line (at 0x35)
@@ -499,33 +501,49 @@
 # pointer to memory address of beginning of "next" BASIC line
 # greater then previous offset but maximal 100h difference
 >0		uleshort	x	\b, offset %#4.4x
+# offset 0x0000 indicates the end of BASIC program; so bytes afterwards may be some other data
+>0		uleshort	0
+# not line number but first 2 data bytes 
+>>2		ubeshort	x	\b, data %#4.4x
+# not token but next 2 data bytes 
+>>4		ubeshort	x	\b%4.4x
+# not token arguments but next data bytes 
+>>6		ubequad		x 	\b%16.16llx
+>>14		ubequad		x 	\b%16.16llx...
+# like 0x0d20352020204c594e5820495820204259205749 "\r 5   LYNX IX  BY WILL CORLEY" for LyNX archive Darkon.lnx handled by ./archive
+#>>3		string		x 	"%-0.30s"
+>0		uleshort	>0
 # BASIC line number with range from 0 to 65520; practice to increment numbers by some value (5, 10 or 100)
->2		uleshort	x	\b, line %u
+>>2		uleshort	x	\b, line %u
 # https://www.c64-wiki.com/wiki/BASIC_token
 # The "high-bit" bytes from #128-#254 stood for the various BASIC commands and mathematical operators
->4		ubyte		x	\b, token (%#x)
+>>4		ubyte		x	\b, token (%#x)
 # https://www.c64-wiki.com/wiki/REM
->4		string		\x8f	REM
+>>4		string		\x8f	REM
 # remark string like: ** SYNTHESIZER BY RICOCHET **
->>5		string		>\0	%s
-#>>>&1		uleshort	x	\b, NEXT OFFSET %#4.4x
+>>>5		string		>\0	%s
+#>>>>&1		uleshort	x	\b, NEXT OFFSET %#4.4x
 # https://www.c64-wiki.com/wiki/PRINT
->4		string		\x99	PRINT
+>>4		string		\x99	PRINT
 # string like: "Hello world" "\021 \323ELF-E\330TRACTING-\332IP (64 ONLY)\016\231":\2362141
->>5		string		x	%s
-#>>>&0		ubequad		x	AFTER_PRINT=%#16.16llx
+>>>5		string		x	%s
+#>>>>&0		ubequad		x	AFTER_PRINT=%#16.16llx
 # https://www.c64-wiki.com/wiki/POKE
->4		string		\x97	POKE
+>>4		string		\x97	POKE
 # <Memory address>,<number>
->>5		regex		\^[0-9,\040]+	%s
+>>>5		regex		\^[0-9,\040]+	%s
+# BASIC command delimiter colon (:=3Ah)
+>>>>&-2		ubyte		=0x3A
+# after BASIC command delimiter colon remaining (<255) other tokenized BASIC commands
+>>>>>&0		string		x		"%s"
 # https://www.c64-wiki.com/wiki/SYS	0x9e=\236
->4		string		\x9e	SYS
+>>4		string		\x9e	SYS
 # SYS <Address> parameter is a 16-bit unsigned integer; in the range 0 - 65535
->>5		regex		\^[0-9]{1,5}	%s
+>>>5		regex		\^[0-9]{1,5}	%s
 # maybe followed by spaces, "control-characters" or colon (:) followed by next commnds or in victracker.prg
 # (\302(43)\252256\254\302(44)\25236) /T.L.R/
-#>>5		string		x	SYS_STRING="%s"
+#>>>5		string		x	SYS_STRING="%s"
 # https://www.c64-wiki.com/wiki/GOSUB
->4		string		\x8d	GOSUB
+>>4		string		\x8d	GOSUB
 # <line>
->>5		string		>\0	%s
+>>>5		string		>\0	%s
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.44-console-lnx.diff.sig
Type: application/octet-stream
Size: 95 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230602/d07d50d7/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.44-archive-lnx.diff.sig
Type: application/octet-stream
Size: 95 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230602/d07d50d7/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.44-c64-lnx.diff.sig
Type: application/octet-stream
Size: 95 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230602/d07d50d7/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-lnx.txt.gz
Type: application/x-gzip
Size: 770 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230602/d07d50d7/attachment.bin>

