[File] [PATCH] Magdir/archive for ARJ, JAR (ARJ Software, Inc.) versus Java archive data (JAR)

Christos Zoulas christos at zoulas.com
Sat Mar 12 19:01:41 UTC 2022


Sorry, that does not seem to apply cleanly. Can you diff against HEAD?

christos

> On Mar 12, 2022, at 1:40 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> 
> Some days ago I wanted to handle some Java archives, which are
> ZIP-based and normally have the 3-character extension jar, or maybe
> the 1-character extension j.
> 
> Unfortunately, these extensions are also used by other compression
> tools. When running file version 5.41 on such non-ZIP examples, I
> get nearly correct output like:
> 19GXE.ARJ:         ARJ archive data, v3
> 		   , original name: #9GXE.ARJ
> 		   , os: MS-DOS
> MY_JARC.JAR:       JAR (ARJ Software, Inc.) archive data
> SAMPLE.J:          JAR (ARJ Software, Inc.) archive data
> TEST-hk2.ARJ:      ARJ archive data, v11
> 		   , slash-switched
> 		   , original name: \003,
> WP60.ARJ:          ARJ archive data, v4
> 		   , slash-switched
> 		   , original name: WP60.ARJ
> 		   , os: MS-DOS 4]
> pmext4pc.arj:      ARJ archive data, v6
> 		   , slash-switched
> 		   , original name: PMEXT4PC.ARJ
> 		   , os: MS-DOS 3]
> test-je-v360K.e01: ARJ archive data, v11
> 		   , slash-switched
> 		   , original name: ,
> test-r-v360.a02:   ARJ archive data, v11
> 		   , multi-volume
> 		   , slash-switched
> 		   , original name: ,
> zip300.j:          JAR (ARJ Software, Inc.) archive data
> zip300.j01:        JAR (ARJ Software, Inc.) archive data
> 
> 
> With the --extension option only ??? is displayed. Furthermore, with
> the -i option application/x-arj is shown only for the ARJ samples; for
> the other examples only the generic application/octet-stream is shown.
> 
> For comparison I ran the file format identification utility TrID
> (see https://mark0.net/soft-trid-e.html). It describes samples with
> the JAR extension, like MY_JARC.JAR, as "JARCS compressed archive" via
> ark-jarcs.trid.xml. Most of the others are described as "JAR
> compressed archive" via ark-jar.trid.xml, or as the variant with the
> additional "(with Security Envelope)" via ark-jar-se.trid.xml (see the
> attached jar_j_trid-v.txt.gz).
> 
> Luckily, with the -v option TrID displays the file name extension and
> a related URL. With the information from this tool I found a page
> about JAR (ARJ Software) on the Archive Team file formats web site.
> That information is expressed by comment lines in Magdir/archive like:
> # URL:	http://fileformats.archiveteam.org/wiki/JAR_(ARJ_Software)
> # ref.:	http://mark0.net/download/triddefs_xml.7z
> #	defs/a/ark-jar.trid.xml
> 
> The description is done in Magdir/archive by a line like:
> 0xe	string	\x1aJar\x1b JAR (ARJ Software, Inc.) archive data
> This now becomes:
> 0xe	string	\x1aJar\x1b JAR (ARJ Software, Inc.) archive data
> !:mime	application/x-compress-j
>> 0	ulelong	x		\b, CRC32 %#x
> !:ext	j/j01/j02
> Instead of the generic MIME type application/octet-stream I display a
> user-defined one. The standard suffix is ".j"; if you create a
> multi-volume archive, the first volume gets this suffix and the
> following ones get numbered suffixes like:
> j01 j02 ... j99 100 ... 990
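The two tests above (the string \x1aJar\x1b at offset 0xE, then the little-endian CRC32 at offset 0) can be mirrored outside of file(1) for checking samples. This is only a sketch that re-implements those two magic lines; the function name describe_arj_jar and the sample bytes are hypothetical and not part of the patch.

```python
import struct
from typing import Optional

JAR_MAGIC = b"\x1aJar\x1b"  # string tested at offset 0xE by the proposed magic

def describe_arj_jar(data: bytes) -> Optional[str]:
    """Return a file(1)-like description, or None if the magic does not match."""
    if data[0xE:0xE + len(JAR_MAGIC)] != JAR_MAGIC:
        return None
    # Mirrors ">0 ulelong x \b, CRC32 %#x": little-endian 32-bit value at offset 0
    (crc,) = struct.unpack_from("<I", data, 0)
    return f"JAR (ARJ Software, Inc.) archive data, CRC32 {crc:#x}"

# Hypothetical 19-byte header: CRC32, 10 filler bytes, then the magic at 0xE
sample = struct.pack("<I", 0x37C4D93D) + bytes(10) + JAR_MAGIC
print(describe_arj_jar(sample))
```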
> 
> For the example with the 3-character suffix jar (MY_JARC.JAR), the
> description is done in Magdir/archive by a line like:
> 0	string	JARCS JAR (ARJ Software, Inc.) archive data
> So we get the same description as for the other examples, but there
> the CRC is stored at the beginning, whereas here we find the text
> string JARCS. So the magic lines now become:
> 0	string	JARCS JAR (ARJ Software, Inc.) archive data
> !:mime	application/x-compress-jar
> !:ext	jar
> Instead of the generic MIME type application/octet-stream I display
> another user-defined one. The standard suffix is ".jar", which is also
> used for Java archives. The information about that format is expressed
> in Magdir/archive by lines like:
> # URL:		http://fileformats.archiveteam.org/wiki/JARCS
> # reference:	http://mark0.net/download/triddefs_xml.7z
> #		a/ark-jarcs.trid.xml
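Since ".jar" is shared by JARCS and by Java archives, a rough way to tell the candidates apart is to look at the leading bytes: "JARCS" at offset 0, the ARJ Software JAR string at offset 0xE, or a ZIP local file header ("PK\x03\x04") for a Java archive. This is a simplified sketch with hypothetical labels; real Java archives are ordinary ZIP files, and a ZIP can in principle start with other records, so the third test is only the common case.

```python
from typing import Optional

def classify_jar_like(data: bytes) -> Optional[str]:
    """Rough classification of files that commonly use a .jar or .j suffix."""
    if data.startswith(b"JARCS"):
        # JARCS variant: text string at offset 0 instead of a CRC
        return "jarcs"
    if data[0xE:0x13] == b"\x1aJar\x1b":
        # ARJ Software JAR: CRC32 at offset 0, magic string at offset 0xE
        return "arj-jar"
    if data.startswith(b"PK\x03\x04"):
        # Java archive: ZIP file beginning with a local file header
        return "java-jar"
    return None
```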
> 
> Because the "jar" formats are described as based on ARJ, I also
> checked ARJ examples. These are described by TrID as "ARJ compressed
> archive" via ark-arj.trid.xml and by DROID as "ARJ File Format" via
> PUID fmt/610. It is also mentioned that the ARJ specification is in a
> file named TECHNOTE.TXT, which can be found in the unarj or multiarc
> source trees. That information is expressed by comment lines in
> Magdir/archive like:
> # URL:		http://fileformats.archiveteam.org/wiki/ARJ
> # reference:	http://mark0.net/download/triddefs_xml.7z
> #		defs/a/ark-arj.trid.xml
> #		https://github.com/FarGroup/FarManager/
> #		blob/master/plugins/multiarc/arc.doc/arj.txt
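For orientation, the fixed part of the ARJ main header can be read in one go with Python's struct module. The layout below is an assumption based on TECHNOTE.TXT as I read it, chosen so that its offsets agree with those used by the magic lines in this mail (version at 5, OS at 7, flags at 8, file type at 0xA, timestamps at 0xC and 0x10, security envelope fields at 0x18 and 0x1E, encryption version at 0x20, protection factor at 0x22). The field names are hypothetical.

```python
import struct
from typing import NamedTuple

class ArjMainHeader(NamedTuple):
    header_id: int           # 0x00: 0xEA60 (bytes 0x60 0xEA)
    basic_header_size: int   # 0x02
    first_hdr_size: int      # 0x04
    version: int             # 0x05: archiver version number
    min_version: int         # 0x06: minimum version to extract
    host_os: int             # 0x07
    flags: int               # 0x08
    security_version: int    # 0x09
    file_type: int           # 0x0A: 2 for the main header
    reserved: int            # 0x0B
    created: int             # 0x0C: MS-DOS time/date
    modified: int            # 0x10: MS-DOS time/date
    archive_size: int        # 0x14
    sec_env_pos: int         # 0x18
    filespec_pos: int        # 0x1C
    sec_env_len: int         # 0x1E
    encryption_version: int  # 0x20
    last_chapter: int        # 0x21
    protection_factor: int   # 0x22

# Little-endian, no padding: 35 bytes covering offsets 0x00..0x22
MAIN_HEADER = struct.Struct("<HHBBBBBBBBIIIIHHBBB")

def parse_main_header(data: bytes) -> ArjMainHeader:
    # Raises struct.error if fewer than 35 bytes are available
    return ArjMainHeader(*MAIN_HEADER.unpack_from(data, 0))
```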
> 
> Often information about the operating system is shown. That is done
> by lines like:
>> 7	byte		0		os: MS-DOS
>> 7	byte		1		os: PRIMOS
> ...
>> 7	byte		9		os: VAX/VMS
> But sometimes no such information is shown, because higher numbers
> are used for "new" systems. So, following the newer specification,
> this is now done by additional lines like:
>> 7	byte		10		os: WIN95
>> 7	byte		11		os: WIN32
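The host OS lookup at offset 7 amounts to a small table with a fallback for unknown codes. This sketch lists only the codes quoted in this mail; the codes between PRIMOS and VAX/VMS are elided above ("...") and are deliberately left out rather than guessed. The names HOST_OS and host_os_name are hypothetical.

```python
# Host OS byte at offset 7; only the values quoted in this mail are listed.
HOST_OS = {
    0: "MS-DOS",
    1: "PRIMOS",
    9: "VAX/VMS",
    10: "WIN95",
    11: "WIN32",
}

def host_os_name(data: bytes) -> str:
    code = data[7]
    # Unknown codes are reported instead of silently dropped
    return HOST_OS.get(code, f"unknown ({code})")
```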
> 
> Afterwards a digit followed by a bracket is often shown, as for
> WP60.ARJ ("os: MS-DOS 4]"). That was done by a line like:
>> 3	byte		>0		%d]
> 
> But according to the specification, offset 2 holds the basic header
> size as a little-endian 16-bit value (like 0x002b, 0x002c, 0x04e0,
> 0x04e3, 0x04e7), so the byte at offset 3 is just its high byte. If you
> are interested in this information for debugging purposes, it is
> shown correctly by lines like:
>> 2	uleshort	x	basic header size %#4.4x
>> (2.s)	ubequad		x	NEXT FRAGMENT CONTENT %#16.16llx
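The stray "4]" is easy to reproduce: reading the whole little-endian 16-bit field at offset 2 gives the basic header size, while the old line printed only the byte at offset 3, its high byte. The four header bytes below are hypothetical, chosen to give the value 0x04e0 listed above.

```python
import struct

def basic_header_size(data: bytes) -> int:
    """Basic header size: little-endian 16-bit value at offset 2."""
    return struct.unpack_from("<H", data, 2)[0]

# Hypothetical first bytes of an archive whose basic header size is 0x04e0
head = bytes([0x60, 0xEA, 0xE0, 0x04])
size = basic_header_size(head)
print(f"basic header size {size:#06x}")  # basic header size 0x04e0
print(f"old output: {head[3]}]")         # old output: 4]
```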
> 
> The archiver version number (like 3, 4, 6, 11, 102) is stored in the
> archive, and that information is shown by a line like:
>> 5	byte		x		\b, v%d,
> After that, the minimum archiver version needed to extract (like 1)
> is stored. As for ZIP archives, this information is now shown too, by
> an additional line like:
>> 6	ubyte		!1		minimum %u to extract,
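The two version lines translate to a few lines of formatting logic: the archiver version is always printed, the minimum version to extract only when it differs from 1. A sketch that reproduces the text the two magic lines would add; the function name is hypothetical.

```python
def version_fields(data: bytes) -> str:
    # ">5 byte x \b, v%d,": archiver version number at offset 5
    out = f", v{data[5]},"
    # ">6 ubyte !1 minimum %u to extract,": shown only when it is not 1
    if data[6] != 1:
        out += f" minimum {data[6]} to extract,"
    return out
```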
> 
> Often the original archive name is shown. This was done by a line like:
>> 34	string		x		original name: %s,
> But sometimes the name is missing or obviously wrong, as in the
> example TEST-hk2.ARJ. If I understand the documentation correctly,
> sometimes 4 extra bytes are inserted before the 0-terminated file
> name. So this now becomes:
>> 34	byte		x		original name:
>> 34	byte		<0x0B
>>> 38	string		x		%s,
>> 34	byte		>0x0A
>>> 34	string		x		%s,
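The heuristic in these lines can be restated as: if the byte at offset 34 is small (below 0x0B), treat it as part of the 4 extra bytes and read the name from offset 38, otherwise read it from offset 34. A sketch of exactly that rule, with one caveat noted in the code: magic's "byte" type is signed, Python bytes are not. The function name and sample bytes are hypothetical.

```python
def original_name(data: bytes) -> str:
    # Python bytes are unsigned, while the magic type "byte" is signed, so the
    # two differ for values >= 0x80; this sketch treats such bytes as part of
    # the name (unsigned comparison).
    start = 38 if data[34] < 0x0B else 34
    end = data.find(b"\0", start)
    if end == -1:
        end = len(data)
    return data[start:end].decode("latin-1")

# Hypothetical header tails: one with a factor byte (3) at 34, one without
with_factor = bytes(34) + b"\x03" + bytes(3) + b"TEST-hk2.ARJ\0"
plain = bytes(34) + b"WP60.ARJ\0"
print(original_name(with_factor), original_name(plain))
```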
> 
> At the offset of the file name, the ARJ protection factor is
> sometimes stored. Its maximum value is 10. It is given by an arj
> command switch like hky, where y is a digit, and the factor is
> calculated by adding one to y. The existence of a data protection
> record is indicated by the ARJPROT_FLAG bit being set in the flags
> byte. So this information is now shown by lines like:
>> 8	byte		&0x08		recoverable
>>> 0x22	byte		x		(factor %u),
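In code, the recovery information is a bit test on the flags byte at offset 8 followed by a read of offset 0x22, and the relation to the command switch is simply y + 1 (so a digit 0 to 9 gives the factor range 1 to 10 mentioned above). The constant value 0x08 is taken from the magic line; the function names are hypothetical.

```python
from typing import Optional

ARJPROT_FLAG = 0x08  # bit in the flags byte at offset 8, as tested by the magic

def protection_factor(data: bytes) -> Optional[int]:
    """Return the protection factor at 0x22 if ARJPROT_FLAG is set, else None."""
    if data[8] & ARJPROT_FLAG:
        return data[0x22]
    return None

def factor_from_switch(y: int) -> int:
    """Switch hk<y> with a digit y: the stored factor is y + 1."""
    return y + 1
```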
> 
> Normally the 3-character suffix ".arj", or the upper-case variant on
> DOS systems, is used. For a multi-volume archive the first name is
> archive.arj, and the following parts are named like archive.a01,
> archive.a02 and so on. In the following parts the "multi-volume" flag
> is set. For self-extracting multi-volume archives the first name is
> archive.exe. This is correctly identified as an executable for MS
> Windows, with the additional tag "ARJ self-extracting archive". The
> following parts are normal archives with names like archive.e01,
> archive.e02 and so on. Surprisingly, the multi-volume flag is not set
> here. So the extensions are now shown by additional lines like:
> 
>> 0x26	search/1024	\0
> #>>&-5	string		x		extension %.4s
>>> &-5	string/c	.arj		data
> !:ext	arj
>>> &-5	default		x
>>>> 8	byte		&0x04		data
> !:ext	a01/a02
>>>> 8	byte		^0x04		data, SFX multi-volume
> !:ext	e01/e02
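The extension logic above can be followed step by step: find the NUL that ends the name (searching up to 1024 bytes from 0x26), compare the 4 bytes before it case-insensitively with ".arj", and otherwise decide by the multi-volume bit (0x04 in the flags byte at offset 8). This is a sketch of that decision only, using the observation from this mail that SFX follow-up volumes lack the flag; the function name and sample data are hypothetical.

```python
from typing import Optional

VOLUME_FLAG = 0x04  # "multi-volume" bit in the flags byte at offset 8

def suggested_extensions(data: bytes) -> Optional[str]:
    # Like "0x26 search/1024 \0": the NUL terminating the original name
    end = data.find(b"\0", 0x26, 0x26 + 1024)
    if end == -1:
        return None
    # Like "&-5 string/c .arj": the 4 bytes before the NUL, case-insensitive
    if data[end - 4:end].lower() == b".arj":
        return "arj"
    if data[8] & VOLUME_FLAG:
        return "a01/a02"  # later part of a normal multi-volume archive
    return "e01/e02"      # flag clear: later part of an SFX multi-volume set
```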
> 
> I also saw that only a few bits of the flags byte are shown and
> interpreted, so, following the documentation, I added more flag
> values. For example, TEST-gstew.ARJ shows GARBLED_FLAG1. If this bit
> is set, the archive content is garbled with the password given by the
> g switch. So this information is now shown, together with the
> encryption version, by lines like:
>> 8	byte		&0x01		garbled
>>> 0x20	ubyte		x		(v%u),
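The garbled check follows the same pattern as the recovery check: test bit 0x01 of the flags byte at offset 8 and, if set, append the encryption version from offset 0x20. A sketch reproducing the text these two magic lines emit; the names are hypothetical.

```python
from typing import Optional

GARBLED_FLAG = 0x01  # bit in the flags byte at offset 8, as tested by the magic

def garbled_info(data: bytes) -> Optional[str]:
    # ">8 byte &0x01 garbled" followed by ">>0x20 ubyte x (v%u),"
    if data[8] & GARBLED_FLAG:
        return f"garbled (v{data[0x20]}),"
    return None
```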
> 
> At offset 0xC the creation date and time are stored, and at offset
> 0x10 the modification date and time. As in ZIP archives, this
> information is stored in MS-DOS format. So it is shown by the
> subroutine dos-date in the newest Magdir/msdos; alternatively the new
> built-in types lemsdosdate and lemsdostime can be used. This is now
> done by lines like:
>> 0xC	ulelong		x		created
>> 0xC	use		dos-date
> #>0xE	lemsdosdate	x		%s
> #>0xC	lemsdostime	x		%s
>> 0xC	ulelong		x		\b,
>> 0x10	ulelong		>0		modified
>>> 0x10	use		dos-date
> #>>0x12	lemsdosdate	x		%s
> #>>0x10	lemsdostime	x		%s
>>> 0x10	ulelong		x		\b,
> That information can be verified by commands like:
> 	arj	l 	pmext4pc.arj
> 	7z	l -tarj	PHRACK1.ARJ
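Besides arj and 7z, the timestamps can be decoded by hand: the time word comes first (offset 0xC), the date word second (offset 0xE), with the usual MS-DOS bit fields. The timestamp in the usage lines is hypothetical, not taken from one of the samples; note that datetime raises ValueError for impossible fields such as month 0.

```python
import struct
from datetime import datetime

def dos_datetime(data: bytes, offset: int) -> datetime:
    """Decode an MS-DOS timestamp: time word at offset, date word at offset+2."""
    time_word, date_word = struct.unpack_from("<HH", data, offset)
    return datetime(
        1980 + (date_word >> 9),   # bits 15-9: years since 1980
        (date_word >> 5) & 0x0F,   # bits 8-5: month
        date_word & 0x1F,          # bits 4-0: day
        time_word >> 11,           # bits 15-11: hours
        (time_word >> 5) & 0x3F,   # bits 10-5: minutes
        (time_word & 0x1F) * 2,    # bits 4-0: seconds / 2
    )

# Hypothetical header with a creation stamp of 1993-05-19 12:34:56 at 0xC
raw = bytes(0xC) + struct.pack("<HH", 0x645C, 0x1AB3)
print(dos_datetime(raw, 0xC))
```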
> 
> Detection is done by start magic lines like:
> 0	leshort		0xea60		ARJ archive data
> !:mime	application/x-arj
> This uses only 2 bytes, which is not a strong magic and contradicts
> the recommendation to use at least 4 bytes. The DROID test example
> fmt-610-signature-id-946.arj contains just these first 2 bytes, so
> the current magic also describes it as ARJ archive data, which is not
> what you really want. So I skip this example with an additional test
> for the valid file type (2) of the main header. I also moved the
> display part into the subroutine arj-archive. The starting lines now
> become:
> 0	leshort		0xea60
>> 0xA	ubyte		2
>>> 0	use		arj-archive
> 0	name		arj-archive
>> 0	leshort		x		ARJ archive
> !:mime	application/x-arj
> At first glance this looks like overkill, but it has some
> advantages. According to the comment lines "[JW] idarc" there exist
> samples where the magic occurs 2 bytes later. This was expressed by
> the line
> 2	leshort		0xea60		ARJ archive data
> Unfortunately I have no such example myself, but I prepared lines
> that use the subroutine here too. So this now becomes:
> 2	leshort		0xea60		ARJ archive data
> #2	leshort		0xea60
> #>2	use		arj-archive
> Also, an SFX archive contains the real ARJ archive after the
> executable stub, so it is possible to jump to the right position and
> show the information by calling the subroutine at that offset.
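The stronger start test (header id 0x60 0xEA plus file type 2 at +0xA) takes an offset parameter, which covers both the "2 bytes later" case and a brute-force search past an SFX stub. This is a naive sketch under the assumption that a plain scan is acceptable; a 0x60 0xEA pair followed by a 2 inside the stub code would still produce a false positive, and the names and the 1 MiB scan limit are hypothetical.

```python
from typing import Optional

ARJ_ID = b"\x60\xea"  # "0 leshort 0xea60" as raw bytes

def looks_like_arj(data: bytes, offset: int = 0) -> bool:
    """Header id at offset plus file type 2 (main header) at offset + 0xA."""
    return (
        len(data) > offset + 0xA
        and data[offset:offset + 2] == ARJ_ID
        and data[offset + 0xA] == 2
    )

def find_embedded_arj(data: bytes, limit: int = 1 << 20) -> Optional[int]:
    """Naive scan for the first offset that passes looks_like_arj."""
    pos = data.find(ARJ_ID, 0, limit)
    while pos != -1:
        if looks_like_arj(data, pos):
            return pos
        pos = data.find(ARJ_ID, pos + 1, limit)
    return None
```

With this check, a 2-byte file like fmt-610-signature-id-946.arj is rejected, because there is no file type byte at offset 0xA.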
> 
> Some fields described in the documentation, like archive size and
> filespec position, are not understandable to me, or I get unexpected
> values. So I added these fields only as comment lines like:
> # archive size (currently used only for secured archives); MAYBE?
> #>0x14	ulelong		!0		file size %u,
> # security envelope file position; MAYBE?
> #>0x18	ulelong		!0		at %#x security envelope,
> # filespec position in filename; WHAT IS THAT?
> #>0x1C	uleshort	>0		filespec position %#x,
> 
> After applying the above-mentioned modifications with the patch
> file-5.41-archive-jar_j.diff and using the newest Magdir/msdos, all
> samples are described as before, but with corrections and more
> details:
> 
> 19GXE.ARJ:         ARJ archive data, v3
> 		   , created 19 may 1980+13
> 		   , original name: #9GXE.ARJ
> 		   , os: MS-DOS
> MY_JARC.JAR:       JAR (ARJ Software, Inc.) archive data
> SAMPLE.J:          JAR (ARJ Software, Inc.) archive data
> 		   , CRC32 0x92c93391
> TEST-hk2.ARJ:      ARJ archive data, v11
> 		   , recoverable (factor 3)
> 		   , slash-switched
> 		   , created 10 mar 1980+42
> 		   , original name: TEST-hk2.ARJ
> 		   , os: WIN32
> WP60.ARJ:          ARJ archive data, v4
> 		   , ANSI codepage
> 		   , slash-switched
> 		   , created 2 jun 1980+13
> 		   , security envelope length 0x471
> 		   , original name: WP60.ARJ
> 		   , os: MS-DOS
> pmext4pc.arj:      ARJ archive data, v6
> 		   , slash-switched
> 		   , created 13 mar 1980+15
> 		   , original name: PMEXT4PC.ARJ
> 		   , os: MS-DOS
> test-je-v360K.e01: ARJ archive data, SFX multi-volume, v11
> 		   , slash-switched
> 		   , created 12 mar 1980+42
> 		   , original name: test-je-v360K.e01
> 		   , os: WIN32
> test-r-v360.a02:   ARJ archive data, v11
> 		   , multi-volume
> 		   , slash-switched
> 		   , created 12 mar 1980+42
> 		   , original name: test-r-v360.a02
> 		   , os: WIN32
> zip300.j:          JAR (ARJ Software, Inc.) archive data
> 		   , CRC32 0x37c4d93d
> zip300.j01:        JAR (ARJ Software, Inc.) archive data
> 		   , CRC32 0xedaf841b
> 
> 
> With the --extension option I now get the expected output:
> 
> 19GXE.ARJ:         arj
> MY_JARC.JAR:       jar
> SAMPLE.J:          j/j01/j02
> TEST-hk2.ARJ:      arj
> WP60.ARJ:          arj
> pmext4pc.arj:      arj
> test-je-v360K.e01: e01/e02
> test-r-v360.a02:   a01/a02
> zip300.j:          j/j01/j02
> zip300.j01:        j/j01/j02
> 
> I hope my diff can be applied in a future version of the file
> utility.
> 
> With the -i option the MIME type is shown, which is given by a magic
> line like "!:mime	application/x-arj". So it would be nice to
> implement, in a similar way, an option to show the TrID,
> shared-mime-info or DROID description and/or the identification
> number PUID. Why? It reminds me of anti-virus software: every company
> names things differently. So if you are in trouble and uncertain
> because you get different descriptions, then it is difficult to
> decide what is correct, or whether the same fact is just called by a
> different description text.
> 
> With best wishes
> Jörg Jenderek
> <jar_j_trid-v.txt.gz><file-5_41-archive-jar_j_diff><file-5_41-archive-jar_j_diff_sig><jar_j-droid.csv.gz>
> --
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file
