[File] [PATCH] Magdir/archive for ARJ, JAR (ARJ Software, Inc.) versus Java archive data (JAR)
Christos Zoulas
christos at zoulas.com
Sat Mar 12 19:01:41 UTC 2022
Sorry, that does not seem to apply cleanly. Can you diff against HEAD?
christos
> On Mar 12, 2022, at 1:40 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello,
>
> A few days ago I wanted to handle some Java archives, which are
> ZIP-based and normally have the 3-byte file name extension jar or
> sometimes the 1-byte extension j.
>
> Unfortunately these extensions are also used by other compression
> tools. When running the file command version 5.41 on such non-ZIP
> examples I get nearly correct output like:
> 19GXE.ARJ: ARJ archive data, v3
>     , original name: #9GXE.ARJ
>     , os: MS-DOS
> MY_JARC.JAR: JAR (ARJ Software, Inc.) archive data
> SAMPLE.J: JAR (ARJ Software, Inc.) archive data
> TEST-hk2.ARJ: ARJ archive data, v11
>     , slash-switched
>     , original name: \003,
> WP60.ARJ: ARJ archive data, v4
>     , slash-switched
>     , original name: WP60.ARJ
>     , os: MS-DOS 4]
> pmext4pc.arj: ARJ archive data, v6
>     , slash-switched
>     , original name: PMEXT4PC.ARJ
>     , os: MS-DOS 3]
> test-je-v360K.e01: ARJ archive data, v11
>     , slash-switched
>     , original name: ,
> test-r-v360.a02: ARJ archive data, v11
>     , multi-volume
>     , slash-switched
>     , original name: ,
> zip300.j: JAR (ARJ Software, Inc.) archive data
> zip300.j01: JAR (ARJ Software, Inc.) archive data
>
>
> With the --extension option only ??? is displayed. Furthermore, with
> the -i option application/x-arj is shown only for the ARJ samples. For
> the other examples only the generic application/octet-stream is shown.
>
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). It describes
> samples with the JAR extension like MY_JARC.JAR as "JARCS compressed
> archive" by ark-jarcs.trid.xml. Most of the others are described as
> "JAR compressed archive" by ark-jar.trid.xml or as a variant with
> additional "(with Security Envelope)" by ark-jar-se.trid.xml (see
> appended jar_j_trid-v.txt.gz).
>
> Luckily, with the -v option TrID displays the file name extension and
> a related URL. With the information from this tool I found a page
> about JAR (ARJ Software) on the File Formats Archive Team web site.
> That information is expressed by comment lines inside Magdir/archive
> like:
> # URL: http://fileformats.archiveteam.org/wiki/JAR_(ARJ_Software)
> # ref.: http://mark0.net/download/triddefs_xml.7z
> # defs/a/ark-jar.trid.xml
>
> The description is produced by a line inside Magdir/archive like:
> 0xe string \x1aJar\x1b JAR (ARJ Software, Inc.) archive data
> This now becomes:
> 0xe string \x1aJar\x1b JAR (ARJ Software, Inc.) archive data
> !:mime application/x-compress-j
> >0 ulelong x \b, CRC32 %#x
> !:ext j/j01/j02
> Instead of the generic MIME type application/octet-stream I display a
> user-defined one. The standard suffix is ".j", but if you create
> multi-volume archives then the first volume gets this suffix and the
> following ones get numbered suffixes like:
> j01 j02 ... j99 100 ... 990
>
> For the example with the jar suffix (MY_JARC.JAR) the description is
> produced by a line inside Magdir/archive like:
> 0 string JARCS JAR (ARJ Software, Inc.) archive data
> So we get the same description as for the other examples, but there
> the CRC is stored at the beginning, whereas here we find the text
> string JARCS. So the magic lines now become:
> 0 string JARCS JAR (ARJ Software, Inc.) archive data
> !:mime application/x-compress-jar
> !:ext jar
> Instead of the generic MIME type application/octet-stream I display
> another user-defined one. The standard suffix is ".jar", which is also
> used for Java archives. The information about that format is expressed
> inside Magdir/archive by lines like:
> # URL: http://fileformats.archiveteam.org/wiki/JARCS
> # reference: http://mark0.net/download/triddefs_xml.7z
> # a/ark-jarcs.trid.xml
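As a rough illustration of the two signatures just described, the following Python sketch checks a byte buffer for either one. The function names `classify_jar` and `jar_crc32` are made up for this example and are not part of file or libmagic; only the offsets, strings and extension lists are taken from the magic lines quoted above.

```python
import struct

JAR_MAGIC = b"\x1aJar\x1b"   # expected at offset 0xe (from the quoted magic line)
JARCS_MAGIC = b"JARCS"       # expected at offset 0


def classify_jar(data):
    """Return the suggested !:ext value for ARJ Software JAR data, else None."""
    if data[0x0E:0x0E + len(JAR_MAGIC)] == JAR_MAGIC:
        return "j/j01/j02"
    if data.startswith(JARCS_MAGIC):
        return "jar"
    return None


def jar_crc32(data):
    """Read the little-endian 32-bit value at offset 0 that the patch prints as CRC32."""
    (value,) = struct.unpack_from("<I", data, 0)
    return value
```

A ZIP-based Java archive starts with "PK" and matches neither test, so `classify_jar` returns None for it.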
>
> Because the "jar" archives are described as based on ARJ, I also
> checked such examples. These are described by TrID as "ARJ compressed
> archive" by ark-arj.trid.xml and as "ARJ File Format" by DROID via
> PUID fmt/610. It is also mentioned there that the ARJ specification
> can be found in a file named TECHNOTE.TXT, which is part of the unarj
> or multiarc source trees. That information is expressed by comment
> lines inside Magdir/archive like:
> # URL: http://fileformats.archiveteam.org/wiki/ARJ
> # reference: http://mark0.net/download/triddefs_xml.7z
> # defs/a/ark-arj.trid.xml
> # https://github.com/FarGroup/FarManager/
> # blob/master/plugins/multiarc/arc.doc/arj.txt
>
> Often information about the operating system is shown. That is done
> by lines like:
> >7 byte 0 os: MS-DOS
> >7 byte 1 os: PRIMOS
> ...
> >7 byte 9 os: VAX/VMS
> But sometimes no such information is shown, because higher numbers
> are used for "new" systems. So according to the newer specification
> this is now done by additional lines like:
> >7 byte 10 os: WIN95
> >7 byte 11 os: WIN32
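The host OS lookup amounts to a table indexed by the byte at offset 7. A minimal Python sketch follows, listing only the values that appear in this message; the values elided by "..." above are deliberately left out, and the names `HOST_OS` and `host_os` are hypothetical.

```python
# Host OS codes mentioned in the message; values 2..8 are elided there
# and therefore omitted here as well.
HOST_OS = {
    0: "MS-DOS",
    1: "PRIMOS",
    9: "VAX/VMS",
    10: "WIN95",
    11: "WIN32",
}


def host_os(header):
    """Return the OS name for the byte at offset 7, or None if unknown."""
    if len(header) <= 7:
        return None
    return HOST_OS.get(header[7])
```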
>
> Afterwards a digit and a bracket are often shown, as for example with
> WP60.ARJ. That was done by a line like:
> >3 byte >0 %d]
>
> But according to the specification this is the basic header size (like:
> 0x002b 0x002c 0x04e0 0x04e3 0x04e7). So if you are interested in this
> information for debugging purposes, it can be shown correctly by
> lines like:
> >2 uleshort x basic header size %#4.4x
> >(2.s) ubequad x NEXT FRAGMENT CONTENT %#16.16llx
>
> The archiver version number (like: 3 4 6 11 102) is stored in the
> archive and that information is shown by a line like:
> >5 byte x \b, v%d,
> Afterwards the minimum archiver version needed to extract (like 1) is
> stored. Similar to ZIP examples, this information is now shown too by
> an additional line like:
> >6 ubyte !1 minimum %u to extract,
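To make the field layout concrete, here is a small Python sketch that reads the basic header size (offset 2, little-endian 16 bit), the archiver version (offset 5) and the minimum version to extract (offset 6), mirroring the output rule that the minimum is only mentioned when it differs from 1. The function name `arj_versions` is an assumption made for this example.

```python
import struct


def arj_versions(header):
    """Return (basic_header_size, description) from an ARJ main header."""
    (size,) = struct.unpack_from("<H", header, 2)  # uleshort at offset 2
    version = header[5]
    minimum = header[6]
    text = f"v{version}"
    if minimum != 1:  # same condition as the "!1" test in the magic line
        text += f", minimum {minimum} to extract"
    return size, text
```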
>
> Often the original archive name is shown. This was done by a line like:
> >34 string x original name: %s,
> But sometimes this is missing or obviously wrong, as in the example
> TEST-hk2.ARJ. If I understand the documentation correctly, sometimes 4
> extra bytes are inserted before the 0-terminated file name. So this
> now becomes:
> >34 byte x original name:
> >34 byte <0x0B
> >>38 string x %s,
> >34 byte >0x0A
> >>34 string x %s,
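The name lookup can be sketched in Python as below. It follows the two tests literally, including the fact that a plain `byte` in magic(5) is compared as a signed value. The function name `original_name` and the Latin-1 decoding are assumptions for illustration, and the 4 extra bytes in the usage are placeholder values, not data from a real archive.

```python
def original_name(header):
    """Return the archive name, skipping 4 extra bytes when the byte at 34 is < 0x0B."""
    first = header[34]
    signed = first - 256 if first > 127 else first  # magic "byte" is signed
    start = 38 if signed < 0x0B else 34
    end = header.find(b"\0", start)
    if end == -1:
        end = len(header)
    # Latin-1 is an assumption; it simply maps every byte to one character.
    return header[start:end].decode("latin-1")
```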
>
> At the offset of the file name the ARJ protection factor is sometimes
> stored. The maximal value is 10. The value is given by the arj
> command switch hky, where y is a digit, and the factor is
> calculated by adding one to the y value. The existence of a data
> protection record is indicated by the ARJPROT_FLAG bit being set in
> the flags byte. So this information is now shown by lines like:
> >8 byte &0x08 recoverable
> >>0x22 byte x (factor %u),
>
> Normally a 3-byte suffix like ".arj", or the upper-case variant on DOS
> systems, is used. For multi-volume archives the first name is
> archive.arj, and the following parts are named archive.a01,
> archive.a02 and so on. In the following parts the "multi-volume" flag
> is set. For self-extracting multi-volume archives the first name is
> archive.exe. This is correctly identified as an executable for MS
> Windows, with the additional tag "ARJ self-extracting archive". The
> following parts are normal archives with names like archive.e01,
> archive.e02 and so on. Astonishingly, the multi-volume flag is not
> set here. So the extensions are now shown by additional lines like:
>
> >0x26 search/1024 \0
> #>>&-5 string x extension %.4s
> >>&-5 string/c .arj data
> !:ext arj
> >>&-5 default x
> >>>8 byte &0x04 data
> !:ext a01/a02
> >>>8 byte ^0x04 data, SFX multi-volume
> !:ext e01/e02
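The decision tree in these lines can be written as a short Python sketch: a stored name ending in ".arj" (case-insensitive) gives arj, otherwise the multi-volume bit 0x04 of the flags byte selects between the two numbered families. `arj_extensions` is a hypothetical helper name; it takes the already extracted name and flags byte instead of searching the header.

```python
def arj_extensions(archive_name, flags):
    """Return the !:ext value suggested by the quoted magic lines."""
    if archive_name.lower().endswith(".arj"):
        return "arj"
    if flags & 0x04:        # multi-volume flag set: follow-up volume
        return "a01/a02"
    return "e01/e02"        # flag clear: part of an SFX multi-volume set
```

With the sample names from this message, WP60.ARJ maps to arj, test-r-v360.a02 to a01/a02 and test-je-v360K.e01 to e01/e02.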
>
> I also saw that only a few bits of the flags byte are shown and
> interpreted. So according to the documentation I added more flag
> values. For example, TEST-gstew.ARJ has GARBLED_FLAG1 set. If this
> bit is set, then the archive content is garbled with the password
> given by the g switch. So this information is shown, together with
> the encryption version, by lines like:
> >8 byte &0x01 garbled
> >>0x20 ubyte x (v%u),
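Putting the flag tests from this message together, a Python sketch of the flags byte interpretation might look like this. Only the three bits discussed here (0x01, 0x04, 0x08) are handled, and `describe_flags` is a name invented for the example.

```python
def describe_flags(header):
    """Describe the flag bits at offset 8 that are discussed in this message."""
    flags = header[8]
    parts = []
    if flags & 0x01:  # garbled; encryption version is read from offset 0x20
        parts.append(f"garbled (v{header[0x20]})")
    if flags & 0x04:  # volume flag
        parts.append("multi-volume")
    if flags & 0x08:  # protection record; factor is read from offset 0x22
        parts.append(f"recoverable (factor {header[0x22]})")
    return parts
```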
>
> At offset 0xC the creation and modification date and time stamps are
> stored. Similar to ZIP archives, that information is stored in
> MS-DOS format.
> So show this with the subroutine dos-date from the newest Magdir/msdos
> or use the new internal functions lemsdosdate and lemsdostime. This is
> now done by lines like:
> >0xC ulelong x created
> >0xC use dos-date
> #>0xE lemsdosdate x %s
> #>0xC lemsdostime x %s
> >0xC ulelong x \b,
> >0x10 ulelong >0 modified
> >>0x10 use dos-date
> #>>0x12 lemsdosdate x %s
> #>>0x10 lemsdostime x %s
> >>0x10 ulelong x \b,
> That information can be verified by commands like:
> arj l pmext4pc.arj
> 7z l -tarj PHRACK1.ARJ
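For readers who want to check a timestamp by hand, here is a Python sketch of the MS-DOS date/time decoding. As in the commented lemsdostime/lemsdosdate lines, the time word sits at the given offset and the date word two bytes later. The function name `decode_dos_datetime` is an assumption for this example.

```python
import struct


def decode_dos_datetime(header, offset=0x0C):
    """Decode a little-endian MS-DOS time word followed by a date word."""
    time_word, date_word = struct.unpack_from("<HH", header, offset)
    year = 1980 + (date_word >> 9)        # 7 bits, counted from 1980
    month = (date_word >> 5) & 0x0F       # 4 bits
    day = date_word & 0x1F                # 5 bits
    hour = time_word >> 11                # 5 bits
    minute = (time_word >> 5) & 0x3F      # 6 bits
    second = (time_word & 0x1F) * 2       # stored in 2-second units
    return (year, month, day, hour, minute, second)
```

Pass `offset=0x10` to decode the modification stamp instead of the creation stamp.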
>
> The detection happens by starting magic lines like:
> 0 leshort 0xea60 ARJ archive data
> !:mime application/x-arj
> That uses only 2 bytes. That is not a strong magic and contradicts
> the recommendation to use at least 4 bytes. The DROID test
> example fmt-610-signature-id-946.arj contains just these first 2
> bytes. So with the current magic it is also described as ARJ archive
> data. This is not what you really want. So I skip this example with
> an additional test for a valid file type (2) in the main header. I
> also put the displaying part inside the subroutine arj-archive. The
> starting lines now become:
> 0 leshort 0xea60
> >0xA ubyte 2
> >>0 use arj-archive
> 0 name arj-archive
> >0 leshort x ARJ archive
> !:mime application/x-arj
> At first glance this looks like overkill, but it has some
> advantages. According to the comment lines "[JW] idarc" there exist
> samples where the magic occurs 2 bytes later. This was expressed by
> the line
> 2 leshort 0xea60 ARJ archive data
> Unfortunately I have no such example myself, but I prepared lines
> to use the subroutine here too. So this now becomes:
> 2 leshort 0xea60 ARJ archive data
> #2 leshort 0xea60
> #>2 use arj-archive
> Also, an SFX archive has the real ARJ archive after the executable
> stub. So it is possible to jump to the right position and show the
> information by calling the subroutine at that offset.
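The strengthened test can be sketched in Python as follows: the 2-byte magic alone is not enough, and the file type byte at offset 0xA must also equal 2. The name `looks_like_arj` and the optional `base` parameter (for a header that starts later in the file, as with an SFX stub) are assumptions for this illustration.

```python
def looks_like_arj(data, base=0):
    """Check the 0xea60 magic at base and the file type 2 at base + 0xA."""
    if len(data) <= base + 0x0A:
        return False  # too short, e.g. a file holding only the 2 magic bytes
    # leshort 0xea60 is stored as the bytes 0x60 0xea
    return data[base:base + 2] == b"\x60\xea" and data[base + 0x0A] == 2
```

A file consisting of only the two magic bytes, like the DROID sample mentioned above, is rejected by the length check.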
>
> Some fields described in the documentation, like archive size and
> filespec position, are not understandable to me, or I do not get the
> expected values. So I added these fields only as comment lines like:
> # archive size (currently used only for secured archives); MAYBE?
> #>0x14 ulelong !0 file size %u,
> # security envelope file position; MAYBE?
> #>0x18 ulelong !0 at %#x security envelope,
> # filespec position in filename; WHAT IS THAT?
> #>0x1C uleshort >0 filespec position %#x,
>
> After applying the above-mentioned modifications with the patch
> file-5.41-archive-jar_j.diff and using the newest Magdir/msdos,
> all samples are described as before, with corrections and more
> details like:
>
> 19GXE.ARJ: ARJ archive data, v3
>     , created 19 may 1980+13
>     , original name: #9GXE.ARJ
>     , os: MS-DOS
> MY_JARC.JAR: JAR (ARJ Software, Inc.) archive data
> SAMPLE.J: JAR (ARJ Software, Inc.) archive data
>     , CRC32 0x92c93391
> TEST-hk2.ARJ: ARJ archive data, v11
>     , recoverable (factor 3)
>     , slash-switched
>     , created 10 mar 1980+42
>     , original name: TEST-hk2.ARJ
>     , os: WIN32
> WP60.ARJ: ARJ archive data, v4
>     , ANSI codepage
>     , slash-switched
>     , created 2 jun 1980+13
>     , security envelope length 0x471
>     , original name: WP60.ARJ
>     , os: MS-DOS
> pmext4pc.arj: ARJ archive data, v6
>     , slash-switched
>     , created 13 mar 1980+15
>     , original name: PMEXT4PC.ARJ
>     , os: MS-DOS
> test-je-v360K.e01: ARJ archive data, SFX multi-volume, v11
>     , slash-switched
>     , created 12 mar 1980+42
>     , original name: test-je-v360K.e01
>     , os: WIN32
> test-r-v360.a02: ARJ archive data, v11
>     , multi-volume
>     , slash-switched
>     , created 12 mar 1980+42
>     , original name: test-r-v360.a02
>     , os: WIN32
> zip300.j: JAR (ARJ Software, Inc.) archive data
>     , CRC32 0x37c4d93d
> zip300.j01: JAR (ARJ Software, Inc.) archive data
>     , CRC32 0xedaf841b
>
>
> With the --extension option I now get the expected output like:
>
> 19GXE.ARJ: arj
> MY_JARC.JAR: jar
> SAMPLE.J: j/j01/j02
> TEST-hk2.ARJ: arj
> WP60.ARJ: arj
> pmext4pc.arj: arj
> test-je-v360K.e01: e01/e02
> test-r-v360.a02: a01/a02
> zip300.j: j/j01/j02
> zip300.j01: j/j01/j02
>
> I hope my diff file can be applied in a future version of the file
> utility.
>
> With the -i option the MIME type is shown, which is given by a magic
> line looking like "!:mime application/x-arj". So it would be nice to
> implement, in a similar way, an option to show the TrID,
> shared-mime-info or DROID description and/or the PUID identification
> number. Why? It reminds me of anti-virus software: every company
> names things differently. So if you are in trouble and uncertain
> because you get different descriptions, then it is difficult to
> decide which one is correct, or whether it is the same fact just
> described with different text.
>
> With best wishes
> Jörg Jenderek
> - --
> Jörg Jenderek
>
>
>
>
>
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYizpKQAKCRCv8rHJQhrU
> 1gotAKCZbwWfj9HC+ZlUzqPbpVOmOf8BXgCfdcbThoGbfGELNhE84O2BfDJ8I/s=
> =dU/G
> -----END PGP SIGNATURE-----
> <Nachrichtenteil als Anhang.DEFANGED-429><jar_j_trid-v.txt.gz><file-5_41-archive-jar_j_diff.DEFANGED-430><file-5_41-archive-jar_j_diff_sig.DEFANGED-431><jar_j-droid.csv.gz>--
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file
> <sanitizer.log>