[File] [PATCH] Magdir/archive for Comic Book Archive, tar archive *.CBT
Christos Zoulas
christos at zoulas.com
Sat Jul 30 17:02:45 UTC 2022
Committed, thanks!
christos
> On Jul 27, 2022, at 7:04 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello,
>
> Some days ago I sent patches for DOS COM executables. One Syslinux
> COMboot variant uses the file name extension CBT instead of COM.
>
> As a cross-check I looked for other files with the CBT extension on
> my systems. According to the DROID utility this extension is also
> used for Comic Book Archives. When running the file command (version
> 5.42) on such examples and on related tar archives with the -e tar
> option, I get output like:
>
> Black_Cobra_003.cbt: POSIX tar archive (GNU), file
> 19.jpg, mode 000644 ,
> size 00003315356,
> seconds 11540725637
> M129-pax.tar: POSIX.1-2001 tar archive, global
> /tmp/GlobalHead.2512.2, mode 0000644,
> uid 0000000, gid 0000000,
> size 00000000141,
> seconds 13071714760
> TAR3214-j.TAR: tar archive (old), file
> tar3214.txt, mode 666 ,
> uid 0 , gid 0 ,
> size 3400 ,
> seconds 6450504352, comment:
> comment field created by -j option by DO
> archive.dir.tar: POSIX tar archive (GNU), directory
> gettext-0.10.35/, mode 0000755,
> uid 0000000, gid 0000000,
> size 00000000000,
> seconds 11401732537,
> user root, group root
> comics.cbt: POSIX tar archive (GNU), file
> test.jpg, mode 0000644,
> uid 0001750, gid 0001750,
> size 00000001121,
> seconds 10665023160,
> user jjmarin, group jjmarin
> dpmi-en.tar: POSIX tar archive, file
> 0.9.gif, mode 000644 ,
> uid 000124 , gid 000024 ,
> size 00000000147 ,
> seconds 05762024207,
> user dj, group user
> gtarfail.tar: POSIX tar archive, file
> vedpowered.gif, mode 0000644,
> uid 0000746, gid 0002044,
> size 00000001006 ,
> seconds 07303467402,
> user jes, group glone
> id-high2037-old.tar: tar archive (V7), file
> 6Mar2037.txt, mode 0000644,
> uid 7777777, gid 7777777,
> size 00000000374,
> seconds 17626765756
> test-png.cbt: POSIX tar archive (GNU), file
> 0001.png, mode 000644 ,
> size 00000002567,
> seconds 13273174121
> test4digit.tar: POSIX tar archive (GNU), file
> 2712.txt, mode 000644 ,
> size 00000204500,
> seconds 13220741303
> test_data.tar: POSIX tar archive (GNU), file
> 0000000000000000.empty.br,
> mode 0000600,
> uid 0423055, gid 0257523,
> size 00000000001,
> seconds 13266421766,
> user eustas, group primarygroup
> win10iso-gnu.tar: POSIX tar archive (GNU), file
> m/vm/14393.0.160715-1616.
> RS1_RELEASE_CLIENTENTERPRISE_S_EVAL,
> mode 0000644,
> uid 0002464, gid 0001143,
> size 0xd72db800,
> seconds 13031057704,
> user joerg, group Administratoren
>
> When running the file command without that option, I get output like:
>
> Black_Cobra_003.cbt: POSIX tar archive (GNU)
> M129-pax.tar: POSIX tar archive
> TAR3214-j.TAR: tar archive
> archive.dir.tar: POSIX tar archive (GNU)
> comics.cbt: POSIX tar archive (GNU)
> dpmi-en.tar: POSIX tar archive
> gtarfail.tar: POSIX tar archive
> id-high2037-old.tar: tar archive
> test-png.cbt: POSIX tar archive (GNU)
> test4digit.tar: POSIX tar archive (GNU)
> test_data.tar: POSIX tar archive (GNU)
> win10iso-gnu.tar: data
>
> With the option to show the file name extension I get a wrong phrase
> like tar/gtar or ???, and with the option to show the MIME type I get
> a wrong phrase like application/x-tar or application/x-gtar.
>
> For comparison I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). It identifies
> most CBT examples as "Comic Book Archive" by PUID fmt/1462, based on
> the file name extension (see appended droid-comicbook-cbt.csv.gz).
>
> For comparison I also ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). Many examples are
> described with low priority as "Tape ARchive (file)" by definition
> ark-tar-file.trid.xml. With higher priority many examples are also
> described as "TAR - Tape ARchive (GNU)" by ark-tar-gnu.trid.xml (see
> appended trid-v-comicbook-cbt.txt.gz).
>
> There is a page about the Comic Book Archive on Wikipedia and on the
> File Formats Archive Team website. This is now expressed by comment
> lines like:
> # URL: https://en.wikipedia.org/wiki/Comic_book_archive
> # http://fileformats.archiveteam.org/
> # wiki/Comic_Book_Archive
>
> Luckily, inside Magdir/archive the displaying part for tar archives
> is done by calling the subroutine tar-file.
>
> So I created lines inside Magdir/archive for the Comic Book Archive
> TAR variant as subroutine tar-cbt, which looks like:
> 0 name tar-cbt
> >0 string x Comic Book Archive, tar archive
> !:mime application/vnd.comicbook
> !:ext cbt
> >0 string >\0 \b, 1st image %-.60s
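
To illustrate what the tar-cbt lines quoted above print, here is a minimal Python sketch. It assumes the usual magic(5) behaviour: `\b` suppresses the separating space and `%-.60s` prints at most 60 characters of the string at offset 0. The function and constant names are hypothetical and not part of the patch.

```python
# Illustrative only: mimics the printed description of the tar-cbt subroutine.
DESCRIPTION = "Comic Book Archive, tar archive"
MIME_TYPE = "application/vnd.comicbook"  # value of the !:mime line
EXTENSION = "cbt"                        # value of the !:ext line


def describe_cbt(first_member_name: str) -> str:
    """Return the description for a given first member name."""
    text = DESCRIPTION
    if first_member_name:  # corresponds to the ">\0" (non-empty string) test
        # "\b" joins the texts without a space; precision .60 truncates the name
        text += ", 1st image %-.60s" % first_member_name
    return text


print(describe_cbt("19.jpg"))
```

For the first member name 19.jpg this gives the same text as the Black_Cobra_003.cbt sample in the message.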
>
> Instead of a generic MIME type like application/x-tar I display
> another one. For the other variants a type starting with
> application/vnd.comicbook is used: for the CBZ variant +zip is
> appended, and for the CBR variant -rar is appended. For the TAR
> variant I found nothing, but logically it should look at least like
> application/vnd.comicbook. For the TAR packed variant the extension
> CBT is used instead of TAR.
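
The MIME type scheme described above can be summarised as a small lookup table. This is a sketch only; the CBT entry is the value proposed in this message, not a type found elsewhere, and the names `COMIC_BOOK_MIME` and `mime_for_extension` are hypothetical.

```python
from typing import Optional

# MIME types per Comic Book Archive variant, as described in the message.
COMIC_BOOK_MIME = {
    "cbz": "application/vnd.comicbook+zip",  # ZIP variant
    "cbr": "application/vnd.comicbook-rar",  # RAR variant
    "cbt": "application/vnd.comicbook",      # TAR variant (proposal in this patch)
}


def mime_for_extension(extension: str) -> Optional[str]:
    """Look up the MIME type for an extension such as 'cbt' or '.CBT'."""
    return COMIC_BOOK_MIME.get(extension.lower().lstrip("."))


print(mime_for_extension("CBT"))
```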
>
> Unfortunately there exists no precise specification. It is
> described that every page of the comic is stored as an image, where
> only a few types are used (mainly JPEG or PNG, but TIFF, GIF and BMP
> can also occur). The file names are chosen so that they represent
> the sort order of the page numbers. This information is shown by the
> last line of the subroutine and should look like 19.jpg, 0001.png or
> 0002.png.
>
> So as an additional test I look for such image names by checking,
> with a regular expression, whether the 1st member name consists of
> digits followed by an image file name extension. If this is true, it
> is probably a Comic Book Archive, so I call the new subroutine. If it
> is false, it is probably a "normal" tar archive, so I call the old
> subroutine. This is done by additional lines which look like:
> >>>>>>>>0 regex \^[0-9]{2,4}[.](png|jpg|jpeg|tif|tiff|gif|bmp)
> >>>>>>>>>0 use tar-cbt
> >>>>>>>>0 default x
> >>>>>>>>>0 use tar-file
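
The branch quoted above can be approximated in Python as follows. This is a sketch under assumptions: the regular expression is transcribed as written (case-sensitive, anchored at the start), the member name is read from bytes 0 to 99 of the first tar header block, and the helpers `first_member_name`, `choose_subroutine` and `make_tar` are hypothetical names used only for illustration.

```python
import io
import re
import tarfile

# Python spelling of the magic regex; "\^" in the magic line is the start anchor.
PAGE_NAME = re.compile(r"^[0-9]{2,4}[.](png|jpg|jpeg|tif|tiff|gif|bmp)")


def first_member_name(header: bytes) -> str:
    """Return the NUL-terminated name field at offset 0 of a tar header."""
    return header[:100].split(b"\0", 1)[0].decode("latin-1")


def choose_subroutine(header: bytes) -> str:
    """Return 'tar-cbt' for page-like first names, otherwise 'tar-file'."""
    if PAGE_NAME.match(first_member_name(header)):
        return "tar-cbt"
    return "tar-file"


def make_tar(name: str) -> bytes:
    """Build a one-member GNU tar archive in memory (demo helper)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as tar:
        data = b"x"
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


for name in ("19.jpg", "0001.png", "test.jpg", "2712.txt", "0.9.gif"):
    print(name, "->", choose_subroutine(make_tar(name)))
```

The names in the loop are first member names taken from the samples in the message. Names without leading digits (test.jpg), with a non-image extension (2712.txt) or with only one leading digit (0.9.gif) fall through to tar-file.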
>
> I do not know if this is always true, because it is written that
> folders may be used to group images in a more logical layout within
> the archive, like book chapters. Some applications also support
> additional tag information in the form of embedded XML files in the
> archive, like ComicInfo.xml. So maybe more test lines or branches, or
> more sophisticated regular expressions, must be used for exotic
> samples.
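
As a rough illustration of what a more tolerant check for such exotic samples might look at, here is a Python sketch that walks all members instead of only the first one. It is not what the patch does and cannot be expressed this way in magic(5). The rules (skip folders, skip ComicInfo.xml, require numbered image names, compare the extension case-insensitively) are assumptions for illustration, and `looks_like_cbt` is a hypothetical name.

```python
import io
import posixpath
import re
import tarfile

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".tif", ".tiff", ".gif", ".bmp")
PAGE_STEM = re.compile(r"^[0-9]{2,4}$")  # same digit count as the magic regex


def looks_like_cbt(data: bytes) -> bool:
    """Heuristic: all regular files are numbered images, except ComicInfo.xml."""
    pages = 0
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
        for member in tar:
            if not member.isfile():
                continue  # folders may group pages, e.g. by chapter
            base = posixpath.basename(member.name)
            if base == "ComicInfo.xml":
                continue  # optional embedded tag information
            stem, extension = posixpath.splitext(base)
            if extension.lower() not in IMAGE_EXTENSIONS:
                return False
            if not PAGE_STEM.match(stem):
                return False
            pages += 1
    return pages > 0  # at least one page image is required
```

A check like this reads the whole archive, so it is much more expensive than the single header test used in the patch.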
>
> After applying the above mentioned modifications by patch
> file-5.42-archive-cbt.diff, most Comic Book CBT Archive samples
> are now identified correctly, and related TAR files are still
> described as before. This now looks like:
>
> Black_Cobra_003.cbt: Comic Book Archive,
> tar archive, 1st image
> 19.jpg
> M129-pax.tar: POSIX.1-2001 tar archive, global
> /tmp/GlobalHead.2512.2, mode 0000644,
> uid 0000000, gid 0000000,
> size 00000000141,
> seconds 13071714760
> TAR3214-j.TAR: tar archive (old), file
> tar3214.txt, mode 666 ,
> uid 0 , gid 0 ,
> size 3400 ,
> seconds 6450504352, comment:
> comment field created by -j option by DO
> archive.dir.tar: POSIX tar archive (GNU), directory
> gettext-0.10.35/, mode 0000755,
> uid 0000000, gid 0000000,
> size 00000000000,
> seconds 11401732537,
> user root, group root
> comics.cbt: POSIX tar archive (GNU), file
> test.jpg, mode 0000644,
> uid 0001750, gid 0001750,
> size 00000001121,
> seconds 10665023160,
> user jjmarin, group jjmarin
> dpmi-en.tar: POSIX tar archive, file
> 0.9.gif, mode 000644 ,
> uid 000124 , gid 000024 ,
> size 00000000147 ,
> seconds 05762024207,
> user dj, group user
> gtarfail.tar: POSIX tar archive, file
> vedpowered.gif, mode 0000644,
> uid 0000746, gid 0002044,
> size 00000001006 ,
> seconds 07303467402,
> user jes, group glone
> id-high2037-old.tar: tar archive (V7), file
> 6Mar2037.txt, mode 0000644,
> uid 7777777, gid 7777777,
> size 00000000374,
> seconds 17626765756
> test-png.cbt: Comic Book Archive,
> tar archive, 1st image
> 0001.png
> test4digit.tar: POSIX tar archive (GNU), file
> 2712.txt, mode 000644 ,
> size 00000204500,
> seconds 13220741303
> test_data.tar: POSIX tar archive (GNU), file
> 0000000000000000.empty.br, mode 0000600,
> uid 0423055, gid 0257523,
> size 00000000001,
> seconds 13266421766,
> user eustas, group primarygroup
> win10iso-gnu.tar: POSIX tar archive (GNU), file
> m/vm/14393.0.160715-1616.
> RS1_RELEASE_CLIENTENTERPRISE_S_EVAL,
> mode 0000644,
> uid 0002464, gid 0001143,
> size 0xd72db800,
> seconds 13031057704,
> user joerg, group Administratoren
>
> I hope my diff file can be applied in a future version of the file
> utility.
>
> Some other file formats with the CBT suffix still exist. I will
> try to handle these in a future session.
>
> With best wishes,
> Jörg Jenderek
> - --
> Jörg Jenderek
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYuHEYAAKCRCv8rHJQhrU
> 1kwSAKDfwwjm/RhQycZJXwBPbV9XGPWwGQCgx5Ld5nthG93biSG3g/PygDy8p8Q=
> =VhQO
> -----END PGP SIGNATURE-----
> <trid-v-comicbook-cbt.txt.gz><droid-comicbook-cbt.csv.gz><file-5_42-archive-comicbook-cbt_diff.DEFANGED-99600><file-5_42-archive-comicbook-cbt_diff_sig.DEFANGED-99601>--
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file
> <sanitizer.log>