[File] [PATCH] Magdir/archive for Comic Book Archive, tar archive *.CBT

Christos Zoulas christos at zoulas.com
Sat Jul 30 17:02:45 UTC 2022


Committed, thanks!

christos

> On Jul 27, 2022, at 7:04 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hello,
> 
> Some days ago I sent patches for DOS COM executables. One Syslinux
> COMboot variant uses the file name extension CBT instead of COM.
> 
> As a cross-check I looked for other files with the CBT extension on my
> systems. According to the DROID utility this extension is also used
> for Comic Book Archives. When I run the file command (version 5.42) on
> such examples and on related tar archives with the -e tar option, I
> get output like:
> 
> Black_Cobra_003.cbt: POSIX tar archive (GNU), file
> 		     19.jpg, mode 000644 ,
> 		     size 00003315356,
> 		     seconds 11540725637
> M129-pax.tar:        POSIX.1-2001 tar archive, global
> 		     /tmp/GlobalHead.2512.2, mode 0000644,
> 		     uid 0000000, gid 0000000,
> 		     size 00000000141,
> 		     seconds 13071714760
> TAR3214-j.TAR:       tar archive (old), file
> 		     tar3214.txt, mode    666 ,
> 		     uid      0 , gid      0 ,
> 		     size        3400 ,
> 		     seconds  6450504352, comment:
> 		     comment field created by -j option by DO
> archive.dir.tar:     POSIX tar archive (GNU), directory
> 		     gettext-0.10.35/, mode 0000755,
> 		     uid 0000000, gid 0000000,
> 		     size 00000000000,
> 		     seconds 11401732537,
> 		     user root, group root
> comics.cbt:          POSIX tar archive (GNU), file
> 		     test.jpg, mode 0000644,
> 		     uid 0001750, gid 0001750,
> 		     size 00000001121,
> 		     seconds 10665023160,
> 		     user jjmarin, group jjmarin
> dpmi-en.tar:         POSIX tar archive, file
> 		     0.9.gif, mode 000644 ,
> 		     uid 000124 , gid 000024 ,
> 		     size 00000000147 ,
> 		     seconds 05762024207,
> 		     user dj, group user
> gtarfail.tar:        POSIX tar archive, file
> 		     vedpowered.gif, mode 0000644,
> 		     uid 0000746, gid 0002044,
> 		     size 00000001006 ,
> 		     seconds 07303467402,
> 		     user jes, group glone
> id-high2037-old.tar: tar archive (V7), file
> 		     6Mar2037.txt, mode 0000644,
> 		     uid 7777777, gid 7777777,
> 		     size 00000000374,
> 		     seconds 17626765756
> test-png.cbt:        POSIX tar archive (GNU), file
> 		     0001.png, mode 000644 ,
> 		     size 00000002567,
> 		     seconds 13273174121
> test4digit.tar:      POSIX tar archive (GNU), file
> 		     2712.txt, mode 000644 ,
> 		     size 00000204500,
> 		     seconds 13220741303
> test_data.tar:       POSIX tar archive (GNU), file
> 		     0000000000000000.empty.br,
> 		     mode 0000600,
> 		     uid 0423055, gid 0257523,
> 		     size 00000000001,
> 		     seconds 13266421766,
> 		     user eustas, group primarygroup
> win10iso-gnu.tar:    POSIX tar archive (GNU), file
> 		     m/vm/14393.0.160715-1616.
> 		     RS1_RELEASE_CLIENTENTERPRISE_S_EVAL,
> 		     mode 0000644,
> 		     uid 0002464, gid 0001143,
> 		     size 0xd72db800,
> 		     seconds 13031057704,
> 		     user joerg, group Administratoren
> 
> When I run the file command without that option, I get output like:
> 
> Black_Cobra_003.cbt: POSIX tar archive (GNU)
> M129-pax.tar:        POSIX tar archive
> TAR3214-j.TAR:       tar archive
> archive.dir.tar:     POSIX tar archive (GNU)
> comics.cbt:          POSIX tar archive (GNU)
> dpmi-en.tar:         POSIX tar archive
> gtarfail.tar:        POSIX tar archive
> id-high2037-old.tar: tar archive
> test-png.cbt:        POSIX tar archive (GNU)
> test4digit.tar:      POSIX tar archive (GNU)
> test_data.tar:       POSIX tar archive (GNU)
> win10iso-gnu.tar:    data
> 
> With the option to show file name extensions I get a wrong phrase like
> tar/gtar or ???, and with the option to show the MIME type I get a
> wrong phrase like application/x-tar or application/x-gtar.
> 
> For comparison I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). It identifies
> most CBT examples as "Comic Book Archive" by PUID fmt/1462, based on
> the file name extension (see the appended
> droid-comicbook-cbt.csv.gz).
> 
> For comparison I also ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). Many examples are
> described with low priority as "Tape ARchive (file)" by the definition
> ark-tar-file.trid.xml. With higher priority many examples are also
> described as "TAR - Tape ARchive (GNU)" by ark-tar-gnu.trid.xml (see
> the appended trid-v-comicbook-cbt.txt.gz).
> 
> There are pages about the Comic Book Archive format on Wikipedia and
> on the Archive Team file formats website. These are now referenced by
> comment lines like:
> # URL:		https://en.wikipedia.org/wiki/Comic_book_archive
> #		http://fileformats.archiveteam.org/
> #		wiki/Comic_Book_Archive
> 
> Luckily, inside Magdir/archive the display part for tar archives is
> done by calling the subroutine tar-file.
> 
> So inside Magdir/archive I created lines for the Comic Book Archive
> TAR variant as a subroutine tar-cbt, which looks like:
> 0	name		tar-cbt
> >0	string		x	Comic Book Archive, tar archive
> !:mime	application/vnd.comicbook
> !:ext	cbt
> >0	string		>\0	\b, 1st image %-.60s
> 
> Instead of a generic MIME type like application/x-tar I display
> another one. For the other variants a type starting with
> application/vnd.comicbook is used: the CBZ variant appends +zip and
> the CBR variant appends -rar. For the TAR variant I found nothing,
> but logically it should look at least like
> application/vnd.comicbook. The TAR-packed variant uses the extension
> CBT instead of TAR.
> 
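The naming scheme described above can be summarised as a small lookup table. This is only an illustrative Python sketch: file(1) decides by content, not by extension, and the helper name `comic_mime` and its fallback value are assumptions made for the example.

```python
# Extension -> MIME type, as described in the paragraph above.
# The "cbt" entry is the value proposed in this patch, chosen by analogy.
COMIC_MIME = {
    "cbz": "application/vnd.comicbook+zip",
    "cbr": "application/vnd.comicbook-rar",
    "cbt": "application/vnd.comicbook",
}


def comic_mime(filename: str, default: str = "application/octet-stream") -> str:
    """Return the comic book MIME type for a file name (hypothetical helper)."""
    if "." not in filename:
        return default
    ext = filename.rsplit(".", 1)[-1].lower()
    return COMIC_MIME.get(ext, default)


print(comic_mime("test-png.cbt"))
print(comic_mime("M129-pax.tar"))
```
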
> Unfortunately there exists no precise specification. It is
> described that every page of the comic is stored as an image, where
> only a few types are used (mainly JPEG or PNG, but TIFF, GIF and BMP
> can also occur). The file names are chosen so that they represent the
> sort order of the page numbers. This information is shown by the last
> line of the subroutine and should look like 19.jpg, 0001.png or
> 0002.png.
> 
> So as an additional test I look for such image names by checking
> with a regular expression that the base name of the 1st image
> consists of digits and that its extension is an image extension. If
> this is true, it is probably a Comic Book Archive, so I call the new
> subroutine. If it is false, it is probably a "normal" tar archive and
> I call the old subroutine. This is done by additional lines which
> look like:
> >>>>>>>>0 regex \^[0-9]{2,4}[.](png|jpg|jpeg|tif|tiff|gif|bmp)
> >>>>>>>>>0	use	tar-cbt
> >>>>>>>>0	default		x
> >>>>>>>>>0	use	tar-file
> 
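As a rough illustration of the branch quoted above, the same decision can be sketched in Python. This is not how file(1) evaluates magic entries. It assumes the first member name sits NUL-padded in the first 100 bytes of the tar header, and the function names are hypothetical. The sample names are taken from the listings in this message.

```python
import re

# Same pattern as the regex line in the patch: 2 to 4 digits, a dot and an
# image suffix. As in the patch, it is anchored only at the start of the name.
CBT_NAME = re.compile(rb"^[0-9]{2,4}[.](png|jpg|jpeg|tif|tiff|gif|bmp)")


def first_member_name(header: bytes) -> bytes:
    """Return the NUL-terminated name from the 100-byte name field."""
    return header[:100].split(b"\0", 1)[0]


def describe_first_member(header: bytes) -> str:
    """Mimic the tar-cbt / tar-file choice for one tar header block."""
    name = first_member_name(header)
    if CBT_NAME.match(name):
        shown = name.decode("ascii", "replace")[:60]
        return "Comic Book Archive, tar archive, 1st image " + shown
    return "tar archive"


for sample in (b"19.jpg", b"0001.png", b"test.jpg", b"2712.txt", b"0.9.gif"):
    header = sample.ljust(512, b"\0")  # fake header: name field only
    print(sample.decode(), "->", describe_first_member(header))
```

Names such as test.jpg (no leading digits) or 0.9.gif (only one digit before the first dot) fall through to the ordinary tar branch.
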
> I do not know if this is always true, because it is written that
> folders may be used to group images in a more logical layout within
> the archive, like book chapters. Some applications also support
> additional tag information in the form of XML files embedded in the
> archive, like ComicInfo.xml. So maybe more test lines or branches, or
> more sophisticated regular expressions, must be used for exotic
> samples.
> 
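One way to picture a more tolerant check for such exotic samples is the Python sketch below. It is not part of the patch and goes beyond what a magic entry can do, because it walks the whole archive with the standard tarfile module. The rule it applies (skip folders and ComicInfo.xml, require every other member to be a numbered image) and the helper names are assumptions made for illustration.

```python
import io
import re
import tarfile

# Pattern reused from the patch, applied to the base name of each member.
IMAGE_NAME = re.compile(r"^[0-9]{2,4}[.](png|jpg|jpeg|tif|tiff|gif|bmp)")


def looks_like_cbt(fileobj) -> bool:
    """Heuristic: all regular files are numbered images, apart from ComicInfo.xml."""
    images = 0
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # folders used for grouping, e.g. book chapters
            base = member.name.rsplit("/", 1)[-1]
            if base == "ComicInfo.xml":
                continue  # optional embedded tag information
            if not IMAGE_NAME.match(base):
                return False
            images += 1
    return images > 0


def make_tar(names):
    """Build an in-memory tar with empty members, for demonstration only."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            tar.addfile(tarfile.TarInfo(name))
    buf.seek(0)
    return buf


print(looks_like_cbt(make_tar(["chapter1/0001.png", "ComicInfo.xml"])))
print(looks_like_cbt(make_tar(["2712.txt"])))
```
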
> After applying the above-mentioned modifications with the patch
> file-5.42-archive-cbt.diff, most Comic Book CBT Archive samples
> are identified correctly and related TAR files are still
> described as before. This now looks like:
> 
> Black_Cobra_003.cbt: Comic Book Archive,
> 		     tar archive, 1st image
> 		     19.jpg
> M129-pax.tar:        POSIX.1-2001 tar archive, global
> 		     /tmp/GlobalHead.2512.2, mode 0000644,
> 		     uid 0000000, gid 0000000,
> 		     size 00000000141,
> 		     seconds 13071714760
> TAR3214-j.TAR:       tar archive (old), file
> 		     tar3214.txt, mode    666 ,
> 		     uid      0 , gid      0 ,
> 		     size        3400 ,
> 		     seconds  6450504352, comment:
> 		     comment field created by -j option by DO
> archive.dir.tar:     POSIX tar archive (GNU), directory
> 		     gettext-0.10.35/, mode 0000755,
> 		     uid 0000000, gid 0000000,
> 		     size 00000000000,
> 		     seconds 11401732537,
> 		     user root, group root
> comics.cbt:          POSIX tar archive (GNU), file
> 		     test.jpg, mode 0000644,
> 		     uid 0001750, gid 0001750,
> 		     size 00000001121,
> 		     seconds 10665023160,
> 		     user jjmarin, group jjmarin
> dpmi-en.tar:         POSIX tar archive, file
> 		     0.9.gif, mode 000644 ,
> 		     uid 000124 , gid 000024 ,
> 		     size 00000000147 ,
> 		     seconds 05762024207,
> 		     user dj, group user
> gtarfail.tar:        POSIX tar archive, file
> 		     vedpowered.gif, mode 0000644,
> 		     uid 0000746, gid 0002044,
> 		     size 00000001006 ,
> 		     seconds 07303467402,
> 		     user jes, group glone
> id-high2037-old.tar: tar archive (V7), file
> 		     6Mar2037.txt, mode 0000644,
> 		     uid 7777777, gid 7777777,
> 		     size 00000000374,
> 		     seconds 17626765756
> test-png.cbt:        Comic Book Archive,
> 		     tar archive, 1st image
> 		     0001.png
> test4digit.tar:      POSIX tar archive (GNU), file
> 		     2712.txt, mode 000644 ,
> 		     size 00000204500,
> 		     seconds 13220741303
> test_data.tar:       POSIX tar archive (GNU), file
> 		     0000000000000000.empty.br, mode 0000600,
> 		     uid 0423055, gid 0257523,
> 		     size 00000000001,
> 		     seconds 13266421766,
> 		     user eustas, group primarygroup
> win10iso-gnu.tar:    POSIX tar archive (GNU), file
> 		     m/vm/14393.0.160715-1616.
> 		     RS1_RELEASE_CLIENTENTERPRISE_S_EVAL,
> 		     mode 0000644,
> 		     uid 0002464, gid 0001143,
> 		     size 0xd72db800,
> 		     seconds 13031057704,
> 		     user joerg, group Administratoren
> 
> I hope my diff file can be applied in a future version of the file
> utility.
> 
> There still exist some other file formats with the CBT suffix. I will
> try to handle these in a future session.
> 
> With best wishes,
> Jörg Jenderek
> - --
> Jörg Jenderek
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
> 
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYuHEYAAKCRCv8rHJQhrU
> 1kwSAKDfwwjm/RhQycZJXwBPbV9XGPWwGQCgx5Ld5nthG93biSG3g/PygDy8p8Q=
> =VhQO
> -----END PGP SIGNATURE-----
> <trid-v-comicbook-cbt.txt.gz>
> <droid-comicbook-cbt.csv.gz>
> <file-5_42-archive-comicbook-cbt_diff.DEFANGED-99600>
> <file-5_42-archive-comicbook-cbt_diff_sig.DEFANGED-99601>
> --
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file
> <sanitizer.log>
