[File] [PATCH] Magdir/archive for Comic Book Archive, tar archive *.CBT
Jörg Jenderek
joerg.jen.der.ek at gmx.net
Wed Jul 27 23:04:08 UTC 2022
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello,
A few days ago I sent patches for DOS COM executables. One Syslinux
COMboot variant uses the file name extension CBT instead of COM.
As a cross-check, I looked for other files with the CBT extension on my
systems. According to the DROID utility, this extension is also used
for the Comic Book Archive. When running the file command (version
5.42) with the -e tar option on such examples and on related tar
archives, I get output like:
Black_Cobra_003.cbt: POSIX tar archive (GNU), file
  19.jpg, mode 000644 ,
  size 00003315356,
  seconds 11540725637
M129-pax.tar: POSIX.1-2001 tar archive, global
  /tmp/GlobalHead.2512.2, mode 0000644,
  uid 0000000, gid 0000000,
  size 00000000141,
  seconds 13071714760
TAR3214-j.TAR: tar archive (old), file
  tar3214.txt, mode 666 ,
  uid 0 , gid 0 ,
  size 3400 ,
  seconds 6450504352, comment:
  comment field created by -j option by DO
archive.dir.tar: POSIX tar archive (GNU), directory
  gettext-0.10.35/, mode 0000755,
  uid 0000000, gid 0000000,
  size 00000000000,
  seconds 11401732537,
  user root, group root
comics.cbt: POSIX tar archive (GNU), file
  test.jpg, mode 0000644,
  uid 0001750, gid 0001750,
  size 00000001121,
  seconds 10665023160,
  user jjmarin, group jjmarin
dpmi-en.tar: POSIX tar archive, file
  0.9.gif, mode 000644 ,
  uid 000124 , gid 000024 ,
  size 00000000147 ,
  seconds 05762024207,
  user dj, group user
gtarfail.tar: POSIX tar archive, file
  vedpowered.gif, mode 0000644,
  uid 0000746, gid 0002044,
  size 00000001006 ,
  seconds 07303467402,
  user jes, group glone
id-high2037-old.tar: tar archive (V7), file
  6Mar2037.txt, mode 0000644,
  uid 7777777, gid 7777777,
  size 00000000374,
  seconds 17626765756
test-png.cbt: POSIX tar archive (GNU), file
  0001.png, mode 000644 ,
  size 00000002567,
  seconds 13273174121
test4digit.tar: POSIX tar archive (GNU), file
  2712.txt, mode 000644 ,
  size 00000204500,
  seconds 13220741303
test_data.tar: POSIX tar archive (GNU), file
  0000000000000000.empty.br,
  mode 0000600,
  uid 0423055, gid 0257523,
  size 00000000001,
  seconds 13266421766,
  user eustas, group primarygroup
win10iso-gnu.tar: POSIX tar archive (GNU), file
  m/vm/14393.0.160715-1616.
  RS1_RELEASE_CLIENTENTERPRISE_S_EVAL,
  mode 0000644,
  uid 0002464, gid 0001143,
  size 0xd72db800,
  seconds 13031057704,
  user joerg, group Administratoren
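The numeric fields in these listings (mode, uid, gid, size, seconds)
are the octal ASCII text stored in the tar header. A minimal Python
sketch of decoding such a 512-byte header, assuming the classic
V7/ustar field offsets (name at 0, mode at 100, uid at 108, gid at 116,
size at 124, mtime at 136, checksum at 148); the helper names are
hypothetical. It also mirrors the two masked byte tests on the checksum
field (offsets 148 and 155) from the patch context. Only the plain
octal form is handled, not the binary encoding GNU tar can use for
large values.

```python
from datetime import datetime, timezone

BLOCK = 512  # one tar header occupies a single 512-byte block


def octal_field(header: bytes, offset: int, length: int) -> int:
    """Decode an octal ASCII field, ignoring trailing NUL/space padding."""
    raw = header[offset:offset + length].split(b"\0", 1)[0].strip()
    return int(raw, 8) if raw else 0


def parse_header(header: bytes) -> dict:
    """Decode the fields shown by 'file -e tar' (plain octal form only)."""
    if len(header) < BLOCK:
        raise ValueError("a tar header is 512 bytes long")
    return {
        "name": header[0:100].split(b"\0", 1)[0].decode("latin-1"),
        "mode": octal_field(header, 100, 8),
        "uid": octal_field(header, 108, 8),
        "gid": octal_field(header, 116, 8),
        "size": octal_field(header, 124, 12),
        "mtime": octal_field(header, 136, 12),
    }


def checksum_bytes_plausible(header: bytes) -> bool:
    """Mirror the masked byte tests in the patch context:
    148 ubyte&0xEF =0x20 -> space or ASCII '0' at start of the checksum
    155 ubyte&0xDF =0    -> NUL or space at the end of the checksum"""
    return (header[148] & 0xEF) == 0x20 and (header[155] & 0xDF) == 0


# Header rebuilt from the Black_Cobra_003.cbt entry of the listing;
# the checksum bytes are placeholders, not a real checksum.
hdr = bytearray(BLOCK)
hdr[0:6] = b"19.jpg"
hdr[100:108] = b"000644 \0"
hdr[124:136] = b"00003315356\0"
hdr[136:148] = b"11540725637\0"
hdr[148:156] = b"000000\0 "

info = parse_header(bytes(hdr))
print(info["name"], oct(info["mode"]), info["size"],
      datetime.fromtimestamp(info["mtime"], tz=timezone.utc))
print("checksum bytes plausible:", checksum_bytes_plausible(bytes(hdr)))
```

So "seconds 11540725637" is an octal count of seconds since the Unix
epoch, not a decimal number.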
When running the file command without that option, I get output like:
Black_Cobra_003.cbt: POSIX tar archive (GNU)
M129-pax.tar: POSIX tar archive
TAR3214-j.TAR: tar archive
archive.dir.tar: POSIX tar archive (GNU)
comics.cbt: POSIX tar archive (GNU)
dpmi-en.tar: POSIX tar archive
gtarfail.tar: POSIX tar archive
id-high2037-old.tar: tar archive
test-png.cbt: POSIX tar archive (GNU)
test4digit.tar: POSIX tar archive (GNU)
test_data.tar: POSIX tar archive (GNU)
win10iso-gnu.tar: data
For the CBT samples, the option to show the file name extension
(--extension) gives wrong values like tar/gtar or ???, and the option
to show the MIME type (--mime-type) gives wrong values like
application/x-tar or application/x-gtar.
For comparison, I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/). It identifies
most CBT examples as "Comic Book Archive" (PUID fmt/1462) based on the
file name extension (see the attached droid-comicbook-cbt.csv.gz).
Likewise, I ran the file format identification utility TrID (see
https://mark0.net/soft-trid-e.html). Many examples are described with
low priority as "Tape ARchive (file)" by the definition
ark-tar-file.trid.xml. With higher priority, many examples are also
described as "TAR - Tape ARchive (GNU)" by ark-tar-gnu.trid.xml (see
the attached trid-v-comicbook-cbt.txt.gz).
There are pages about the Comic Book Archive format on Wikipedia and on
the Archive Team file formats website. They are now referenced by
comment lines like:
# URL: https://en.wikipedia.org/wiki/Comic_book_archive
# http://fileformats.archiveteam.org/wiki/Comic_Book_Archive
Conveniently, inside Magdir/archive the display part for tar archives
is done by calling the subroutine tar-file. So I added lines to
Magdir/archive for the TAR variant of the Comic Book Archive, in a new
subroutine tar-cbt that looks like:
0 name tar-cbt
>0 string x Comic Book Archive, tar archive
!:mime application/vnd.comicbook
!:ext cbt
>0 string >\0 \b, 1st image %-.60s
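To see what the last line of tar-cbt prints, its logic can be
mimicked in Python: ">0 string >\0" only fires when the first byte of
the name field is not NUL, "\b" suppresses the space before the comma,
and "%-.60s" prints at most 60 characters of the NUL-terminated name.
This is a sketch only; describe_cbt is a hypothetical name, and the
MIME type and extension constants are the values proposed above.

```python
CBT_MIME = "application/vnd.comicbook"  # value proposed in the patch
CBT_EXT = "cbt"


def describe_cbt(header: bytes) -> str:
    """Approximate the description text printed by the tar-cbt subroutine."""
    text = "Comic Book Archive, tar archive"
    name = header[0:100].split(b"\0", 1)[0].decode("latin-1")
    # '>0 string >\0' matches only when the name field is not empty
    if name:
        # '\b' drops the separating space; '%-.60s' truncates to 60 chars
        text += ", 1st image %-.60s" % name
    return text


hdr = bytearray(512)
hdr[0:8] = b"0001.png"
print(describe_cbt(bytes(hdr)), "|", CBT_MIME, "|", CBT_EXT)
```

Python's printf-style "%-.60s" behaves like the C format used by the
magic line here: the precision truncates, and "-" has no effect
without a width.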
Instead of a generic MIME type like application/x-tar, I display a
more specific one. The other variants use types starting with
application/vnd.comicbook: the CBZ variant appends +zip
(application/vnd.comicbook+zip) and the CBR variant appends -rar
(application/vnd.comicbook-rar). For the TAR variant I found nothing,
but logically it should look at least like application/vnd.comicbook.
The TAR packed variant also uses the extension CBT instead of TAR.
Unfortunately, no precise specification exists. It is described that
every page of the comic is stored as an image, and only a few image
types are used (mainly JPEG or PNG, but TIFF, GIF and BMP can also
occur). The file names are chosen so that they represent the sort
order of the page numbers. The last line of the subroutine shows the
name of the first member, which should look like 19.jpg, 0001.png or
0002.png.
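Why zero-padded names like 0001.png carry the page order can be shown
with a tiny Python sketch. It assumes a reader simply sorts member
names as strings (an assumption; real viewers may sort differently):
padded names keep page order, unpadded ones do not.

```python
padded = ["0010.png", "0002.png", "0001.png"]
unpadded = ["10.jpg", "2.jpg", "1.jpg"]

# Plain lexicographic (string) sorting of member names
print(sorted(padded))    # ['0001.png', '0002.png', '0010.png']
print(sorted(unpadded))  # ['1.jpg', '10.jpg', '2.jpg']
```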
So, as an additional test, I use a regular expression to check
whether the main name of the first image consists of digits and
whether its extension is an image file name extension. If this is
true, it is probably a Comic Book Archive, so the new subroutine is
called here. If it is false, it is probably a "normal" tar archive and
the old subroutine is called. This is done by additional lines that
look like:
>>>>>>>>0 regex \^[0-9]{2,4}[.](png|jpg|jpeg|tif|tiff|gif|bmp)
>>>>>>>>>0 use tar-cbt
>>>>>>>>0 default x
>>>>>>>>>0 use tar-file
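The branch above can be approximated in Python. This sketch assumes
the regular expression is applied only to the NUL-terminated name of
the first member (how libmagic bounds the regex search is not modelled
here), and choose_subroutine is a hypothetical name. The pattern is
used as written in the magic line, without a case-insensitive flag;
the sample names are taken from the listings above.

```python
import re

# Same pattern as the magic regex line ('\^' in the magic file is '^' here)
CBT_FIRST_NAME = re.compile(r"^[0-9]{2,4}[.](png|jpg|jpeg|tif|tiff|gif|bmp)")


def choose_subroutine(name: str) -> str:
    """Pick tar-cbt when the regex matches, otherwise fall back to
    tar-file (the role played by 'default x' in the magic)."""
    return "tar-cbt" if CBT_FIRST_NAME.search(name) else "tar-file"


for name in ["19.jpg", "0001.png", "test.jpg", "2712.txt", "0.9.gif",
             "0000000000000000.empty.br"]:
    print(f"{name}: {choose_subroutine(name)}")
```

This reproduces the split seen after applying the patch: 19.jpg and
0001.png select tar-cbt, while test.jpg (no leading digits), 2712.txt
(no image suffix) and 0.9.gif (only one digit) stay with tar-file.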
I do not know whether this always holds, because, according to the
descriptions, folders may be used to group images in a more logical
layout within the archive, such as book chapters. Also, some
applications support additional tag information in the form of
embedded XML files in the archive, such as ComicInfo.xml. So more test
lines, branches or more sophisticated regular expressions may be
needed for exotic samples.
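As an illustration of what a more tolerant check might look like, here
is a hypothetical Python regex that accepts one optional leading folder
(for chapter folders), ignores case, anchors the end of the name, and
treats a first member named ComicInfo.xml as a hint. None of this is
part of the patch; the pattern and maybe_comic_member are assumptions
that would need testing against real samples before being turned into
magic lines.

```python
import re

# Hypothetical, NOT from the patch: optional "chapter/" prefix,
# 2-4 digit page number, common image suffixes, case-insensitive.
PAGE = re.compile(
    r"^(?:[^/]+/)?[0-9]{2,4}[.](?:png|jpe?g|tiff?|gif|bmp)$", re.IGNORECASE
)


def maybe_comic_member(name: str) -> bool:
    """True if a first member name looks like a page image or ComicInfo.xml."""
    base = name.rsplit("/", 1)[-1]
    return base.lower() == "comicinfo.xml" or bool(PAGE.match(name))


for name in ["0001.png", "Chapter01/0001.JPG", "ComicInfo.xml", "test.jpg"]:
    print(name, maybe_comic_member(name))
```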
After applying the modifications above with the patch file
file-5.42-archive-comicbook-cbt.diff, most Comic Book Archive CBT
samples are identified correctly (comics.cbt is not, because its first
member test.jpg does not start with digits), and the related TAR files
are still described as before. This now looks like:
Black_Cobra_003.cbt: Comic Book Archive,
  tar archive, 1st image
  19.jpg
M129-pax.tar: POSIX.1-2001 tar archive, global
  /tmp/GlobalHead.2512.2, mode 0000644,
  uid 0000000, gid 0000000,
  size 00000000141,
  seconds 13071714760
TAR3214-j.TAR: tar archive (old), file
  tar3214.txt, mode 666 ,
  uid 0 , gid 0 ,
  size 3400 ,
  seconds 6450504352, comment:
  comment field created by -j option by DO
archive.dir.tar: POSIX tar archive (GNU), directory
  gettext-0.10.35/, mode 0000755,
  uid 0000000, gid 0000000,
  size 00000000000,
  seconds 11401732537,
  user root, group root
comics.cbt: POSIX tar archive (GNU), file
  test.jpg, mode 0000644,
  uid 0001750, gid 0001750,
  size 00000001121,
  seconds 10665023160,
  user jjmarin, group jjmarin
dpmi-en.tar: POSIX tar archive, file
  0.9.gif, mode 000644 ,
  uid 000124 , gid 000024 ,
  size 00000000147 ,
  seconds 05762024207,
  user dj, group user
gtarfail.tar: POSIX tar archive, file
  vedpowered.gif, mode 0000644,
  uid 0000746, gid 0002044,
  size 00000001006 ,
  seconds 07303467402,
  user jes, group glone
id-high2037-old.tar: tar archive (V7), file
  6Mar2037.txt, mode 0000644,
  uid 7777777, gid 7777777,
  size 00000000374,
  seconds 17626765756
test-png.cbt: Comic Book Archive,
  tar archive, 1st image
  0001.png
test4digit.tar: POSIX tar archive (GNU), file
  2712.txt, mode 000644 ,
  size 00000204500,
  seconds 13220741303
test_data.tar: POSIX tar archive (GNU), file
  0000000000000000.empty.br, mode 0000600,
  uid 0423055, gid 0257523,
  size 00000000001,
  seconds 13266421766,
  user eustas, group primarygroup
win10iso-gnu.tar: POSIX tar archive (GNU), file
  m/vm/14393.0.160715-1616.
  RS1_RELEASE_CLIENTENTERPRISE_S_EVAL,
  mode 0000644,
  uid 0002464, gid 0001143,
  size 0xd72db800,
  seconds 13031057704,
  user joerg, group Administratoren
I hope my diff can be applied in a future version of the file
utility.
There are still some other file formats with the CBT suffix. I will
try to handle them in a future session.
With best wishes,
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYuHEYAAKCRCv8rHJQhrU
1kwSAKDfwwjm/RhQycZJXwBPbV9XGPWwGQCgx5Ld5nthG93biSG3g/PygDy8p8Q=
=VhQO
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-comicbook-cbt.txt.gz
Type: application/x-gzip
Size: 1052 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220728/b0d11f14/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-comicbook-cbt.csv.gz
Type: application/x-gzip
Size: 709 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220728/b0d11f14/attachment-0001.bin>
-------------- next part --------------
--- file-5.42/magic/Magdir/archive.old 2022-05-28 22:13:23.000000000 +0200
+++ file-5.42/magic/Magdir/archive 2022-07-27 22:42:40.851865900 +0200
@@ -25,7 +25,16 @@
>>>>>>155 ubyte&0xDF =0
# space or ascii digit 0 at start of check sum
>>>>>>>148 ubyte&0xEF =0x20
->>>>>>>>0 use tar-file
+# FOR DEBUGGING:
+#>>>>>>>>0 regex \^[0-9]{2,4}[.](png|jpg|jpeg|tif|tiff|gif|bmp) NAME "%s"
+# check for 1st image main name with digits used for sorting
+# and for name extension case insensitive like: PNG JPG JPEG TIF TIFF GIF BMP
+>>>>>>>>0 regex \^[0-9]{2,4}[.](png|jpg|jpeg|tif|tiff|gif|bmp)
+#foo
+>>>>>>>>>0 use tar-cbt
+# if 1st member name without digits and without used image suffix then it is a TAR archive
+>>>>>>>>0 default x
+>>>>>>>>>0 use tar-file
# minimal check and then display tar archive information which can also be
# embedded inside others like Android Backup, Clam AntiVirus database
0 name tar-file
@@ -146,6 +155,19 @@
>>508 default x
# padding[255] in old tar sometimes comment field
>>>257 string >\0 \b, comment: %-.40s
+# Summary: Comic Book Archive *.CBT with TAR format
+# URL: https://en.wikipedia.org/wiki/Comic_book_archive
+# http://fileformats.archiveteam.org/wiki/Comic_Book_Archive
+# Note: there exist also RAR, ZIP, ACE and 7Z packed variants
+0 name tar-cbt
+>0 string x Comic Book archive, tar archive
+#!:mime application/x-tar
+!:mime application/vnd.comicbook
+#!:mime application/vnd.comicbook+tar
+!:ext cbt
+# name[100] probably like: 19.jpg 0001.png 0002.png
+# or maybe like ComicInfo.xml
+>0 string >\0 \b, 1st image %-.60s
# Incremental snapshot gnu-tar format from:
# https://www.gnu.org/software/tar/manual/html_node/Snapshot-Files.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.42-archive-comicbook-cbt.diff.sig
Type: application/octet-stream
Size: 1117 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220728/b0d11f14/attachment.obj>