[File] [PATCH] Magdir/jpeg,images,animation for "unusual" JPEG; extensions; duplicates

Christos Zoulas christos at zoulas.com
Fri Jun 17 18:06:14 UTC 2022


Committed, thanks!

christos

> On Jun 3, 2022, at 8:55 AM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> 
> A few days ago I looked at the media on the card of my digital camera,
> a Canon Digital Ixus 300. The recorded media are stored inside the
> DCIM directory. Pictures are stored as JPEG images with names like
> IMG_0401.JPG, IMG_0402.JPG, etc. Movies are stored as AVI videos with
> names like MVI_0441.AVI, MVI_0442.AVI. Now comes the strange part: for
> every video there exists a file with the same base name but with the
> 3-byte "THM" name extension. The same kind of file is found as
> MOV00020.thm inside the FFmpeg sources. Unfortunately I was not able
> to generate working magic lines for it, because after an indirect
> call I get the wanted extension concatenated with "tif,tiff". So for
> the moment I concentrate here on other unusual JPEG images.
> 
> When I run the file command version 5.41 with the -k option on such
> images, I get output that looks quite "good":
> 
> 2021-08_totocaca.jxl:         JPEG XL codestream
> Bretagne1_1.j2k:              JPEG 2000 codestream
> Cevennes2.jp2:                JPEG 2000 Part 1 (JP2)
> 			      JPEG 2000 image
> FDOSFISH.HSI:                 JPEG image data, HSI proprietary
> FLOWER.wdp:                   JPEG-XR Image, hard tiling,
> 			      spatial xform=TL,
> 			      short header, 2592x3904, bitdepth=5-6-5
> 			      , colorfmt=YONLY
> 			      JPEG-XR
> IMG_20200308_194050.jxl:      JPEG XL container
> MOV00020.thm:                 JPEG image data,
> 			      Exif standard: [TIFF image data,
> 			      little-endian,
> 			      direntries=9, manufacturer=SONY,
> 			      model=DSC-T100
> Speedway.mj2:                 JPEG 2000 Part 3 (MJ2)
> 			      JPEG 2000 image
> abydos.jxr:                   JPEG-XR Image,
> 			      spatial xform=TL,
> 			      short header, 800x600,
> 			      bitdepth=16-SIGNED
> 			      , colorfmt=YONLY
> 			      JPEG-XR
> balloon.j2c:                  JPEG 2000 codestream
> balloon.jp2:                  JPEG 2000 Part 1 (JP2)
> 			      JPEG 2000 image
> balloon.jpf:                  JPEG 2000 Part 2 (JPX)
> 			      JPEG 2000 image
> balloon.jpm:                  JPEG 2000 Part 6 (JPM)
> 			      JPEG 2000 image
> fmt-590-signature-id-931.wdp: data
> 
> With the -i option things look a little worse:
> 2021-08_totocaca.jxl:         image/jxl
> Bretagne1_1.j2k:              application/octet-stream
> Cevennes2.jp2:                image/jp2
> FDOSFISH.HSI:                 application/octet-stream
> FLOWER.wdp:                   application/octet-stream
> IMG_20200308_194050.jxl:      image/jxl
> MOV00020.thm:                 image/jpeg
> Speedway.mj2:                 video/mj2
> abydos.jxr:                   application/octet-stream
> balloon.j2c:                  application/octet-stream
> balloon.jp2:                  image/jp2
> balloon.jpf:                  image/jpx
> balloon.jpm:                  image/jpm
> fmt-590-signature-id-931.wdp: application/octet-stream
> 
> At first glance this does not seem so bad, but with the --extension
> option I get the following output, which looks even worse:
> 
> 2021-08_totocaca.jxl:         jxl
> Bretagne1_1.j2k:              ???
> Cevennes2.jp2:                ???
> FDOSFISH.HSI:                 ???
> FLOWER.wdp:                   ???
> IMG_20200308_194050.jxl:      jxl
> MOV00020.thm:                 jpeg/jpg/jpe/jfif
> Speedway.mj2:                 ???
> abydos.jxr:                   ???
> balloon.j2c:                  ???
> balloon.jp2:                  ???
> balloon.jpf:                  ???
> balloon.jpm:                  ???
> fmt-590-signature-id-931.wdp: ???
> 
> 
> For comparison I ran the file format identification utility TrID
> (see https://mark0.net/soft-trid-e.html). It identifies the J2C and
> J2K samples as "JPEG-2000 Code Stream bitmap" by bitmap-jpc.trid.xml.
> The JXR and WDP samples are described as "JPEG XR bitmap" by
> bitmap-wmp.trid.xml. The JXL samples are described as "JPEG XL bitmap"
> by bitmap-jxl.trid.xml or bitmap-jxl-iso.trid.xml. The JP2 samples are
> described as "JPEG 2000 bitmap" by bitmap-jpeg2k.trid.xml (see the
> attached trid-v-jpeg.txt.gz).
> 
> For comparison I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). It identifies
> the samples the same way as TrID, with some exceptions: the HSI
> variant is not recognized, and neither are the JPEG 2000 codestream
> variants (jpc/j2c/j2k) (see the attached droid-jpeg.csv.gz).
> 
> Luckily TrID with the -v option shows a related URL and the file name
> extensions in use. For JPEG 2000 this information is now expressed by
> comment lines inside Magdir/jpeg like:
> # URL:		http://fileformats.archiveteam.org/
> #		wiki/JPEG_2000_codestream
> # Reference:	http://mark0.net/download/triddefs_xml.7z
> #		defs/b/bitmap-jpc.trid.xml
> # Note:         called by TrID "JPEG-2000 Code Stream bitmap"
> 
> The detection of JPEG 2000 happens inside Magdir/jpeg by lines like:
> 0	belong		0xff4fff51	JPEG 2000 codestream
> 45	beshort		0xff52
> This now becomes:
> 0	belong		0xff4fff51	JPEG 2000 codestream
> # value like: 0701h FF50h
> #>45	ubeshort	x	\b, at 45 %#4.4x
> !:mime	image/x-jp2-codestream
> !:ext	jpc/j2c/j2k
> #45	beshort		0xff52
> There was an entry for offset 45 testing for the value 0xff52. In the
> samples I inspected I get the value 0701h or FF50h there, so this
> entry does not make sense to me. I therefore changed the second line
> into a comment line.
> 
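
The codestream test described above can be tried outside of file(1). The
following Python fragment is only a sketch under stated assumptions: the
function names is_jpeg2000_codestream and short_at_45 are hypothetical,
and the code mirrors just the "0 belong 0xff4fff51" line and the
commented-out probe at offset 45, not the whole of Magdir/jpeg.

```python
import struct

# SOC marker (0xFF4F) followed directly by the SIZ marker (0xFF51),
# i.e. the value tested by the magic line "0 belong 0xff4fff51".
J2K_CODESTREAM_MAGIC = 0xFF4FFF51


def is_jpeg2000_codestream(data: bytes) -> bool:
    """Return True if data starts with the big-endian 32-bit magic."""
    if len(data) < 4:
        return False
    (value,) = struct.unpack_from(">I", data, 0)
    return value == J2K_CODESTREAM_MAGIC


def short_at_45(data: bytes):
    """Return the big-endian 16-bit value at offset 45, or None if the
    data is too short. This corresponds to the debugging line
    '#>45 ubeshort x' and is not used for the detection itself."""
    if len(data) < 47:
        return None
    return struct.unpack_from(">H", data, 45)[0]
```

Printing short_at_45() for a handful of real .j2k/.j2c samples is one
way to check which values actually occur at that offset before relying
on a fixed value such as 0xff52.
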
> For JPEG XR this information is expressed by comment lines inside
> Magdir/jpeg like:
> # URL:		http://fileformats.archiveteam.org/wiki/JPEG_XR
> # Reference:	https://www.itu.int/rec/T-REC-T.832
> #		http://mark0.net/download/triddefs_xml.7z
> #		defs/b/bitmap-wmp.trid.xml
> The detection of JPEG XR happens inside Magdir/jpeg by lines like:
> 0	string		\x49\x49\xbc
> >3	byte		1
> >>4	lelong%2	0	JPEG-XR
> !:mime	image/jxr
> !:ext	jxr
> The second line tests FILE_VERSION_ID, which shall be equal to 1;
> other values are reserved for future use. The third line tests
> FIRST_IFD_OFFSET, which shall be an integer multiple of 2. This test
> skips the DROID sample fmt-590-signature-id-931.wdp, which is
> misidentified by TrID and DROID as a valid image. In older documents
> the mime type image/vnd.ms-photo was used. Besides the suffix JXR, the
> older WDP is also used. HDP is mentioned as a third extension, but I
> found no examples myself. So these are expressed by a line like:
> !:ext	jxr/wdp/hdp
> I also found some uncertain hints for two further extensions, WMP and
> JHC.
> 
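
The three JPEG XR tests above translate into a short check. This is a
minimal sketch, not the code used by file(1): the function name
looks_like_jpeg_xr is hypothetical, and only the three quoted magic
lines are modelled.

```python
import struct


def looks_like_jpeg_xr(data: bytes) -> bool:
    """Apply the three tests quoted from Magdir/jpeg to a byte buffer."""
    if len(data) < 8:
        return False
    # Line 1: "0 string \x49\x49\xbc" (little-endian "II" plus 0xBC).
    if data[0:3] != b"\x49\x49\xbc":
        return False
    # Line 2: ">3 byte 1", FILE_VERSION_ID shall be equal to 1.
    if data[3] != 1:
        return False
    # Line 3: ">>4 lelong%2 0", FIRST_IFD_OFFSET shall be a multiple of 2.
    (first_ifd_offset,) = struct.unpack_from("<I", data, 4)
    return first_ifd_offset % 2 == 0
```

A header with an odd FIRST_IFD_OFFSET is rejected by the last test,
which is the reason given above for fmt-590-signature-id-931.wdp no
longer being reported as an image.
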
> Magdir/images also contains an entry for "JPEG-XR Image". So I moved
> that part and merged it. This now gives additional lines starting
> like:
> >90	bequad		0x574D50484F544F00
> >>98	byte&0x08	=0x08			\b, hard tiling
> ...
> >>>101	beshort&0xf0	0x80			\bRGBE
> >>>101	beshort&0xf0	>0x80			\b(reserved %#x)
> The mentioned URL to the example FLOWER.wdp does not exist any more,
> so I replaced it with an archived version. I did not validate all of
> the displayed information, but I partly verified it with XnView
> command-line tools like:
> 	nconvert -info abydos.jxr FLOWER.wdp
> 	nconvert -info abydos.jxr FLOWER.wdp
> 
> For JPEG XL this information is expressed by comment lines inside
> Magdir/jpeg like:
> # URL:		http://fileformats.archiveteam.org/wiki/JPEG_XL
> # Reference:	http://mark0.net/download/triddefs_xml.7z
> #		defs/b/bitmap-jxl.trid.xml
> #		defs/b/bitmap-jxl-iso.trid.xml
> The detection of these images was done inside Magdir/jpeg by lines
> like:
> 0	string	\xff\x0a	JPEG XL codestream
> !:mime  image/jxl
> !:ext jxl
> But I found no officially registered mime type image/jxl, so I
> replaced it with a user-defined one. The lines now become:
> 0	string	\xff\x0a	JPEG XL codestream
> !:mime	image/x-jxl
> !:ext jxl
> 
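
The patched JPEG XL codestream entry amounts to a two-byte prefix test
plus the attached mime type and extension. A minimal sketch, assuming
only the lines quoted above; the function name identify_jxl_codestream
is hypothetical, and the JPEG XL container variant is not covered:

```python
# Prefix tested by "0 string \xff\x0a".
JXL_CODESTREAM_MAGIC = b"\xff\x0a"


def identify_jxl_codestream(data: bytes):
    """Return (description, mime, ext) as in the patched entry, or None
    when the two-byte prefix does not match."""
    if data.startswith(JXL_CODESTREAM_MAGIC):
        return ("JPEG XL codestream", "image/x-jxl", "jxl")
    return None
```

Because the signature is only two bytes long, a match on arbitrary data
is weak evidence on its own.
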
> The detection of JPEG 2000 images starts inside Magdir/jpeg by lines
> like:
> 0 string \x00\x00\x00\x0C\x6A\x50\x20\x20\x0D\x0A\x87\x0A JPEG 2000
> Looking at Magdir/animation, we see that in principle the same test
> is done there as well. There the magic part \x6A\x50 at offset 4 is
> treated as a string. This looks like:
> 4	string/W	jP		JPEG 2000 image
> !:mime	image/jp2
> So I deleted the part done via Magdir/animation.
> In Magdir/jpeg the sub-classification is done afterwards. This looks
> like:
> >20	string		\x6a\x70\x32\x20	Part 1 (JP2)
> !:mime	image/jp2
> After the mime type, the file name extension is now shown as well.
> This now looks like:
> >20	string		\x6a\x70\x32\x20	Part 1 (JP2)
> !:mime	image/jp2
> !:ext	jp2
> 
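
The two-step JPEG 2000 detection (signature box at offset 0, then the
brand at offset 20) can be sketched as a table lookup. This is an
illustration only. The name classify_jpeg2000 is hypothetical. Only the
"jp2 " brand appears in the lines quoted above; the brand strings for
JPX, JPM and MJ2 are assumptions made for this sketch, while their mime
types and extensions are taken from the -i and --extension output in
this mail.

```python
# 12-byte JPEG 2000 signature box tested at offset 0 in Magdir/jpeg.
JP2_SIGNATURE_BOX = b"\x00\x00\x00\x0c\x6a\x50\x20\x20\x0d\x0a\x87\x0a"

# Brand at offset 20 -> (description, mime type, extensions).
# Only b"jp2 " is quoted in the mail; the other keys are assumed.
BRANDS = {
    b"jp2 ": ("Part 1 (JP2)", "image/jp2", "jp2"),
    b"jpx ": ("Part 2 (JPX)", "image/jpx", "jpf/jpx"),
    b"jpm ": ("Part 6 (JPM)", "image/jpm", "jpm"),
    b"mjp2": ("Part 3 (MJ2)", "video/mj2", "mj2/mjp2"),
}


def classify_jpeg2000(data: bytes):
    """Return the BRANDS entry for a JPEG 2000 file, or None if the
    signature box is missing or the brand is not in the table."""
    if not data.startswith(JP2_SIGNATURE_BOX):
        return None
    return BRANDS.get(data[20:24])
```

Keeping the signature test in one place and the brand table separate
mirrors the change described above, where the duplicate top-level test
in Magdir/animation is dropped and only the sub-classification carries
the !:mime and !:ext lines.
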
> For JP2 images this information is expressed by comment lines placed
> before that entry inside Magdir/jpeg, like:
> # URL:		http://fileformats.archiveteam.org/wiki/JP2
> # Reference:	http://mark0.net/download/triddefs_xml.7z
> #		defs/b/bitmap-jpeg2k.trid.xml
> 
> After applying the above modifications via the patches
> file-5.41-jpeg-unusual.diff, file-5.41-images-jpeg.diff and
> file-5.41-animation-jpeg.diff, all my inspected "unusual" JPEG images
> are still described as before, but now with more or corrected mime
> types, and the duplicates vanish. With the -i option this now looks
> like:
> 
> 2021-08_totocaca.jxl:         image/x-jxl
> Bretagne1_1.j2k:              image/x-jp2-codestream
> Cevennes2.jp2:                image/jp2
> FDOSFISH.HSI:                 image/x-hsi
> FLOWER.wdp:                   image/jxr
> IMG_20200308_194050.jxl:      image/x-jxl
> MOV00020.thm:                 image/jpeg
> Speedway.mj2:                 video/mj2
> abydos.jxr:                   image/jxr
> balloon.j2c:                  image/x-jp2-codestream
> balloon.jp2:                  image/jp2
> balloon.jpf:                  image/jpx
> balloon.jpm:                  image/jpm
> fmt-590-signature-id-931.wdp: application/octet-stream
> 
> With the --extension option this now looks like:
> 
> 2021-08_totocaca.jxl:         jxl
> Bretagne1_1.j2k:              jpc/j2c/j2k
> Cevennes2.jp2:                jp2
> FDOSFISH.HSI:                 hsi/jpg
> FLOWER.wdp:                   jxr/wdp/hdp
> IMG_20200308_194050.jxl:      jxl
> MOV00020.thm:                 jpeg/jpg/jpe/jfif
> Speedway.mj2:                 mj2/mjp2
> abydos.jxr:                   jxr/wdp/hdp
> balloon.j2c:                  jpc/j2c/j2k
> balloon.jp2:                  jp2
> balloon.jpf:                  jpf/jpx
> balloon.jpm:                  jpm
> fmt-590-signature-id-931.wdp: ???
> 
> I hope my diff files can be applied in a future version of the file
> utility.
> 
> With best wishes,
> Jörg Jenderek
