[File] BigTIFF: Indirect offset, 64-bit value not supported?

David Konerding dakoner at gmail.com
Sat Apr 13 21:35:45 UTC 2024


From what I can tell, the files at
https://www.awaresystems.be/imaging/tiff/bigtiff.html (see the "Download"
link at the bottom) have examples where there are 64-bit offsets past 2GB,
for example in the Description entry.
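
To make the size problem concrete, here is a minimal Python sketch of the range check involved. It is not file's actual code. The names UINT32_MAX, INT32_MAX and fits_in_32_bits are made up for illustration, and the sample value is the one from the "lhs/off overflow" debug line quoted further down.

```python
# Illustrative only: these limits mirror a 32-bit offset type,
# they are not taken from file's source.
UINT32_MAX = 2**32 - 1   # largest unsigned 32-bit value (4 GiB - 1)
INT32_MAX = 2**31 - 1    # largest signed 32-bit value (2 GiB - 1)


def fits_in_32_bits(offset: int) -> bool:
    """Return True if a non-negative offset is representable in 32 bits."""
    return 0 <= offset <= UINT32_MAX


reported = 28956860354  # value from the debug output quoted below
print(fits_in_32_bits(reported))   # False: needs more than 32 bits
print(reported > INT32_MAX)        # True: also past the 2GB mark
```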

For our purposes, I think it doesn't make sense to ask file/magic to
support 64-bit offsets.  We had been thinking of using libmagic to extract
a bunch of metadata (in the way that it outputs width, height, bps, etc. for
TIFF and PNG), but I think it makes more sense to use file/magic to
identify the file as a BigTIFF and then hand it to our TIFF parser, which
supports these large offsets.
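
A rough sketch of that hand-off in Python, assuming the standard BigTIFF header layout (byte order mark, version 43, offset size 8, a zero word, then the 64-bit offset of the first IFD at byte 8). The function name read_bigtiff_header is hypothetical and this is not our actual parser. The sample header is synthetic and reuses the value from the debug output quoted below.

```python
import struct


def read_bigtiff_header(data: bytes):
    """Return (byte_order, first_ifd_offset) for a BigTIFF header, else None.

    Hypothetical helper for illustration; only the 16-byte header is examined.
    """
    if len(data) < 16:
        return None
    if data[:2] == b"II":
        endian = "<"   # little-endian
    elif data[:2] == b"MM":
        endian = ">"   # big-endian
    else:
        return None
    # Bytes 2-7: version (43 for BigTIFF), offset size (8), reserved (0).
    version, offset_size, reserved = struct.unpack(endian + "HHH", data[2:8])
    if version != 43 or offset_size != 8 or reserved != 0:
        return None
    # Bytes 8-15: 64-bit offset of the first IFD.
    (first_ifd,) = struct.unpack(endian + "Q", data[8:16])
    return ("little" if endian == "<" else "big", first_ifd)


# Synthetic big-endian header whose first IFD lies far beyond 4 GiB.
header = b"MM" + struct.pack(">HHHQ", 43, 8, 0, 28956860354)
print(read_bigtiff_header(header))
```

Python integers are arbitrary precision, so the offset needs no special handling here; a C parser would need a 64-bit type such as uint64_t for the same field.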


On Sat, Apr 13, 2024 at 9:55 AM Christos Zoulas <christos at zoulas.com> wrote:

>
>
> On Mar 26, 2024, at 5:22 PM, David Konerding <dakoner at gmail.com> wrote:
>
> Hi,
>
> I am trying to write a rule to extract more info from BigTIFF.  Currently,
> TIFF files extract directory entries and output metadata, while BigTIFF
> only reports the file type and endian.
>
> Working from this page, http://bigtiff.org/, I am trying to read the
> offset to the first directory entry; in TIFF, this is a long (32-bit),
> while in BigTIFF, it's a quad (64-bit) to support files with very large
> offsets.
>
>
> As such, I am trying to write this continuation:
>
> >>>(8.Q) use \^bigtiff_ifd
>
> Which IIUC is saying "starting at file offset 8, read a bequad (64-bit)
> and then recursively call the named magic bigtiff_ifd" (which is slightly
> different from tiff_ifd).
>
>
> When I try this on my test file (a big-endian BigTIFF), I get this debug
> error:
>
> 10: >>>> 8(bequad,&0), use,='^bigtiff_ifd',""]
> lhs/off overflow 28956860354 0
>
>
> If I understand the code in do_ops correctly (
> https://github.com/file/file/blob/master/src/softmagic.c#L1465)
>
> the values for lhs and off are compared to UINT_MAX and INT_MIN, and a
> failure is reported if the value is too large.
>
> On my 64-bit system, UINT_MAX seems to be based on 32-bit integer, which
> causes the overflow error when compared against a 64-bit value.
>
>
> Before I proceed, I wanted a sanity check here: are 64-bit offsets larger
> than 2**32 considered invalid by file?
>
> Yes. If you look in
> https://github.com/file/file/blob/master/src/file.h#L360 offsets are 32
> bits right now.
> It would not be too hard to change everything to be 64 bits, but the
> question is: is it really correct to look for magic data past 2GB (is
> the data actually there)? So far I have not found the need to change
> the code to support 64-bit offsets...
>
> christos
>
>