[File] Gemtext file badly recognized as HTML

Christos Zoulas christos at zoulas.com
Fri Feb 6 17:09:04 UTC 2026



> On Jan 31, 2026, at 12:25 PM, ploumfile at offpunk.net wrote:
> 
> On 26 Jan 31 12:09, Christos Zoulas wrote:
>> Yes, we can require that <title> exists for html documents, but that will break other uses,
>> html embeddings for example. Anyway, it is just heuristics, and if you try to fix one document,
>> you might break another.
> 
> Indeed. That’s why I’m really curious about file’s philosophy on this. At what point is a false positive considered a bug? To what degree should a file be identified even if it doesn’t closely follow a given standard?
> 
> I’m really curious about it.
> 
> I’m also willing to help with identifying gemtext files, but I’ve been told that there’s no magic number for gemtext. Is there a way to contribute one, one way or another?
> 

Consider the following text file:

hello mom
<a href="https://www.google.com">link</a>

Is this an html file?

Well, if you give it to any browser and click on the link, it will take you to Google. So browsers treat it as an html file.
Should file(1) say it is a text file because it is not properly structured? The file(1) program tries to give results that
are expected and useful in most cases.
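
To make the trade-off concrete, here is a toy sketch in Python. It is only an illustration and not how file(1) or its magic database actually works; the function name, the tag list, and the require_title flag are all made up for this example. It shows that a loose heuristic calls the text above html, while a strict "must have <title>" rule calls it plain text.

```python
import re

# Hypothetical tag list, chosen for illustration only;
# it is not taken from file(1)'s magic database.
HTML_TAG_RE = re.compile(
    r"<\s*(html|head|title|body|a|p|div|script)\b[^>]*>", re.IGNORECASE
)
TITLE_RE = re.compile(r"<\s*title\b", re.IGNORECASE)


def looks_like_html(text, require_title=False):
    """Toy heuristic: guess whether text is html from the tags it contains."""
    if require_title:
        # Strict rule: only documents with a <title> tag count as html.
        return TITLE_RE.search(text) is not None
    # Loose rule: any known tag anywhere in the text is enough.
    return HTML_TAG_RE.search(text) is not None


sample = 'hello mom\n<a href="https://www.google.com">link</a>\n'
print(looks_like_html(sample))                      # True
print(looks_like_html(sample, require_title=True))  # False
```

Neither answer is wrong in itself; which one is useful depends on what the person running the tool expects, and tightening the rule for one document changes the answer for others.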

Best,

christos


