[File] Gemtext file misidentified as HTML
ploumfile at offpunk.net
Sat Jan 31 16:42:56 UTC 2026
On 26 Jan 31 10:47, Christos Zoulas wrote:
>Running with -d says:
>
>*unknown*, 115: > 0 search/wct/4096,=<a href=,"HTML document text"]
>search: [# 2026-01-30 Locking the gate\n\nThe last few days have once again been pretty stressful as the scrape...] for [<a href=] found
>0 == 0 = 1 strength=68
>[try ascmagic 1]
>/Users/christos/bad_mime.DEFANGED-16: HTML document, Unicode text, UTF-8 text, with very long lines (440)
I overlooked this: there’s indeed a single <a href= in the middle of the
document, inside quoted text (the document describes a web server
configuration).
So this is definitely an HTML false positive.
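To make the failure concrete, here is a minimal sketch of why one quoted tag is enough. It assumes the rule behaves like a case-insensitive substring search over the first 4096 bytes (the range comes from "search/wct/4096" in the -d output; the exact meaning of the w/c/t flags is not reproduced here). The function name and the sample excerpt are made up for illustration; only the first heading line comes from the real document.

```python
# Hypothetical approximation of the magic rule seen in the -d output:
# a bounded, case-insensitive substring search. Not libmagic's real code.
SEARCH_RANGE = 4096  # taken from "search/wct/4096" in the debug output

def naive_html_match(data: bytes, needle: bytes = b"<a href=") -> bool:
    """Return True if `needle` occurs anywhere in the first SEARCH_RANGE bytes."""
    window = data[:SEARCH_RANGE].lower()
    return needle.lower() in window

# Invented Gemtext excerpt: a heading, prose, and one quoted HTML link.
gemtext = (
    b"# 2026-01-30 Locking the gate\n\n"
    b"Some prose about the web server configuration.\n\n"
    b"> <a href=\"https://example.org/\">example</a>\n"
)

print(naive_html_match(gemtext))  # True: the single quoted tag matches
```

Nothing in such a rule looks at where the match occurs or at the rest of the document, so any text format that quotes HTML can trip it.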
Is this a bug that should be fixed or not?
There are multiple hints that this document is not HTML:
1. No opening "<" (in most HTML documents, the first non-whitespace
character is "<")
2. No <html> tag (although I’m not sure it is mandatory)
3. No <title> tag (that one is mandatory according to RFC 1866:
https://www.ietf.org/rfc/rfc1866.txt)
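As a rough illustration, the three hints above could be combined into a stricter check like the sketch below. The function name and the exact rule (leading "<" plus either an <html> or a <title> tag) are assumptions for discussion, not how file(1) currently works.

```python
import re

def looks_like_html(text: str) -> bool:
    """Hypothetical stricter HTML check built from the three hints.

    Requires the first non-whitespace character to be '<' (hint 1) and
    at least one of an <html> tag (hint 2) or a <title> tag (hint 3).
    """
    stripped = text.lstrip()
    if not stripped.startswith("<"):
        return False  # hint 1: HTML documents normally start with markup
    lowered = stripped.lower()
    has_html = re.search(r"<html[\s>]", lowered) is not None    # hint 2
    has_title = re.search(r"<title[\s>]", lowered) is not None  # hint 3
    return has_html or has_title

print(looks_like_html('# Title\n\n> <a href="/">x</a>\n'))  # False
print(looks_like_html("<!DOCTYPE html>\n<html><head><title>t</title></head></html>"))  # True
```

Accepting either tag instead of requiring both keeps documents that omit the optional <html> tag. The trade-off is that bare fragments such as "<p>text</p>" would no longer be reported as HTML.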
What’s your opinion on this?
Ploum