[File] charset=binary not registered

John-Mark Gurney jmg at funkthat.com
Fri Aug 26 21:09:39 UTC 2022


Hello,

I happen to be looking at using file for MIME type identification,
and I noticed a lot of charset=binary in its output.  I did a bit of
research: binary is not a registered charset [IANA Charset], but it
is a valid transfer encoding [RFC2045 sec 6.1].  That is a separate
concept, and a separate header field, from charset.
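
To illustrate the distinction, here is a minimal sketch using Python's
standard email package.  The message text is made up for the example;
the point is only that charset is a parameter of Content-Type, while
binary is a value of the separate Content-Transfer-Encoding field.

```python
from email import message_from_string

# A made-up message: charset lives on Content-Type, while "binary"
# is a value of the separate Content-Transfer-Encoding header.
raw = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: binary\n"
    "\n"
    "hello\n"
)

msg = message_from_string(raw)

# Parameter of the Content-Type header.
charset = msg.get_content_charset()

# Value of a different header field altogether.
transfer_encoding = msg["Content-Transfer-Encoding"]

print("charset:", charset)
print("transfer encoding:", transfer_encoding)
```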

[RFC2046 sec 4.1.2] covers the charset parameter, and it does say
that charset can be used with media types other than just text:
   Other media types than subtypes of "text" might choose to employ the
   charset parameter as defined here, but with the CRLF/line break
   restriction removed.  Therefore, all character sets that conform to
   the general definition of "character set" in RFC 2045 can be
   registered for MIME use.

My suggestion is to change the code to not print a charset unless it
is explicitly defined.  That is, drop the default, and only print the
charset= part if one is set by file_encoding.

(Looks like file_encoding ALSO by default sets it to binary, and would
need to be fixed as well.)
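
The suggested behavior could be sketched roughly as follows.  This is
Python rather than file's actual C code, and format_mime is a
hypothetical helper, not a function in file.  It assumes the detector
reports None, or the current "binary" placeholder, when no charset was
determined.

```python
from typing import Optional


def format_mime(mime_type: str, charset: Optional[str] = None) -> str:
    """Render a MIME string, adding charset= only when one was detected.

    Hypothetical helper for illustration; not part of file itself.
    """
    # Assumption: None or the placeholder "binary" both mean that no
    # real charset was determined, so the parameter is omitted.
    if charset is None or charset == "binary":
        return mime_type
    return f"{mime_type}; charset={charset}"


print(format_mime("image/png", "binary"))
print(format_mime("text/plain", "us-ascii"))
```

With this approach the charset= parameter simply disappears for data
where no character set applies, instead of carrying an unregistered
value.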

Another option would be to only do the charset detection when the
mime-type is text.
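
A rough sketch of this second option, again in Python with
hypothetical names (describe and naive_detect are illustrative only,
and the detection is far simpler than what file does): charset
detection runs only for text/* types, and everything else is printed
without a charset parameter.

```python
from typing import Callable, Optional


def naive_detect(data: bytes) -> Optional[str]:
    """Toy charset detector (assumption: only ASCII and UTF-8 are tried)."""
    for name, codec in (("us-ascii", "ascii"), ("utf-8", "utf-8")):
        try:
            data.decode(codec)
            return name
        except UnicodeDecodeError:
            continue
    return None


def describe(
    mime_type: str,
    data: bytes,
    detect: Callable[[bytes], Optional[str]] = naive_detect,
) -> str:
    """Attach charset= only when the media type is text/*."""
    if not mime_type.startswith("text/"):
        # Non-text types skip charset detection entirely.
        return mime_type
    charset = detect(data)
    return f"{mime_type}; charset={charset}" if charset else mime_type


print(describe("text/plain", b"hello"))
print(describe("image/png", b"\x89PNG\r\n\x1a\n"))
```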

Thoughts and comments?

[RFC2045 sec 6.1] https://www.rfc-editor.org/rfc/rfc2045#section-6.1
[RFC2046 sec 4.1.2] https://www.rfc-editor.org/rfc/rfc2046#section-4.1.2
[IANA Charset] https://www.iana.org/assignments/character-sets/character-sets.xhtml

(note I am not subscribed to the mailing list, so please keep me cc'd.)

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

