<div dir="ltr">If the encoding of pattern files is documented somewhere, I'd like to see it. I couldn't find anything when I researched it for Mgchkj (<a href="https://github.com/jsummers/mgchkj">https://github.com/jsummers/mgchkj</a>). There's no single encoding that is valid for all the current patterns and comments.<br><br>Of course, you can use 'file' itself to tell you which files are not valid UTF-8. Mgchkj is more precise, and it does warn about the issue you're reporting:<br><br>filesystems:2610: Line has non-ASCII characters (probably not UTF-8) [# From: Thomas Weißschuh <<a href="mailto:thomas@t-8ch.de">thomas@t-8ch.de</a>>]<br>firmware:177: Line has non-ASCII characters (probably not UTF-8) [# Note:  called "Intel Hexadecimal object format" by TrID, "Intel® hexadecimal object file" on Linux]<br>images:647: Line has non-ASCII characters (probably not UTF-8) [# binary data variant with non ASCII text characters like Control-A or °C in thermostat.fig]<br>msdos:2526: Line has non-ASCII characters (probably not UTF-8) [# 1st member name like: "Class Notes.one" "test-onenote.one" "Open Notebook.onetoc2" "Editor Öffnen.onetoc2"]<br><br>If you use the "-w3" option, it also warns about (non-ASCII) UTF-8. I'll remove that warning if it's documented as being correct.<br><br>There is probably a way to configure your Python input method to handle errors differently, but I don't know enough to help with that.</div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Tue, Feb 11, 2025 at 7:23 PM Sudarshan S Chawathe <<a href="mailto:chaw@eip10.org">chaw@eip10.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">In the file Magdir/msdos, there seems to be a strange byte at offset<br>

108406.  Examining it in emacs gives:<br>

<br>

  Char: \326 (4194262, #o17777726, #x3fffd6, raw-byte) point=108406 of<br>

  127680 (85%) column=93<br>

<br>

[I replaced the actual byte with the string "\326" above to avoid<br>

potential email problems.]<br>

<br>

Trying to read that line using 'input' in python3 gives:<br>

<br>

  UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position<br>

  1909: invalid continuation byte<br>

<br>

Is that byte a typo of some sort, or should that file be read using a<br>

different text encoding (or method)?<br>

<br>

Regards,<br>

<br>

-chaw<br>

-- <br>

File mailing list<br>

<a href="mailto:File@astron.com" target="_blank">File@astron.com</a><br>

<a href="https://mailman.astron.com/mailman/listinfo/file" rel="noreferrer" target="_blank">https://mailman.astron.com/mailman/listinfo/file</a><br>

</blockquote></div><div><br clear="all"></div><div><br></div><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature"><div dir="ltr">Jason Summers<div><br></div></div></div>
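<div dir="ltr">P.S. On the Python side, here is a minimal sketch of one way to cope with lines that are not valid UTF-8. It assumes the stray bytes are Latin-1 (ISO-8859-1), which is only a guess based on the 0xd6 byte in Magdir/msdos. The function names are made up for illustration and are not part of 'file' or Mgchkj.</div>

```python
def decode_line(raw, fallback="latin-1"):
    """Decode one line of bytes, returning (text, was_valid_utf8)."""
    try:
        # Strict UTF-8 first: this raises on a stray byte such as 0xd6.
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        # Assumption: the line is Latin-1. Every byte maps to some
        # character in Latin-1, so this fallback never raises.
        return raw.decode(fallback), False


def find_non_utf8_lines(data):
    """Return (line_number, decoded_text) for each line that is not valid UTF-8."""
    bad = []
    for number, raw in enumerate(data.split(b"\n"), start=1):
        text, ok = decode_line(raw)
        if not ok:
            bad.append((number, text))
    return bad


if __name__ == "__main__":
    # Hypothetical two-line sample; the second line carries a raw 0xd6 byte.
    sample = b'# plain ASCII comment\n# "Editor \xd6ffnen.onetoc2"\n'
    for number, text in find_non_utf8_lines(sample):
        print(f"line {number}: not valid UTF-8, read as Latin-1: {text}")
```

<div dir="ltr">Reading the file in binary mode and decoding line by line keeps the decision per line, which seems to suit a file with no single valid encoding. If you would rather keep using 'input', calling sys.stdin.reconfigure(errors="replace") first (Python 3.7 or later) should stop the exception, at the cost of turning each undecodable byte into a replacement character.</div>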