[File] Erroneous byte in Magdir/msdos?

Jason Summers jason1 at pobox.com
Wed Feb 12 14:49:32 UTC 2025


If the encoding of pattern files is documented somewhere, I'd like to see
it. I couldn't find anything when I researched it for Mgchkj
(https://github.com/jsummers/mgchkj). There's no single encoding that is
valid for all the current patterns and comments.

Of course, you can use 'file' itself to tell you which files are not valid
UTF-8. Mgchkj is more precise, and it does warn about the issue you're
reporting:

filesystems:2610: Line has non-ASCII characters (probably not UTF-8) [#
From: Thomas Weißschuh <thomas at t-8ch.de>]
firmware:177: Line has non-ASCII characters (probably not UTF-8) [# Note:
 called "Intel Hexadecimal object format" by TrID, "Intel® hexadecimal
object file" on Linux]
images:647: Line has non-ASCII characters (probably not UTF-8) [# binary
data variant with non ASCII text characters like Control-A or °C in
thermostat.fig]
msdos:2526: Line has non-ASCII characters (probably not UTF-8) [# 1st
member name like: "Class Notes.one" "test-onenote.one" "Open
Notebook.onetoc2" "Editor Öffnen.onetoc2"]

If you use the "-w3" option, it also warns about (non-ASCII) UTF-8. I'll
remove that warning if it's documented as being correct.
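
For illustration, the kind of per-line check described above can be
sketched in Python. This is only a minimal sketch, not how Mgchkj is
implemented; the function name classify_lines and the labels it yields are
made up for this example.

```python
def classify_lines(data: bytes):
    """Yield (line_number, kind) for every line that is not pure ASCII.

    kind is "UTF-8" if the line decodes cleanly as UTF-8,
    otherwise "not UTF-8". Both labels are hypothetical.
    """
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        if line.isascii():
            continue  # pure ASCII lines are valid in any of these encodings
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            yield lineno, "not UTF-8"
        else:
            yield lineno, "UTF-8"


# Made-up sample data: line 2 contains the raw byte 0xD6, line 3 is valid UTF-8.
sample = b"# plain\n# Editor \xd6ffnen.onetoc2\n# caf\xc3\xa9\n"
for lineno, kind in classify_lines(sample):
    print(f"{lineno}: non-ASCII line ({kind})")
```

Reading the file as bytes avoids having to pick an encoding before the
check runs.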

There is probably a way to configure your Python input method to handle
errors differently, but I don't know enough to help with that.
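
One possible approach, offered as a sketch rather than a tested
recommendation: Python's bytes.decode() and open() both accept an "errors"
argument that controls what happens on invalid input. The sample bytes
below are made up to resemble the msdos line, and the choice of Latin-1 is
only an assumption about what the byte was meant to be.

```python
# Made-up sample resembling the reported line; 0xD6 is not valid UTF-8 here.
raw = b'"Editor \xd6ffnen.onetoc2"'

# Default (strict) decoding raises, as in the original report.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

# "replace" substitutes U+FFFD for each undecodable byte (lossy).
replaced = raw.decode("utf-8", errors="replace")

# "surrogateescape" keeps the byte as a lone surrogate, so the original
# bytes can be recovered by encoding with the same handler.
escaped = raw.decode("utf-8", errors="surrogateescape")
roundtrip = escaped.encode("utf-8", errors="surrogateescape")

# Assumption: if the line is really ISO-8859-1, 0xD6 is "Ö".
latin1 = raw.decode("latin-1")

print(replaced)
print(roundtrip == raw)
print(latin1)
```

The same "errors" values can be passed to open(), for example
open(path, encoding="utf-8", errors="surrogateescape"), where path is
whatever file is being read.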

On Tue, Feb 11, 2025 at 7:23 PM Sudarshan S Chawathe <chaw at eip10.org> wrote:

> In the file Magdir/msdos, there seems to be a strange byte at offset
> 108406.  Examining it in emacs gives:
>
>   Char: \326 (4194262, #o17777726, #x3fffd6, raw-byte) point=108406 of
>   127680 (85%) column=93
>
> [I replaced the actual byte with the string "\326" above to avoid
> potential email problems.]
>
> Trying to read that line using 'input' in python3 gives:
>
>   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position
>   1909: invalid continuation byte
>
> Is that byte a typo of some sort, or should that file be read using a
> different text encoding (or method)?
>
> Regards,
>
> -chaw


-- 
Jason Summers