[File] Question regarding python bytecode mimetypes

Mircea GLIGA mgliga at bitdefender.com
Fri Jul 31 08:16:22 UTC 2020


Hello all

I've been debugging a script that identifies text files by their MIME
type in order to do some text replacements.[1]
This used to work, but now, on a Manjaro machine with `file-5.39`, the
script incorrectly identifies Python bytecode files (*.pyc) as text
files, so the text replacement corrupts them and renders them useless.

It seems to be related to this commit:
https://github.com/file/file/commit/eb373e431ccfeedfbcf497e4da07571d43bdb9f2

My question is: why are bytecode files considered to be of type "text"?
After all, they are binary files, not text files.
Is this intended behavior, or a bug?

Here is the output of two different `file` versions. On the Manjaro
machine:

     $ file --version
     file-5.39
     $ file -b --mime-type numbers.pyc
     text/x-bytecode.python

On a Debian machine:

     $ file --version
     file-5.35
     magic file from /etc/magic:/usr/share/misc/magic
     $ file -b --mime-type numbers.pyc
     application/octet-stream
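
In the meantime, a possible workaround on the script side, sketched
under the assumption that the script accepts any file whose MIME type
starts with `text/`, would be to exclude the bytecode type explicitly
before matching. The helper names `is_text_mime` and `is_text_file` are
hypothetical and are not part of relocate-sdk.sh.

```shell
#!/bin/sh

# Hypothetical helper: decide from a MIME type string whether the file
# should be treated as editable text.
is_text_mime() {
    case "$1" in
        # Reported by file-5.39 for *.pyc; binary despite the text/ prefix.
        text/x-bytecode.python) return 1 ;;
        text/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: ask `file` for the MIME type of the path in $1
# and apply the check above.
is_text_file() {
    is_text_mime "$(file -b --mime-type "$1")"
}
```

With the outputs shown above, this would skip numbers.pyc under
file-5.39 and would not change the result under file-5.35, since
`application/octet-stream` never matched `text/*` in the first place.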

Thanks and regards
Mircea

[1] 
https://git.buildroot.net/buildroot/tree/support/misc/relocate-sdk.sh?h=2020.02.4#n39


