[File] Python file misdetection
Steve Grubb <sgrubb at redhat.com>
Thu Dec  1 18:39:11 UTC 2022

On Thursday, December 1, 2022 9:44:39 AM EST Christos Zoulas wrote:
> Can you please try 5.43 or HEAD?
I did but it also misdetects the file type. I am trying to see if I can find a 
reason for the misdetection. When I test, I am using:
LD_LIBRARY_PATH=src/.libs/ src/.libs/file --mime-type -m magic/magic.mgc ~/test.py
To make sure it uses the repo's copies and not the system's copies. What I'm 
seeing with -k -l is:
Strength =  63 at 96: Objective-C source text [text/x-objective-c]
Strength =  63 at 232: Python script text executable [text/x-script.python]
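Both candidates are listed with the same strength (63), so strength alone does not separate them. A minimal sketch for pulling the numbers out of this listing and checking for a tie is below. The name parse_strengths is hypothetical, and the pattern accepts either " at " (as shown in this message) or "@" as the separator, on the assumption that the archive may have rewritten it.

```python
import re

# Matches lines such as "Strength =  63 at 96: Desc [mime]".
# Accepting "@" as well as " at " is an assumption about the raw output.
STRENGTH_RE = re.compile(
    r"Strength\s*=\s*(\d+)\s*(?:@|at)\s*(\d+):\s*(.*?)\s*\[([^\]]*)\]"
)


def parse_strengths(text):
    """Return (strength, magic_line, description, mime) tuples."""
    entries = []
    for line in text.splitlines():
        m = STRENGTH_RE.search(line)
        if m:
            entries.append(
                (int(m.group(1)), int(m.group(2)), m.group(3), m.group(4))
            )
    return entries


# The two lines quoted in this message.
sample = """\
Strength =  63 at 96: Objective-C source text [text/x-objective-c]
Strength =  63 at 232: Python script text executable [text/x-script.python]
"""

entries = parse_strengths(sample)
strengths = {mime: strength for strength, _, _, mime in entries}
# True when every candidate reports the same strength.
tied = len(set(strengths.values())) == 1
```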
When I look at the debug output, I see only 2 found statements:

40: > 0 search/8192,!p,""]
search: [#!/usr/bin/python3\n\n#import os\n\nos.system("ls")\n\n] for [p] found
0 != 0 = 0
bb=[0x1459da0,49,0], 0 [b=0x1459da0,49,0], [o=0, c=0]
mget(type=20, flag=0x40, offset=0, o=0, nbytes=49, il=0, nc=0)

96: > 0 search/8192,=#import,""]
search: [#!/usr/bin/python3\n\n#import os\n\nos.system("ls")\n\n] for [#import] found
0 == 0 = 1
bb=[0x1459da0,49,0], 0 [b=0x1459da0,49,0], [o=0, c=1]
I don't see any matches for python. Is there any other data that I could 
gather to help figure out what's happening?
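To illustrate why the second test fires, here is a rough model of a "search/8192" test as a plain substring check over the first 8192 bytes. This is only a simplified stand-in, not how libmagic implements it, and search_matches is a hypothetical name. The buffers are the 49-byte test file from the debug output and the same file with the import left active.

```python
def search_matches(buf: bytes, pattern: bytes, window: int = 8192) -> bool:
    """Simplified model: is `pattern` anywhere in the first `window` bytes?"""
    return pattern in buf[:window]


# Test file as shown in the debug output (49 bytes).
commented = b'#!/usr/bin/python3\n\n#import os\n\nos.system("ls")\n\n'
# Same file with the import statement not commented out.
active = b'#!/usr/bin/python3\n\nimport os\n\nos.system("ls")\n\n'

# Commenting out the import creates the literal "#import" that the
# Objective-C test searches for.
hit_commented = search_matches(commented, b"#import")
hit_active = search_matches(active, b"#import")
```

Under this model the commented-out line is indistinguishable from an Objective-C "#import" directive, which is consistent with the "0 == 0 = 1" result above.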
-Steve
> > On Nov 30, 2022, at 4:40 PM, Steve Grubb <sgrubb at redhat.com> wrote:
> > 
> > Hello,
> > 
> > On Wednesday, November 30, 2022 3:30:38 PM EST Christos Zoulas wrote:
> >>> On Nov 29, 2022, at 5:37 PM, Steve Grubb <sgrubb at redhat.com> wrote:
> >>> I ran across a case where python files get misdetected when an import
> >>> statement is commented out. For example:
> >>> 
> >>> #!/usr/bin/python3
> >>> import os
> >>> os.system("ls")
> >>> 
> >>> file --mime-type example.py
> >>> example.py: text/x-script.python
> >>> 
> >>> #!/usr/bin/python3
> >>> #import os
> >>> os.system("ls")
> >>> 
> >>> file --mime-type example.py
> >>> example.py: text/x-objective-c
> >>> 
> >>> It matches Objective-C with a strength of 25, where
> >>> #!\040/usr/bin/python
> >>> has a strength of 15. It would seem very plausible for someone to
> >>> occasionally comment out an import statement. I'm wondering why an
> >>> Objective-C construct would be stronger than a python shebang (which
> >>> should be conclusive)? Not sure which of the two to adjust.
> >> 
> >> What version of file is that? I can't reproduce it.
> > 
> > I can reproduce this with 5.39, 5.41, and 5.42. All of them on Fedora 36
> > or rawhide. It appears to be finding the #import statement and matching
> > that with more weight than the shebang.
> > 
> > -Steve
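For anyone trying to reproduce the report, a small sketch that writes the two example files from the thread and asks file(1) for their MIME types. It assumes a "file" binary on PATH; the helper names and file names are hypothetical, and the file bodies follow the layout quoted above (the debug output suggests the real test file also had blank lines between statements).

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# The two variants from the report: import active vs. commented out.
SAMPLES = {
    "import_active.py": '#!/usr/bin/python3\nimport os\nos.system("ls")\n',
    "import_commented.py": '#!/usr/bin/python3\n#import os\nos.system("ls")\n',
}


def write_samples(directory):
    """Write both sample scripts into `directory` and return their paths."""
    paths = []
    for name, body in SAMPLES.items():
        path = Path(directory) / name
        path.write_text(body)
        paths.append(path)
    return paths


def detect_mime(path, file_cmd="file"):
    """Return the MIME type reported by file(1), or None if unavailable."""
    exe = shutil.which(file_cmd)
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--mime-type", "-b", str(path)],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_samples(tmp):
            print(path.name, detect_mime(path))
```

The result depends on the installed file version and magic database, so no expected output is given here.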
    
    