[File] Python file misdetection

Steve Grubb sgrubb at redhat.com
Thu Dec 1 21:32:16 UTC 2022


On Thursday, December 1, 2022 1:50:08 PM EST Christos Zoulas wrote:
> It is probably using the system libraries; you probably want
> LD_LIBRARY_PATH= (not LD_LIBRARY=). But the shell script that is in src should
> work just fine:
> 
> src/file -m magic/magic.mgc --mime-type ~/test.py

What I think is happening is that they are tying in strength, and in my case
Objective-C comes first. So I looked into Objective-C syntax and found that its
syntax is something like this:

#import "ClassName.h"

@interface ClassName ( CategoryName )

// method declarations

@end

What I notice is that you probably would not have #import without a double 
quote to enclose the file name. I modified the magic detection as follows:

0       search/8192/W   #import\040\042
0       regex   \^#import[[:space:]]*\"               Objective-C source text

That finally sorted out the problem. I then put the code above in an obj.c
file, tested against it, and it is correctly detected as Objective-C. I also
removed the W flag to test the regex line that follows, and it also matches.
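The reasoning above can be modeled in a few lines of Python. Both entries tie in strength, Objective-C wins because it comes first, and requiring a quote after #import removes the false match. This is a toy sketch, not libmagic. Several parts are assumptions: the tie-break rule (on equal strength, the entry earlier in load order wins), the 63/63 strengths taken from the -k -l output, and the simplified shebang test.

```python
import re

# Toy model of the two competing magic entries. NOT how libmagic works
# internally; the tie-break rule and the shebang test are simplifications.

PYTHON_SAMPLE = '#!/usr/bin/python3\n\n#import os\n\nos.system("ls")\n\n'
OBJC_SAMPLE = ('#import "ClassName.h"\n\n@interface ClassName ( CategoryName )\n\n'
               '// method declarations\n\n@end\n')

def objc_old(buf: str) -> bool:
    # Old rule: the bare string "#import" anywhere in the first 8192 bytes.
    return "#import" in buf[:8192]

def objc_new(buf: str) -> bool:
    # Tightened rule: "#import" at the start of a line, optional whitespace,
    # then a double quote. re.MULTILINE makes ^ match after each newline.
    return re.search(r'^#import\s*"', buf[:8192], re.MULTILINE) is not None

def python_shebang(buf: str) -> bool:
    # Simplified stand-in for the Python shebang entry.
    return buf.startswith("#!/usr/bin/python")

def classify(buf: str, objc_rule) -> str:
    # (strength, MIME type, test) in load order, Objective-C first.
    entries = [
        (63, "text/x-objective-c", objc_rule),
        (63, "text/x-script.python", python_shebang),
    ]
    # sorted() is stable, so entries with equal strength keep load order
    # and the first matching one wins.
    for _strength, mime, test in sorted(entries, key=lambda e: -e[0]):
        if test(buf):
            return mime
    return "text/plain"

if __name__ == "__main__":
    print(classify(PYTHON_SAMPLE, objc_old))  # text/x-objective-c
    print(classify(PYTHON_SAMPLE, objc_new))  # text/x-script.python
    print(classify(OBJC_SAMPLE, objc_new))    # text/x-objective-c
```

In this model the strengths stay tied. The only lever is whether the Objective-C test fires at all. That mirrors why tightening the pattern, rather than adjusting strengths, was enough.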

-Steve


> > On Dec 1, 2022, at 1:39 PM, Steve Grubb <sgrubb at redhat.com> wrote:
> > 
> > On Thursday, December 1, 2022 9:44:39 AM EST Christos Zoulas wrote:
> >> Can you please try 5.43 or HEAD?
> > 
> > I did, but it also misdetects the file type. I am trying to see if I can
> > find a reason for the misdetection. When I test, I am using:
> > 
> > LD_LIBRARY=src/.libs/ src/.libs/file --mime-type -m magic/magic.mgc ~/test.py
> > 
> > To make sure it uses the repo's copies and not the system's copies. What
> > I'm seeing with -k -l is:
> > 
> > Strength =  63 at 96: Objective-C source text [text/x-objective-c]
> > Strength =  63 at 232: Python script text executable [text/x-script.python]
> > 
> > When I look at the debug output, I see only 2 found statements:
> > 
> > 40: > 0 search/8192,!p,""]
> > search: [#!/usr/bin/python3\n\n#import os\n\nos.system("ls")\n\n] for [p] found
> > 0 != 0 = 0
> > bb=[0x1459da0,49,0], 0 [b=0x1459da0,49,0], [o=0, c=0]
> > mget(type=20, flag=0x40, offset=0, o=0, nbytes=49, il=0, nc=0)
> > 
> > 96: > 0 search/8192,=#import,""]
> > search: [#!/usr/bin/python3\n\n#import os\n\nos.system("ls")\n\n] for [#import] found
> > 0 == 0 = 1
> > bb=[0x1459da0,49,0], 0 [b=0x1459da0,49,0], [o=0, c=1]
> > 
> > I don't see any matches for python. Is there any other data that I could
> > gather to help figure out what's happening?
> > 
> > -Steve
> > 
> >>> On Nov 30, 2022, at 4:40 PM, Steve Grubb <sgrubb at redhat.com> wrote:
> >>> 
> >>> Hello,
> >>> 
> >>> On Wednesday, November 30, 2022 3:30:38 PM EST Christos Zoulas wrote:
> >>>>> On Nov 29, 2022, at 5:37 PM, Steve Grubb <sgrubb at redhat.com> wrote:
> >>>>> I ran across a case where Python files get misdetected when an import
> >>>>> statement is commented out. For example:
> >>>>> 
> >>>>> #!/usr/bin/python3
> >>>>> import os
> >>>>> os.system("ls")
> >>>>> 
> >>>>> file --mime-type example.py
> >>>>> example.py: text/x-script.python
> >>>>> 
> >>>>> #!/usr/bin/python3
> >>>>> #import os
> >>>>> os.system("ls")
> >>>>> 
> >>>>> file --mime-type example.py
> >>>>> example.py: text/x-objective-c
> >>>>> 
> >>>>> It matches Objective-C with a strength of 25, where
> >>>>> #!\040/usr/bin/python
> >>>>> has a strength of 15. It would seem very plausible for someone to
> >>>>> occasionally comment out an import statement. I'm wondering why an
> >>>>> Objective-C construct would be stronger than a python shebang (which
> >>>>> should be conclusive)? Not sure which of the two to adjust.
> >>>> 
> >>>> What version of file is that? I can't reproduce it.
> >>> 
> >>> I can reproduce this with 5.39, 5.41, and 5.42, all of them on Fedora 36
> >>> or Rawhide. It appears to be finding the #import statement and matching
> >>> that with more weight than the shebang.
> >>> 
> >>> -Steve
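A small Python helper can regenerate the two test files from the original report, one with and one with the import commented out, and pass them to file --mime-type. The helper and file names are hypothetical. The results depend on the installed file version and magic database, so no expected output is shown. The file(1) call is skipped if the binary is not on PATH.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# The two variants from the report; the only difference is whether
# the "import os" line is commented out. File names are hypothetical.
VARIANTS = {
    "plain_import.py": '#!/usr/bin/python3\nimport os\nos.system("ls")\n',
    "commented_import.py": '#!/usr/bin/python3\n#import os\nos.system("ls")\n',
}

def write_variants(directory: Path) -> list:
    """Write each variant into `directory` and return the paths."""
    paths = []
    for name, body in VARIANTS.items():
        path = directory / name
        path.write_text(body)
        paths.append(path)
    return paths

def report(paths, file_cmd: str = "file"):
    """Run `file --mime-type` on the paths; return None if file(1) is missing."""
    exe = shutil.which(file_cmd)
    if exe is None:
        return None
    result = subprocess.run([exe, "--mime-type", *map(str, paths)],
                            capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = report(write_variants(Path(tmp)))
        print(out if out is not None else "file(1) not found on PATH")
```

To test a repo build instead of the system copy, pass the path of the in-tree src/file script as `file_cmd`.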