[File] Python file misdetection
Steve Grubb
sgrubb at redhat.com
Thu Dec 1 21:32:16 UTC 2022
On Thursday, December 1, 2022 1:50:08 PM EST Christos Zoulas wrote:
> It is probably using the system libraries, you probably want
> LD_LIBRARY_PATH= But the shell script that is in src should work just
> fine:
>
> src/file -m magic/magic.mgc --mime-type ~/test.py
What I think is happening is that they are tying in strength, and in my case
Objective-C comes first. So I looked into Objective-C syntax and found that its
syntax is something like this:
#import "ClassName.h"
@interface ClassName ( CategoryName )
// method declarations
@end
What I notice is that you probably would not have #import without a double
quote to enclose the file name. I modified the magic detection as follows:
0 search/8192/W #import\040\042
0 regex \^#import[[:space:]]*\" Objective-C source text
That finally sorted out the problem. I then put the code above in an obj.c file
and tested against it, and it's correctly detected as Objective-C. I also took
away the W to test the following regex. It also matches.
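
For anyone who wants to check the idea outside of file(1), here is a rough
sketch in Python of what the tightened regex is meant to do. It is only an
approximation: Python's re module has no POSIX [[:space:]] class, so \s stands
in for it, and the MULTILINE flag, the function name and the sample strings are
assumptions made for illustration, not anything taken from the magic engine.

```python
import re

# Approximation of the magic regex  \^#import[[:space:]]*\"
# Python's re lacks POSIX character classes, so \s replaces [[:space:]].
# re.MULTILINE is an assumption so that ^ can match at the start of any line.
OBJC_IMPORT = re.compile(r'^#import\s*"', re.MULTILINE)


def looks_like_objc_import(text: str) -> bool:
    """Return True if some line starts with #import followed by a double quote."""
    return OBJC_IMPORT.search(text) is not None


# The Python test file from the debug output earlier in the thread.
python_sample = '#!/usr/bin/python3\n\n#import os\n\nos.system("ls")\n\n'

# The Objective-C snippet used for the obj.c test.
objc_sample = (
    '#import "ClassName.h"\n'
    '@interface ClassName ( CategoryName )\n'
    '// method declarations\n'
    '@end\n'
)

if __name__ == "__main__":
    print("python sample:", looks_like_objc_import(python_sample))
    print("objc sample:  ", looks_like_objc_import(objc_sample))
```

The commented-out "#import os" no longer qualifies because no double quote
follows it. One caveat: Objective-C also allows #import <Header.h>, which a
pattern requiring a double quote would not cover.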
-Steve
> > On Dec 1, 2022, at 1:39 PM, Steve Grubb <sgrubb at redhat.com> wrote:
> >
> > On Thursday, December 1, 2022 9:44:39 AM EST Christos Zoulas wrote:
> >> Can you please try 5.43 or HEAD?
> >
> > I did but it also misdetects the file type. I am trying to see if I can
> > find a reason for the misdetection. When I test, I am using:
> >
> > LD_LIBRARY=src/.libs/ src/.libs/file --mime-type -m magic/magic.mgc
> > ~/test.py
> >
> > To make sure it uses the repo's copies and not the system's copies. What
> > I'm seeing with -k -l is:
> >
> > Strength = 63 at 96: Objective-C source text [text/x-objective-c]
> > Strength = 63 at 232: Python script text executable [text/x-script.python]
> >
> > When I look at the debug output, I see only 2 found statements:
> >
> > 40: > 0 search/8192,!p,""]
> > search: [#!/usr/bin/python3\n\n#import os\n\nos.system("ls")\n\n] for [p]
> > found
> > 0 != 0 = 0
> > bb=[0x1459da0,49,0], 0 [b=0x1459da0,49,0], [o=0, c=0]
> > mget(type=20, flag=0x40, offset=0, o=0, nbytes=49, il=0, nc=0)
> >
> > 96: > 0 search/8192,=#import,""]
> > search: [#!/usr/bin/python3\n\n#import os\n\nos.system("ls")\n\n] for
> > [#import] found
> > 0 == 0 = 1
> > bb=[0x1459da0,49,0], 0 [b=0x1459da0,49,0], [o=0, c=1]
> >
> > I don't see any matches for python. Is there any other data that I could
> > gather to help figure out what's happening?
> >
> > -Steve
> >
> >>> On Nov 30, 2022, at 4:40 PM, Steve Grubb <sgrubb at redhat.com> wrote:
> >>>
> >>> Hello,
> >>>
> >>> On Wednesday, November 30, 2022 3:30:38 PM EST Christos Zoulas wrote:
> >>>>> On Nov 29, 2022, at 5:37 PM, Steve Grubb <sgrubb at redhat.com> wrote:
> >>>>> I ran across a case where Python files get misdetected when an import
> >>>>> statement is commented out. For example:
> >>>>>
> >>>>> #!/usr/bin/python3
> >>>>> import os
> >>>>> os.system("ls")
> >>>>>
> >>>>> file --mime-type example.py
> >>>>> example.py: text/x-script.python
> >>>>>
> >>>>> #!/usr/bin/python3
> >>>>> #import os
> >>>>> os.system("ls")
> >>>>>
> >>>>> file --mime-type example.py
> >>>>> example.py: text/x-objective-c
> >>>>>
> >>>>> It matches Objective-C with a strength of 25, where
> >>>>> #!\040/usr/bin/python
> >>>>> has a strength of 15. It would seem very plausible for someone to
> >>>>> occasionally comment out an import statement. I'm wondering why an
> >>>>> Objective-C construct would be stronger than a python shebang (which
> >>>>> should be conclusive)? Not sure which of the two to adjust.
> >>>>
> >>>> What version of file is that? I can't reproduce it.
> >>>
> >>> I can reproduce this with 5.39, 5.41, and 5.42. All of them on Fedora 36
> >>> or rawhide. It appears to be finding the #import statement and matching
> >>> that with more weight than the shebang.
> >>>
> >>> -Steve