[File] Python file misdetection
Christos Zoulas
christos at zoulas.com
Thu Dec 1 19:02:07 UTC 2022
I don't know. For me it is detected as Python:
[DING!] 48>./file -k --mime-type -m ../magic/magic.mgc objc.py
objc.py: text/x-script.python
[2:00pm] 49>./file -k -m ../magic/magic.mgc objc.py
objc.py: Python script text executable\012- Objective-C source text\012- a /usr/bin/python3 script, ASCII text executable
[2:00pm] 50>cat objc.py
#!/usr/bin/python3
#import os
os.system("ls")
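For anyone who wants to repeat the comparison on another build, here is a minimal sketch that writes the samples from this thread and asks file for the MIME type of each. The helper names (write_samples, detect_mime) and the sample file names are made up for illustration; only the --mime-type, -b and -m options of file are relied on. The third sample assumes the blank lines that appear in the search buffer of the debug output quoted below.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Samples from this thread. The first two differ only in the commented-out
# import. The third is an assumption: it mirrors the 49-byte search buffer in
# the quoted debug output, which has blank lines between the statements.
SAMPLES = {
    "import_active.py": '#!/usr/bin/python3\nimport os\nos.system("ls")\n',
    "import_commented.py": '#!/usr/bin/python3\n#import os\nos.system("ls")\n',
    "import_commented_spaced.py":
        '#!/usr/bin/python3\n\n#import os\n\nos.system("ls")\n\n',
}


def write_samples(directory):
    """Write every sample into `directory` and return the paths."""
    paths = []
    for name, text in SAMPLES.items():
        path = Path(directory) / name
        path.write_text(text)
        paths.append(path)
    return paths


def detect_mime(path, file_cmd="file", magic=None):
    """Return what `file --mime-type -b` prints for `path`.

    Returns None when `file_cmd` cannot be found. `magic` optionally names
    a compiled magic file to pass with -m, as in the commands above.
    """
    exe = shutil.which(file_cmd)
    if exe is None:
        return None
    cmd = [exe, "--mime-type", "-b"]
    if magic is not None:
        cmd += ["-m", str(magic)]
    cmd.append(str(path))
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for sample in write_samples(tmp):
            print(sample.name, "->", detect_mime(sample))
```

To test a build tree instead of the installed copy, pass file_cmd="src/file" and magic="magic/magic.mgc" when running from the top of the source tree.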
christos
> On Dec 1, 2022, at 1:59 PM, Steve Grubb <sgrubb at redhat.com> wrote:
>
> On Thursday, December 1, 2022 1:50:08 PM EST Christos Zoulas wrote:
>> It is probably using the system libraries; you probably want
>> LD_LIBRARY_PATH= rather than LD_LIBRARY=. But the shell script that is
>> in src should work just fine:
>>
>> src/file -m magic/magic.mgc --mime-type ~/test.py
>
> No change. Is there any other data that I can gather to help pinpoint the
> issue?
>
> -Steve
>
>>> On Dec 1, 2022, at 1:39 PM, Steve Grubb <sgrubb at redhat.com> wrote:
>>>
>>> On Thursday, December 1, 2022 9:44:39 AM EST Christos Zoulas wrote:
>>>> Can you please try 5.43 or HEAD?
>>>
>>> I did but it also misdetects the file type. I am trying to see if I can
>>> find a reason for the misdetection. When I test, I am using:
>>>
>>> LD_LIBRARY=src/.libs/ src/.libs/file --mime-type -m magic/magic.mgc ~/test.py
>>>
>>> To make sure it uses the repo's copies and not the system's copies. What
>>> I'm seeing with -k -l is:
>>>
>>> Strength = 63 at 96: Objective-C source text [text/x-objective-c]
>>> Strength = 63 at 232: Python script text executable [text/x-script.python]
>>>
>>> When I look at the debug output, I see only 2 found statements:
>>>
>>> 40: > 0 search/8192,!p,""]
>>> search: [#!/usr/bin/python3\n\n#import os\n\nos.system("ls")\n\n] for [p] found
>>> 0 != 0 = 0
>>> bb=[0x1459da0,49,0], 0 [b=0x1459da0,49,0], [o=0, c=0]
>>> mget(type=20, flag=0x40, offset=0, o=0, nbytes=49, il=0, nc=0)
>>>
>>> 96: > 0 search/8192,=#import,""]
>>> search: [#!/usr/bin/python3\n\n#import os\n\nos.system("ls")\n\n] for [#import] found
>>> 0 == 0 = 1
>>> bb=[0x1459da0,49,0], 0 [b=0x1459da0,49,0], [o=0, c=1]
>>>
>>> I don't see any matches for python. Is there any other data that I could
>>> gather to help figure out what's happening?
>>>
>>> -Steve
>>>
>>>>> On Nov 30, 2022, at 4:40 PM, Steve Grubb <sgrubb at redhat.com> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> On Wednesday, November 30, 2022 3:30:38 PM EST Christos Zoulas wrote:
>>>>>>> On Nov 29, 2022, at 5:37 PM, Steve Grubb <sgrubb at redhat.com> wrote:
>>>>>>> I ran across a case where Python files get misdetected when an import
>>>>>>> statement is commented out. For example:
>>>>>>>
>>>>>>> #!/usr/bin/python3
>>>>>>> import os
>>>>>>> os.system("ls")
>>>>>>>
>>>>>>> file --mime-type example.py
>>>>>>> example.py: text/x-script.python
>>>>>>>
>>>>>>> #!/usr/bin/python3
>>>>>>> #import os
>>>>>>> os.system("ls")
>>>>>>>
>>>>>>> file --mime-type example.py
>>>>>>> example.py: text/x-objective-c
>>>>>>>
>>>>>>> It matches Objective-C with a strength of 25, where
>>>>>>> #!\040/usr/bin/python
>>>>>>> has a strength of 15. It would seem very plausible for someone to
>>>>>>> occasionally comment out an import statement. I'm wondering why an
>>>>>>> Objective-C construct would be stronger than a Python shebang (which
>>>>>>> should be conclusive)? Not sure which of the two to adjust.
>>>>>>
>>>>>> What version of file is that? I can't reproduce it.
>>>>>
>>>>> I can reproduce this with 5.39, 5.41, and 5.42. All of them on Fedora 36
>>>>> or rawhide. It appears to be finding the #import statement and matching
>>>>> that with more weight than the shebang.
>>>>>
>>>>> -Steve
>
>
>
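The effect described in the first message, an "#import" match outranking the shebang, can be pictured with a toy model. This is only a sketch under stated assumptions and not libmagic's actual code: the Rule class, the classify function and both predicates are hypothetical simplifications, and the strengths 25 and 15 are the values quoted in the first report (the later -l output shows 63 for both rules, so the real ordering may depend on more than this).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    strength: int
    description: str
    mime: str
    matches: Callable[[str], bool]


# Hypothetical stand-ins for the two magic entries discussed in the thread.
RULES = [
    Rule(15, "Python script text executable", "text/x-script.python",
         # Simplified: a shebang line that mentions python.
         lambda text: text.startswith("#!")
         and "python" in text.splitlines()[0]),
    Rule(25, "Objective-C source text", "text/x-objective-c",
         # Simplified: "#import" anywhere in the first 8192 characters.
         lambda text: "#import" in text[:8192]),
]


def classify(text: str, keep_going: bool = False) -> List[str]:
    """Return matching MIME types, strongest rule first.

    Without keep_going only the first (strongest) match is reported,
    loosely mimicking the difference between file and file -k.
    """
    ordered = sorted(RULES, key=lambda rule: rule.strength, reverse=True)
    hits = [rule.mime for rule in ordered if rule.matches(text)]
    return hits if keep_going else hits[:1]


active = '#!/usr/bin/python3\nimport os\nos.system("ls")\n'
commented = '#!/usr/bin/python3\n#import os\nos.system("ls")\n'

print(classify(active))     # ['text/x-script.python']
print(classify(commented))  # ['text/x-objective-c']
print(classify(commented, keep_going=True))
# ['text/x-objective-c', 'text/x-script.python']
```

In this model the commented-out import flips the result only because the Objective-C rule sorts first; giving the shebang rule the higher strength would restore the Python answer.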