[File] Python file misdetection

Christos Zoulas christos at zoulas.com
Thu Dec 1 18:50:08 UTC 2022


It is probably using the system libraries; you probably want LD_LIBRARY_PATH=
rather than LD_LIBRARY=. But the shell script in src should work just fine:

src/file -m magic/magic.mgc --mime-type ~/test.py
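
For anyone scripting the same check, here is a minimal Python sketch of the two invocations discussed in this thread. It assumes it is run from the top of a built file source tree and that ~/test.py exists; the helper names `wrapper_command` and `libs_command` are made up for illustration. The one substantive point is that the dynamic linker reads LD_LIBRARY_PATH, not LD_LIBRARY, so a direct call to src/.libs/file needs the former to pick up the freshly built library.

```python
import os
import subprocess
from pathlib import Path

MAGIC_DB = "magic/magic.mgc"  # compiled magic database inside the source tree


def wrapper_command(target):
    """Command line for the shell script src/file (no extra environment)."""
    return ["src/file", "-m", MAGIC_DB, "--mime-type", str(target)]


def libs_command(target):
    """Command line and environment for calling src/.libs/file directly."""
    env = dict(os.environ)
    # The dynamic linker honours LD_LIBRARY_PATH; LD_LIBRARY is ignored.
    env["LD_LIBRARY_PATH"] = "src/.libs"
    return ["src/.libs/file", "-m", MAGIC_DB, "--mime-type", str(target)], env


if __name__ == "__main__":
    target = Path.home() / "test.py"
    # Only run what actually exists in the current directory.
    if Path("src/file").exists():
        subprocess.run(wrapper_command(target), check=False)
    if Path("src/.libs/file").exists():
        cmd, env = libs_command(target)
        subprocess.run(cmd, env=env, check=False)
```

If both commands print the same MIME type, the result presumably comes from the in-tree build rather than from the system copy.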

christos

> On Dec 1, 2022, at 1:39 PM, Steve Grubb <sgrubb at redhat.com> wrote:
> 
> On Thursday, December 1, 2022 9:44:39 AM EST Christos Zoulas wrote:
>> Can you please try 5.43 or HEAD?
> 
> I did, but it also misdetects the file type. I am trying to see if I can find a
> reason for the misdetection. When I test, I am using:
> 
> LD_LIBRARY=src/.libs/ src/.libs/file --mime-type -m magic/magic.mgc ~/test.py
> 
> To make sure it uses the repo's copies and not the system's copies. What I'm
> seeing with -k -l is:
> 
> Strength =  63 at 96: Objective-C source text [text/x-objective-c]
> Strength =  63 at 232: Python script text executable [text/x-script.python]
> 
> When I look at the debug output, I see only 2 found statements:
> 
> 40: > 0 search/8192,!p,""]
> search: [#!/usr/bin/python3\n\n#import os\n\nos.system("ls")\n\n] for [p]
> found
> 0 != 0 = 0
> bb=[0x1459da0,49,0], 0 [b=0x1459da0,49,0], [o=0, c=0]
> mget(type=20, flag=0x40, offset=0, o=0, nbytes=49, il=0, nc=0)
> 
> 96: > 0 search/8192,=#import,""]
> search: [#!/usr/bin/python3\n\n#import os\n\nos.system("ls")\n\n] for
> [#import] found
> 0 == 0 = 1
> bb=[0x1459da0,49,0], 0 [b=0x1459da0,49,0], [o=0, c=1]
> 
> I don't see any matches for python. Is there any other data that I could
> gather to help figure out what's happening?
> 
> -Steve
> 
>>> On Nov 30, 2022, at 4:40 PM, Steve Grubb <sgrubb at redhat.com> wrote:
>>> 
>>> Hello,
>>> 
>>> On Wednesday, November 30, 2022 3:30:38 PM EST Christos Zoulas wrote:
>>>>> On Nov 29, 2022, at 5:37 PM, Steve Grubb <sgrubb at redhat.com> wrote:
>>>>> I ran across a case where python files get misdetected when an import
>>>>> statement is commented out. For example:
>>>>> 
>>>>> #!/usr/bin/python3
>>>>> import os
>>>>> os.system("ls")
>>>>> 
>>>>> file --mime-type example.py
>>>>> example.py: text/x-script.python
>>>>> 
>>>>> #!/usr/bin/python3
>>>>> #import os
>>>>> os.system("ls")
>>>>> 
>>>>> file --mime-type example.py
>>>>> example.py: text/x-objective-c
>>>>> 
>>>>> It matches Objective-C with a strength of 25, where
>>>>> #!\040/usr/bin/python
>>>>> has a strength of 15. It would seem very plausible for someone to
>>>>> occasionally comment out an import statement. I'm wondering why an
>>>>> Objective-C construct would be stronger than a python shebang (which
>>>>> should be conclusive)? Not sure which of the two to adjust.
>>>> 
>>>> What version of file is that? I can't reproduce it.
>>> 
>>> I can reproduce this with 5.39, 5.41, and 5.42. All of them on Fedora 36
>>> or rawhide. It appears to be finding the #import statement and matching
>>> that with more weight than the shebang.
>>> 
>>> -Steve
> 
> 
> 
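The debug output quoted above suggests why the Objective-C entry fires: entry 96 is a search/8192 test for the literal string #import, and a commented-out Python import contains exactly that string. The following Python sketch imitates that one test on the 49-byte buffer from the debug output. It is a simplified illustration, not libmagic's code; the function name `magic_search` is hypothetical, and the range is treated as the number of start positions tried, as magic(5) describes for search.

```python
def magic_search(data, needle, offset=0, count=8192):
    """Return the position of needle, trying `count` start positions from offset.

    Returns -1 when no start position in that window matches.
    """
    # bytes.find() limits where the match may END, so widen the window by
    # the needle length to allow a match starting at offset + count - 1.
    end = offset + count + len(needle) - 1
    return data.find(needle, offset, end)


# Buffer as shown in the debug output (49 bytes).
commented = b'#!/usr/bin/python3\n\n#import os\n\nos.system("ls")\n\n'
# Same script with the import left active.
active = b'#!/usr/bin/python3\n\nimport os\n\nos.system("ls")\n\n'

print(len(commented))                       # 49
print(magic_search(commented, b"#import"))  # 20
print(magic_search(active, b"#import"))     # -1
```

With the import left active the string is absent, so this particular test would not match, which is consistent with the two results reported at the start of the thread.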
