[File] [PATCH v2] Improve python magic checks based on PEP 552
Michał Górny
mgorny at gentoo.org
Sun Jul 24 18:14:24 UTC 2022
Replace most of the hardcoded Python magic numbers with a simpler
check based on PEP 552, implemented in Python 3.7 (magic 3392+).
According to PEP 552, the .pyc file starts with the following header
(in pseudocode):
uleshort magic_number
string "\x0d\x0a"
ulelong flags
union {
struct {
ulelong timestamp
ulelong size
}
ulequad hash
}
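For illustration only (this is not part of the patch), the header layout above can be sketched in Python with the struct module. The function name parse_pyc_header and the dictionary keys are made up for this example; the hash is read as a 64-bit little-endian integer to mirror the ulequad in the pseudocode.

```python
import struct


def parse_pyc_header(data: bytes) -> dict:
    """Split the 16-byte PEP 552 header into its fields (hypothetical helper)."""
    if len(data) < 16:
        raise ValueError("a PEP 552 header is 16 bytes long")
    # uleshort magic_number, 2-byte marker, ulelong flags
    magic, marker, flags = struct.unpack_from("<H2sI", data, 0)
    header = {"magic": magic, "marker": marker, "flags": flags}
    if flags & 0x1:
        # hash-based .pyc: the union holds one 64-bit value
        header["hash"] = struct.unpack_from("<Q", data, 8)[0]
    else:
        # timestamp-based .pyc: the union holds timestamp + source size
        timestamp, size = struct.unpack_from("<II", data, 8)
        header["timestamp"] = timestamp
        header["size"] = size
    return header
```

Passing the first 16 bytes of a .pyc file yields either the timestamp and size pair or the hash, depending on bit 0 of the flags.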
The magic number is monotonically increasing. Starting with Python
3.11, the range for each version is supposed to start at 2900+50n,
where n is the minor version number. However, I am not sure how long
this assumption is going to hold, given that Python 3.11 alone almost
exhausted its 50-number range. For the same reason, it does not seem
a good idea to keep hardcoding all of the known versions.
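The 2900+50n rule is simple enough to write down as a worked example. The sketch below only restates the arithmetic; expected_range_start is a hypothetical name, and the result is only meaningful for as long as CPython keeps following the scheme.

```python
def expected_range_start(minor: int) -> int:
    """First magic number reserved for CPython 3.<minor> under the 2900+50n scheme."""
    return 2900 + 50 * minor


# Print the reserved 50-number window for a few minor versions.
for minor in (11, 12, 13, 14):
    start = expected_range_start(minor)
    print(f"3.{minor}: {start}..{start + 49}")
```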
Instead, try to detect a "generic PEP 552 .pyc file" by looking for:
1. the fixed "\x0d\x0a" string at offset 2
2. the flag field being clear except for the two bits currently used
(Python rejects .pyc files with additional bits set)
3. the magic number either falling in the range used by CPython
(relying on 0x0d being the high byte of the magic number, which
suffices until CPython 3.14) or matching one of the fixed values for
known PyPy3 versions
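The three checks above can be sketched in Python as follows. This is an approximation of the logic, not a translation of the magic rules; looks_like_pep552 and PYPY3_MAGICS are names invented for the example, and the PyPy3 values are the ones used in the patch.

```python
import struct

# Fixed magic values for known PyPy3 versions, as listed in the patch.
PYPY3_MAGICS = {240: "PyPy3.7", 256: "PyPy3.8", 336: "PyPy3.9"}


def looks_like_pep552(data: bytes):
    """Return an implementation label, or None if the checks fail."""
    if len(data) < 8:
        return None
    magic, marker, flags = struct.unpack_from("<H2sI", data, 0)
    # 1. fixed "\x0d\x0a" string at offset 2
    if marker != b"\r\n":
        return None
    # 2. only the two lowest flag bits may be set
    if flags >= 0x4:
        return None
    # 3. high byte 0x0d covers magic 3328..3583; otherwise try PyPy3
    if magic >> 8 == 0x0D:
        return "CPython"
    return PYPY3_MAGICS.get(magic)
```

Note that the 0x0d test alone also admits magic numbers below 3392; the flag check is what keeps most pre-PEP 552 files out, since their bytes 4 to 7 hold a timestamp.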
Report the specific CPython version by checking against the known
version ranges. Unfortunately, I did not find a solution that does not
involve this somewhat ugly "range tree", or hardcoding the whole range.
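Outside of the magic language, the same version lookup is a sorted-boundary search. The sketch below uses the upper bounds from the patch (3400, 3420, 3430, 3450, 3500); cpython_version is a hypothetical name, and the function assumes the magic number has already passed the PEP 552 checks.

```python
from bisect import bisect_right

# Exclusive upper bounds taken from the patch, in ascending order.
_UPPER_BOUNDS = [3400, 3420, 3430, 3450, 3500]
_LABELS = [
    "CPython 3.7",
    "CPython 3.8",
    "CPython 3.9",
    "CPython 3.10",
    "CPython 3.11",
    "CPython 3.12 or newer",
]


def cpython_version(magic: int) -> str:
    """Map a PEP 552 magic number to a version label."""
    # bisect_right counts the bounds that are <= magic, which is
    # exactly the index of the first range the number still fits in.
    return _LABELS[bisect_right(_UPPER_BOUNDS, magic)]
```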
State explicitly that the magic values in question belong to CPython.
Additionally, report the validity checking method (timestamp-based
or hash-based), the value of the check-source flag, and the validity
checking data (timestamp and size, or hash value).
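As a rough model of what gets reported, the following Python sketch builds a similar description from the header bytes. describe_validation is an invented name, and the date format is chosen for the example rather than copied from what file(1) prints.

```python
import struct
from datetime import datetime, timezone


def describe_validation(data: bytes) -> str:
    """Describe how a PEP 552 .pyc is validated (hypothetical helper)."""
    (flags,) = struct.unpack_from("<I", data, 4)
    if flags & 0x1:
        # hash-based: bit 1 is the check-source flag
        check_source = "set" if flags & 0x2 else "unset"
        (source_hash,) = struct.unpack_from("<Q", data, 8)
        return (f"hash-based, check-source flag {check_source}, "
                f"hash: 0x{source_hash:x}")
    # timestamp-based: source mtime followed by source size
    timestamp, size = struct.unpack_from("<II", data, 8)
    when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return (f"timestamp-based, .py timestamp: {when:%Y-%m-%d %H:%M:%S} UTC, "
            f".py size: {size} bytes")
```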
Finally, add the magic numbers used by the current versions of PyPy2.7,
PyPy3.7, PyPy3.8 and PyPy3.9. For the latter two versions, this
requires a fix that landed in HG after the 7.3.9 release, as the
versions up to 7.3.9 used CPython's magic due to a bug.
---
magic/Magdir/python | 116 ++++++++++++++++----------------------------
1 file changed, 42 insertions(+), 74 deletions(-)
diff --git a/magic/Magdir/python b/magic/Magdir/python
index ed588859..25be8c93 100644
--- a/magic/Magdir/python
+++ b/magic/Magdir/python
@@ -86,6 +86,8 @@
!:mime application/x-bytecode.python
0 belong 0x04f30d0a python 2.7 byte-compiled
!:mime application/x-bytecode.python
+0 belong 0x0af30d0a PyPy2.7 byte-compiled
+!:mime application/x-bytecode.python
0 belong 0xb80b0d0a python 3.0 byte-compiled
!:mime application/x-bytecode.python
0 belong 0xc20b0d0a python 3.0 byte-compiled
@@ -186,80 +188,46 @@
!:mime application/x-bytecode.python
0 belong 0x3f0d0d0a python 3.7 byte-compiled
!:mime application/x-bytecode.python
-0 belong 0x400d0d0a python 3.7 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x410d0d0a python 3.7 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x420d0d0a python 3.7 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x480d0d0a python 3.8 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x490d0d0a python 3.8 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x520d0d0a python 3.8 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x530d0d0a python 3.8 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x540d0d0a python 3.8 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x550d0d0a python 3.8 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x5c0d0d0a python 3.9 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x5d0d0d0a python 3.9 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x5e0d0d0a python 3.9 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x5f0d0d0a python 3.9 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x600d0d0a python 3.9 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x610d0d0a python 3.9 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x660d0d0a python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x670d0d0a python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x680d0d0a python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x690d0d0a python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x6a0d0d0a python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x6b0d0d0a python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x6c0d0d0a python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x6d0d0d0a python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x6e0d0d0a python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x6f0d0d0a python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x7a0d0d0a python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x7b0d0d0a python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x7c0d0d0a python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x7d0d0d0a python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x7e0d0d0a python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x7f0d0d0a python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x800d0d0a python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x810d0d0a python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x820d0d0a python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x830d0d0a python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x840d0d0a python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0 belong 0x850d0d0a python 3.11 byte-compiled
-!:mime application/x-bytecode.python
+
+# magic 3392+ implements PEP 552: Deterministic pycs
+0 name pyc-pep552
+# the flag field determines how .pyc validity is checked
+>4 ulelong&1 0 timestamp-based,
+>>8 uledate x .py timestamp: %s UTC,
+>>12 ulelong x .py size: %d bytes
+>4 ulelong&1 !0 hash-based, check-source flag
+>>4 ulelong&2 0 unset,
+>>4 ulelong&2 !0 set,
+>>8 ulequad x hash: 0x%llx
+
+# uleshort magic followed by \x0d\x0a
+2 string \x0d\x0a
+# extra check: only two bits of flag field are currently used
+>4 ulelong <0x4
+# \x0d as part of magic should suffice till Python 3.14 (magic 3600)
+>>1 ubyte 0x0d Byte-compiled Python module for
+!:mime application/x-bytecode.python
+# now look at the magic number to determine the version
+>>>0 uleshort <3400 CPython 3.7,
+>>>0 default x
+>>>>0 uleshort <3420 CPython 3.8,
+>>>>0 default x
+>>>>>0 uleshort <3430 CPython 3.9,
+>>>>>0 default x
+>>>>>>0 uleshort <3450 CPython 3.10,
+>>>>>>0 default x
+>>>>>>>0 uleshort <3500 CPython 3.11,
+>>>>>>>0 default x CPython 3.12 or newer,
+>>>0 use pyc-pep552
+>>0 uleshort 240 Byte-compiled Python module for PyPy3.7,
+!:mime application/x-bytecode.python
+>>>0 use pyc-pep552
+>>0 uleshort 256 Byte-compiled Python module for PyPy3.8,
+!:mime application/x-bytecode.python
+>>>0 use pyc-pep552
+>>0 uleshort 336 Byte-compiled Python module for PyPy3.9,
+!:mime application/x-bytecode.python
+>>>0 use pyc-pep552
0 search/1/w #!\040/usr/bin/python Python script text executable
!:strength + 15
--
2.35.1