[File] [PATCH v2] Improve python magic checks based on PEP 552

Michał Górny mgorny at gentoo.org
Sun Jul 24 18:14:24 UTC 2022


Replace a large part of the hardcoded Python magic numbers with a
simpler check based on PEP 552, implemented in Python 3.7 (magic 3392+).

According to PEP 552, a .pyc file starts with the following header
(in pseudocode):

    uleshort    magic_number
    string      "\x0d\x0a"
    ulelong     flags
    union {
      struct {
        ulelong timestamp
        ulelong size
      }
      ulequad   hash
    }
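
For illustration, the header layout above could be decoded roughly as
follows in Python.  This is a minimal sketch and not part of the patch;
the helper name parse_pyc_header and the dictionary keys are made up for
the example.

```python
import struct


def parse_pyc_header(data: bytes) -> dict:
    """Decode the 16-byte header described above (hypothetical helper)."""
    if len(data) < 16:
        raise ValueError("a PEP 552 header is 16 bytes long")
    # uleshort magic, 2-byte "\r\n" string, ulelong flags (little-endian)
    magic, crlf, flags = struct.unpack_from("<H2sI", data, 0)
    header = {"magic": magic, "crlf": crlf, "flags": flags}
    if flags & 1:
        # hash-based .pyc: the remaining 8 bytes hold the source hash
        (header["hash"],) = struct.unpack_from("<Q", data, 8)
    else:
        # timestamp-based .pyc: source mtime followed by source size
        header["timestamp"], header["size"] = struct.unpack_from("<II", data, 8)
    return header
```

Which arm of the union applies is selected by the lowest bit of the
flags field, matching the checks in the magic entries of this patch.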

The magic number is monotonically increasing.  Starting with Python
3.11, the range for each version is supposed to start at 2900+50n,
where n is the minor version number.  However, I am not sure how long
this assumption is going to hold, given that Python 3.11 alone almost
exhausted its 50-number range.  For the same reason, it does not seem
like a good idea to keep hardcoding every known magic value.
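
As a quick worked example of the 2900+50n scheme, the sketch below
computes where each range is supposed to start and which high byte the
resulting little-endian magic has.  The function names are hypothetical,
and the formula is only assumed to hold from Python 3.11 onwards, as
noted above.

```python
def range_start(minor: int) -> int:
    """First magic number of CPython 3.<minor> under the 2900+50n scheme."""
    return 2900 + 50 * minor


def high_byte(magic: int) -> int:
    """Second byte of the file, i.e. the high byte of the uleshort magic."""
    return magic >> 8


for minor in (11, 12, 13, 14):
    start = range_start(minor)
    print(f"3.{minor}: {start} = 0x{start:04x}, high byte 0x{high_byte(start):02x}")
```

Python 3.11 starts at 3450 (0x0d7a), while 3.14 would start at 3600
(0x0e10), which is the first range whose high byte is no longer 0x0d.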

Instead, try to detect a "generic PEP 552 .pyc file" by looking for:

1. the fixed "\x0d\x0a" string at offset 2

2. the flag field being clear except for the two bits currently used
   (Python rejects .pyc files with additional bits set)

3. the magic number falling within the range used by CPython (relying
   on 0x0d being the high byte of the magic number, which suffices until
   CPython 3.14), or matching one of the fixed values used by known
   PyPy3 versions
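
The three checks above can be sketched in Python as follows.  This is
only an approximation of what the magic entries do, under the assumption
that a 16-byte header is available; the function name and the
PYPY3_MAGICS table name are made up, while the PyPy3 values are the ones
added by this patch.

```python
import struct

# Fixed magic values for the known PyPy3 versions (taken from this patch).
PYPY3_MAGICS = {240: "PyPy3.7", 256: "PyPy3.8", 336: "PyPy3.9"}


def classify_pep552_pyc(data: bytes):
    """Return "CPython", a PyPy3 label, or None if the checks fail."""
    if len(data) < 16:
        return None
    magic, crlf, flags = struct.unpack_from("<H2sI", data, 0)
    # 1. fixed "\x0d\x0a" string at offset 2
    if crlf != b"\x0d\x0a":
        return None
    # 2. only the two lowest flag bits may be set
    if flags >= 4:
        return None
    # 3. CPython range (high byte 0x0d) or a known PyPy3 value
    if magic >> 8 == 0x0D:
        return "CPython"
    return PYPY3_MAGICS.get(magic)
```

For example, a header with magic 3413, the "\r\n" marker and flags 0 is
classified as CPython, while the same header with flags 4 is rejected.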

Report the specific CPython version by checking against the known
version ranges.  Unfortunately, I did not find a solution that avoids
both this somewhat ugly "range tree" and hardcoding the whole range.
Also state explicitly in the output that the magic values in question
belong to CPython.
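
Outside the magic language, the same range lookup is a plain sorted
search.  The sketch below uses the upper bounds from the "range tree" in
this patch; the names are hypothetical and the table would need updating
for newer CPython releases.

```python
from bisect import bisect_right

# Exclusive upper bounds of each version's magic range, as in the patch.
_UPPER_BOUNDS = [3400, 3420, 3430, 3450, 3500]
_LABELS = [
    "CPython 3.7",
    "CPython 3.8",
    "CPython 3.9",
    "CPython 3.10",
    "CPython 3.11",
    "CPython 3.12 or newer",
]


def cpython_version(magic: int) -> str:
    """Map a PEP 552 CPython magic number to a version label."""
    # bisect_right counts how many bounds are <= magic, which is the
    # index of the first range whose upper bound is still above it.
    return _LABELS[bisect_right(_UPPER_BOUNDS, magic)]
```

The magic language has no equivalent of a table lookup, which is why the
patch expresses the same comparison chain as nested "default" branches.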

Additionally, report the validity checking method (timestamp-based or
hash-based), the value of the check-source flag, and the validity
checking data (timestamp and size, or hash value).
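
A rough Python equivalent of this reporting is sketched below.  The
function name is made up, and the date format is chosen for the example
and will differ from what file(1) prints for a uledate field.

```python
import struct
from datetime import datetime, timezone


def describe_validity(header: bytes) -> str:
    """Describe how a PEP 552 .pyc is validated (hypothetical helper)."""
    (flags,) = struct.unpack_from("<I", header, 4)
    if flags & 1:
        # bit 0 set: hash-based; bit 1 is the check-source flag
        check_source = "set" if flags & 2 else "unset"
        (hash_value,) = struct.unpack_from("<Q", header, 8)
        return (f"hash-based, check-source flag {check_source}, "
                f"hash: 0x{hash_value:x}")
    # bit 0 clear: timestamp-based, with source mtime and source size
    timestamp, size = struct.unpack_from("<II", header, 8)
    when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    stamp = when.strftime("%Y-%m-%d %H:%M:%S")
    return f"timestamp-based, .py timestamp: {stamp} UTC, .py size: {size} bytes"
```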

Finally, add the magic numbers used by the current versions of PyPy2.7,
PyPy3.7, PyPy3.8 and PyPy3.9.  For the latter two, this requires a fix
that landed in the PyPy Mercurial repository after the 7.3.9 release, as
versions up to and including 7.3.9 used CPython's magic due to a bug.
---
 magic/Magdir/python | 116 ++++++++++++++++----------------------------
 1 file changed, 42 insertions(+), 74 deletions(-)

diff --git a/magic/Magdir/python b/magic/Magdir/python
index ed588859..25be8c93 100644
--- a/magic/Magdir/python
+++ b/magic/Magdir/python
@@ -86,6 +86,8 @@
 !:mime application/x-bytecode.python
 0	belong		0x04f30d0a	python 2.7 byte-compiled
 !:mime application/x-bytecode.python
+0	belong		0x0af30d0a	PyPy2.7 byte-compiled
+!:mime application/x-bytecode.python
 0	belong		0xb80b0d0a	python 3.0 byte-compiled
 !:mime application/x-bytecode.python
 0	belong		0xc20b0d0a	python 3.0 byte-compiled
@@ -186,80 +188,46 @@
 !:mime application/x-bytecode.python
 0	belong		0x3f0d0d0a	python 3.7 byte-compiled
 !:mime application/x-bytecode.python
-0	belong		0x400d0d0a	python 3.7 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x410d0d0a	python 3.7 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x420d0d0a	python 3.7 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x480d0d0a	python 3.8 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x490d0d0a	python 3.8 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x520d0d0a	python 3.8 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x530d0d0a	python 3.8 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x540d0d0a	python 3.8 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x550d0d0a	python 3.8 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x5c0d0d0a	python 3.9 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x5d0d0d0a	python 3.9 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x5e0d0d0a	python 3.9 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x5f0d0d0a	python 3.9 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x600d0d0a	python 3.9 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x610d0d0a	python 3.9 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x660d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x670d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x680d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x690d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x6a0d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x6b0d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x6c0d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x6d0d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x6e0d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x6f0d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x7a0d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x7b0d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x7c0d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x7d0d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x7e0d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x7f0d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x800d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x810d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x820d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x830d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x840d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x850d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
+
+# magic 3392+ implements PEP 552: Deterministic pycs
+0	name		pyc-pep552
+# the flag field determines how .pyc validity is checked
+>4	ulelong&1	0		timestamp-based,
+>>8	uledate		x		.py timestamp: %s UTC,
+>>12	ulelong		x		.py size: %d bytes
+>4	ulelong&1	!0		hash-based, check-source flag
+>>4	ulelong&2	0		unset,
+>>4	ulelong&2	!0		set,
+>>8	ulequad		x		hash: 0x%llx
+
+# uleshort magic followed by \x0d\x0a
+2		string		\x0d\x0a
+# extra check: only two bits of flag field are currently used
+>4		ulelong		<0x4
+# \x0d as part of magic should suffice till Python 3.14 (magic 3600)
+>>1		ubyte		0x0d		Byte-compiled Python module for
+!:mime application/x-bytecode.python
+# now look at the magic number to determine the version
+>>>0		uleshort	<3400		CPython 3.7,
+>>>0		default		x
+>>>>0		uleshort	<3420		CPython 3.8,
+>>>>0		default		x
+>>>>>0		uleshort	<3430		CPython 3.9,
+>>>>>0		default		x
+>>>>>>0		uleshort	<3450		CPython 3.10,
+>>>>>>0		default		x
+>>>>>>>0	uleshort	<3500		CPython 3.11,
+>>>>>>>0	default		x		CPython 3.12 or newer,
+>>>0		use		pyc-pep552
+>>0		uleshort	240		Byte-compiled Python module for PyPy3.7,
+!:mime application/x-bytecode.python
+>>>0		use		pyc-pep552
+>>0		uleshort	256		Byte-compiled Python module for PyPy3.8,
+!:mime application/x-bytecode.python
+>>>0		use		pyc-pep552
+>>0		uleshort	336		Byte-compiled Python module for PyPy3.9,
+!:mime application/x-bytecode.python
+>>>0		use		pyc-pep552
 
 0	search/1/w	#!\040/usr/bin/python	Python script text executable
 !:strength + 15
-- 
2.35.1


