[File] [PATCH] Improve python magic checks based on PEP 552

Michał Górny mgorny at gentoo.org
Tue Jul 19 15:14:06 UTC 2022


Replace a large part of the hardcoded Python magic numbers with a simpler
check based on PEP 552, implemented in Python 3.7 (magic 3392+).

According to PEP 552, the .pyc file starts with the following header
(in pseudocode):

    uleshort    magic_number
    string      "\x0d\x0a"
    ulelong     flags
    union {
      struct {
        ulelong timestamp
        ulelong size
      }
      ulequad   hash
    }
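
For illustration only (not part of the patch), the layout above could be
read in Python roughly as follows.  This is a minimal sketch; the names
PycHeader and parse_pyc_header are hypothetical, and it assumes that bit 0
of the flags selects the hash variant of the union, as PEP 552 describes.

```python
import struct
from typing import NamedTuple, Optional


class PycHeader(NamedTuple):  # hypothetical container, not a CPython type
    magic: int
    flags: int
    timestamp: Optional[int]
    size: Optional[int]
    source_hash: Optional[bytes]


def parse_pyc_header(data: bytes) -> PycHeader:
    """Decode the 16-byte PEP 552 header from the start of a .pyc file."""
    if len(data) < 16:
        raise ValueError("PEP 552 header is 16 bytes long")
    # uleshort magic_number, string "\x0d\x0a", ulelong flags
    magic, marker, flags = struct.unpack_from("<H2sI", data, 0)
    if marker != b"\r\n":
        raise ValueError("missing \\r\\n after the magic number")
    if flags & 1:
        # hash-based .pyc: the union holds an 8-byte source hash
        return PycHeader(magic, flags, None, None, data[8:16])
    # timestamp-based .pyc: the union holds timestamp + source size
    timestamp, size = struct.unpack_from("<II", data, 8)
    return PycHeader(magic, flags, timestamp, size, None)
```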

The magic number is monotonically increasing.  Starting with Python
3.11, the range for each version is supposed to start at 2900+50n,
where n is the minor version number.  However, I am not sure how long
this assumption is going to hold, given that Python 3.11 alone almost
exhausted its 50-number range.  For the same reason, it does not seem
like a good idea to keep hardcoding all of the known versions.
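
As a quick illustration of the 2900+50n rule (not part of the patch), the
nominal range for a given minor version can be computed as below.  The
function name expected_magic_range is hypothetical, and the sketch only
encodes the convention as stated above; it says nothing about whether
future releases will keep to it.

```python
def expected_magic_range(minor: int) -> range:
    """Nominal magic-number range for CPython 3.<minor>, for minor >= 11."""
    if minor < 11:
        raise ValueError("the 2900+50n convention only starts with 3.11")
    start = 2900 + 50 * minor
    # each version nominally gets 50 consecutive magic numbers
    return range(start, start + 50)
```

For example, expected_magic_range(11) spans 3450 to 3499, and
expected_magic_range(12) starts at 3500.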

Instead, try to detect a "generic PEP 552 .pyc file" by looking for:

1. the "\x0d\x0d\x0a" string at offset 1 -- this covers the fixed part
   of the header plus the high byte of the magic number (0x0d, i.e.
   magic 3328 to 3583), which should suffice until about magic 3600
   (Python 3.14)

2. the flag field being clear except for the two bits currently used
   (Python rejects .pyc files with additional bits set)
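
The two checks above can be sketched in Python as follows.  This is only
an illustration of the idea, not the magic rule itself; the function name
looks_like_pep552_pyc is hypothetical.

```python
def looks_like_pep552_pyc(data: bytes) -> bool:
    """Apply the two generic checks to the first bytes of a file."""
    if len(data) < 8:
        return False
    # 1. high byte of the magic (0x0d) followed by the fixed "\x0d\x0a"
    if data[1:4] != b"\x0d\x0d\x0a":
        return False
    # 2. only the two lowest flag bits may be set, i.e. flags < 4
    flags = int.from_bytes(data[4:8], "little")
    return flags < 4
```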

Report the Python version by checking against the known version ranges.
Unfortunately, I did not find a solution that does not involve this
somewhat ugly "range tree", or hardcoding the whole range.  Also state
explicitly that the magic values in question belong to CPython.
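
Outside of the magic language, the same range lookup is a simple
bisection.  The sketch below mirrors the thresholds used in the patch;
the names are hypothetical, and the caller is assumed to have already
established that the magic is 3392 or higher.

```python
from bisect import bisect_right

# exclusive upper bounds of the known ranges, as used in the patch
_UPPER_BOUNDS = [3400, 3420, 3430, 3450, 3500]
_LABELS = [
    "CPython 3.7",
    "CPython 3.8",
    "CPython 3.9",
    "CPython 3.10",
    "CPython 3.11",
    "CPython 3.12 or newer",
]


def cpython_version(magic: int) -> str:
    """Map a PEP 552 magic number (assumed >= 3392) to a version label."""
    # bisect_right counts the bounds that are <= magic, which is exactly
    # the index of the first range whose upper bound exceeds the magic
    return _LABELS[bisect_right(_UPPER_BOUNDS, magic)]
```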

Additionally, report the validity checking method (timestamp-
or hash-based), plus the value of the check-source flag and the validity
checking data (timestamp + size or hash value).
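
A rough Python equivalent of this reporting is sketched below, purely as
an illustration.  The function name describe_validity is hypothetical,
the exact date format is an assumption rather than what file(1) prints,
and the hash is read as a little-endian 64-bit value to match the ulequad
used in the rule.

```python
import struct
from datetime import datetime, timezone


def describe_validity(data: bytes) -> str:
    """Describe how the .pyc in `data` (at least 16 bytes) is validated."""
    flags = int.from_bytes(data[4:8], "little")
    if flags & 1 == 0:
        # timestamp-based: source mtime and source size follow the flags
        timestamp, size = struct.unpack_from("<II", data, 8)
        when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
        return (f"timestamp-based, .py timestamp: {when:%Y-%m-%d %H:%M:%S} UTC, "
                f".py size: {size} bytes")
    # hash-based: bit 1 is the check-source flag, then an 8-byte hash
    check = "set" if flags & 2 else "unset"
    value = int.from_bytes(data[8:16], "little")
    return f"hash-based, check-source flag {check}, hash: 0x{value:x}"
```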

Finally, add the magic number used by the current version of PyPy2.7.
I am planning to also include support for PyPy3.9 in the future.
However, the current versions wrongly use CPython magic numbers
due to an implementation bug:
https://foss.heptapod.net/pypy/pypy/-/issues/3783
---
 magic/Magdir/python | 103 +++++++++++++-------------------------------
 1 file changed, 29 insertions(+), 74 deletions(-)

diff --git a/magic/Magdir/python b/magic/Magdir/python
index ed588859..5b1e5f1b 100644
--- a/magic/Magdir/python
+++ b/magic/Magdir/python
@@ -86,6 +86,8 @@
 !:mime application/x-bytecode.python
 0	belong		0x04f30d0a	python 2.7 byte-compiled
 !:mime application/x-bytecode.python
+0	belong		0x0af30d0a	PyPy2.7 byte-compiled
+!:mime application/x-bytecode.python
 0	belong		0xb80b0d0a	python 3.0 byte-compiled
 !:mime application/x-bytecode.python
 0	belong		0xc20b0d0a	python 3.0 byte-compiled
@@ -186,80 +188,33 @@
 !:mime application/x-bytecode.python
 0	belong		0x3f0d0d0a	python 3.7 byte-compiled
 !:mime application/x-bytecode.python
-0	belong		0x400d0d0a	python 3.7 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x410d0d0a	python 3.7 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x420d0d0a	python 3.7 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x480d0d0a	python 3.8 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x490d0d0a	python 3.8 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x520d0d0a	python 3.8 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x530d0d0a	python 3.8 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x540d0d0a	python 3.8 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x550d0d0a	python 3.8 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x5c0d0d0a	python 3.9 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x5d0d0d0a	python 3.9 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x5e0d0d0a	python 3.9 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x5f0d0d0a	python 3.9 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x600d0d0a	python 3.9 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x610d0d0a	python 3.9 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x660d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x670d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x680d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x690d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x6a0d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x6b0d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x6c0d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x6d0d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x6e0d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x6f0d0d0a	python 3.10 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x7a0d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x7b0d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x7c0d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x7d0d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x7e0d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x7f0d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x800d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x810d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x820d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x830d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x840d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
-0	belong		0x850d0d0a	python 3.11 byte-compiled
-!:mime application/x-bytecode.python
+
+# magic 3392+ implements PEP 552: Deterministic pycs
+# uleshort magic followed by \x0d\x0a
+# \x0d as part of magic should suffice till Python 3.14 (magic 3600)
+1	string		\x0d\x0d\x0a
+# extra check: only two bits of flag field are currently used
+>4	ulelong		<0x4		Byte-compiled Python module for
+!:mime application/x-bytecode.python
+# now look at the magic number to determine the version
+>>0	uleshort	<3400		CPython 3.7,
+>>0	default		x
+>>>0	uleshort	<3420		CPython 3.8,
+>>>0	default		x
+>>>>0	uleshort	<3430		CPython 3.9,
+>>>>0	default		x
+>>>>>0	uleshort	<3450		CPython 3.10,
+>>>>>0	default		x
+>>>>>>0	uleshort	<3500		CPython 3.11,
+>>>>>>0	default		x		CPython 3.12 or newer,
+# the flag field determines how .pyc validity is checked
+>>4	ulelong&1	0		timestamp-based,
+>>>8	uledate		x		.py timestamp: %s UTC,
+>>>12	ulelong		x		.py size: %d bytes
+>>4	ulelong&1	!0		hash-based, check-source flag
+>>>4	ulelong&2	0		unset,
+>>>4	ulelong&2	!0		set,
+>>>8	ulequad		x		hash: 0x%llx
 
 0	search/1/w	#!\040/usr/bin/python	Python script text executable
 !:strength + 15
-- 
2.35.1


