.. _mpy_files:

MicroPython .mpy files
======================

MicroPython defines the concept of an .mpy file which is a binary container
file format that holds precompiled code, and which can be imported like a
normal .py module.  The file ``foo.mpy`` can be imported via ``import foo``,
as long as ``foo.mpy`` can be found in the usual way by the import machinery.
Usually, each directory listed in ``sys.path`` is searched in order.  When
searching a particular directory ``foo.py`` is looked for first and if that
is not found then ``foo.mpy`` is looked for, then the search continues in the
next directory if neither is found.  As such, ``foo.py`` will take precedence
over ``foo.mpy``.

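The search order described above can be sketched in plain Python.  This is
only an illustration: ``find_module`` and its ``exists`` argument are
hypothetical names, not part of MicroPython, and the real import machinery
handles more cases than this.

```python
def find_module(name, search_path, exists):
    """Return the first file that would satisfy ``import name``, or None.

    `exists` is a callable taking a path string and returning True if a
    file is present there (hypothetical helper, for illustration only).
    """
    for directory in search_path:
        # Within one directory the .py source is tried before the .mpy file.
        for ext in (".py", ".mpy"):
            candidate = directory + "/" + name + ext
            if exists(candidate):
                return candidate
        # Neither file was found here, so move on to the next directory.
    return None


# An in-memory set stands in for the filesystem.
files = {"/lib/foo.mpy", "/app/foo.py", "/app/foo.mpy"}
print(find_module("foo", ["/lib", "/app"], files.__contains__))
print(find_module("foo", ["/app", "/lib"], files.__contains__))
```

With ``/lib`` first, the .mpy file there is chosen because no ``foo.py`` sits
next to it; with ``/app`` first, ``foo.py`` shadows the ``foo.mpy`` in the same
directory.
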
These .mpy files can contain bytecode which is usually generated from Python
source files (.py files) via the ``mpy-cross`` program.  For some architectures
an .mpy file can also contain native machine code, which can be generated in
a variety of ways, most notably from C source code.

Versioning and compatibility of .mpy files
------------------------------------------

A given .mpy file may or may not be compatible with a given MicroPython system.
Compatibility is based on the following:

* Version of the .mpy file: the version of the file must match the version
  supported by the system loading it.

* Bytecode features used in the .mpy file: there are two bytecode features
  which must match between the file and the system: unicode support and
  inline caching of map lookups in the bytecode.

* Small integer bits: the .mpy file will require a minimum number of bits in
  a small integer and the system loading it must support at least this many
  bits.

* Qstr compression window size: the .mpy file will require a minimum window
  size for qstr decompression and the system loading it must have a window
  greater than or equal to this size.

* Native architecture: if the .mpy file contains native machine code then
  it will specify the architecture of that machine code and the system
  loading it must support execution of that architecture's code.

If a MicroPython system supports importing .mpy files then the
``sys.implementation.mpy`` field will exist and return an integer which
encodes the version (lower 8 bits), features and native architecture.

Trying to import an .mpy file that fails one of the first four tests will
raise ``ValueError('incompatible .mpy file')``.  Trying to import an .mpy
file that fails the native architecture test (if it contains native machine
code) will raise ``ValueError('incompatible .mpy arch')``.

If importing an .mpy file fails then try the following:

* Determine the .mpy version and flags supported by your MicroPython system
  by executing::

    import sys
    sys_mpy = sys.implementation.mpy
    # Bits 10 and up select the native architecture (0 means none).
    arch = [None, 'x86', 'x64',
        'armv6', 'armv6m', 'armv7m', 'armv7em', 'armv7emsp', 'armv7emdp',
        'xtensa', 'xtensawin'][sys_mpy >> 10]
    # The lower 8 bits hold the .mpy version.
    print('mpy version:', sys_mpy & 0xff)
    print('mpy flags:', end='')
    if arch:
        print(' -march=' + arch, end='')
    # Bit 0x200 is clear when the system was built without unicode support.
    if not sys_mpy & 0x200:
        print(' -mno-unicode', end='')
    print()

* Check the validity of the .mpy file by inspecting the first two bytes of
  the file.  The first byte should be an uppercase 'M' and the second byte
  will be the version number, which should match the system version from above.
  If it doesn't match then rebuild the .mpy file.

* Check if the system .mpy version matches the version emitted by ``mpy-cross``
  that was used to build the .mpy file, found by ``mpy-cross --version``.
  If it doesn't match then recompile ``mpy-cross`` from the Git repository
  checked out at the tag (or hash) reported by ``mpy-cross --version``.

* Make sure you are using the correct ``mpy-cross`` flags, found by the code
  above, or by inspecting the ``MPY_CROSS_FLAGS`` Makefile variable for the
  port that you are using.

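The two-byte check from the list above can be sketched as follows.  This is a
minimal illustration, not a MicroPython API: ``check_mpy_header`` is a
hypothetical helper and the sample bytes are made up.

```python
def check_mpy_header(data, system_version):
    """Check the first two bytes of an .mpy file image.

    `data` is the start of the file as bytes; `system_version` is the value
    of ``sys.implementation.mpy & 0xff`` on the target system.
    Returns a short human-readable verdict.
    """
    if len(data) < 2:
        return "too short to be an .mpy file"
    if data[0] != ord("M"):
        return "not an .mpy file (first byte is not 'M')"
    if data[1] != system_version:
        return "version mismatch: file is %d, system is %d" % (data[1], system_version)
    return "ok"


# On a real system the bytes would come from open("foo.mpy", "rb").read(2).
print(check_mpy_header(b"M\x05", 5))
print(check_mpy_header(b"M\x04", 5))
```

A version mismatch reported here means the .mpy file should be rebuilt with an
``mpy-cross`` that emits the version the system expects.
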
The following table shows the correspondence between MicroPython release
and .mpy version.

=================== ============
MicroPython release .mpy version
=================== ============
v1.12 and up        5
v1.11               4
v1.9.3 - v1.10      3
v1.9 - v1.9.2       2
v1.5.1 - v1.8.7     0
=================== ============

For completeness, the next table shows the Git commit of the main
MicroPython repository at which the .mpy version was changed.

=================== ========================================
.mpy version change Git commit
=================== ========================================
4 to 5              5716c5cf65e9b2cb46c2906f40302401bdd27517
3 to 4              9a5f92ea72754c01cc03e5efcdfe94021120531e
2 to 3              ff93fd4f50321c6190e1659b19e64fef3045a484
1 to 2              dd11af209d226b7d18d5148b239662e30ed60bad
0 to 1              6a11048af1d01c78bdacddadd1b72dc7ba7c6478
initial version 0   d8c834c95d506db979ec871417de90b7951edc30
=================== ========================================

Binary encoding of .mpy files
-----------------------------

MicroPython .mpy files are a binary container format with code objects
stored internally in a nested hierarchy.  To keep files small while still
providing a large range of possible values it uses the concept of a
variably-encoded-unsigned-integer (vuint) in many places.  Similar to UTF-8
encoding, this encoding stores 7 bits per byte with the 8th bit (MSB) set
if one or more bytes follow.  The bits of the unsigned integer are stored
in the vuint in LSB form.

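A vuint encoder and decoder might look like the sketch below.  The function
names are illustrative only.  The description above does not spell out the
order of the 7-bit groups; this sketch assumes the most significant group is
written first, and that assumption should be checked against the reference
reader in ``py/persistentcode.c`` before relying on it.

```python
def encode_vuint(value):
    """Encode a non-negative integer as a vuint (returns bytes)."""
    if value < 0:
        raise ValueError("a vuint cannot hold negative values")
    groups = [value & 0x7F]  # final byte: continuation bit clear
    value >>= 7
    while value:
        groups.append(0x80 | (value & 0x7F))  # earlier bytes: bit 7 set
        value >>= 7
    # Assumption: the most significant 7-bit group is written first.
    return bytes(reversed(groups))


def decode_vuint(data, offset=0):
    """Decode one vuint from `data`, returning (value, next_offset)."""
    value = 0
    while True:
        b = data[offset]
        offset += 1
        value = (value << 7) | (b & 0x7F)
        if not b & 0x80:  # bit 7 clear: this was the last byte
            return value, offset


for n in (0, 127, 128, 300):
    encoded = encode_vuint(n)
    print(n, encoded.hex(), decode_vuint(encoded)[0])
```

Values below 128 fit in a single byte, which is why the format stays compact
for the small lengths and counts that dominate a typical file.
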
The top-level of an .mpy file consists of two parts:

* The header.

* The raw-code for the outer scope of the module.
  This outer scope is executed when the .mpy file is imported.

The header
~~~~~~~~~~

The .mpy header is:

====== ================================
size   field
====== ================================
byte   value 0x4d (ASCII 'M')
byte   .mpy version number
byte   feature flags
byte   number of bits in a small int
vuint  size of qstr window
====== ================================

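Reading the header fields in the order given by the table could be sketched
like this.  ``parse_mpy_header`` and the dictionary keys are hypothetical
names, the sample bytes are made up, and the vuint is decoded on the
assumption that its most significant 7-bit group comes first.

```python
def parse_mpy_header(data):
    """Parse the .mpy header from `data` (bytes).

    Returns (fields, offset) where `offset` is the index of the first byte
    after the header, i.e. where the outer raw-code element starts.
    """
    if len(data) < 5:
        raise ValueError("too short for an .mpy header")
    if data[0] != 0x4D:  # ASCII 'M'
        raise ValueError("missing 'M' signature byte")
    fields = {
        "version": data[1],
        "feature_flags": data[2],
        "small_int_bits": data[3],
    }
    # The qstr window size is a vuint: 7 bits per byte, bit 7 set while
    # more bytes follow.  Assumption: most significant group comes first.
    offset = 4
    window = 0
    while True:
        b = data[offset]
        offset += 1
        window = (window << 7) | (b & 0x7F)
        if not b & 0x80:
            break
    fields["qstr_window"] = window
    return fields, offset


# Made-up header bytes, for illustration only.
fields, end = parse_mpy_header(b"M\x05\x02\x1f\x20")
print(fields, end)
```

Because the last field is a vuint, the header has no fixed length, so the
parser returns the offset at which the raw-code data begins.
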
Raw code elements
~~~~~~~~~~~~~~~~~

A raw-code element contains code, either bytecode or native machine code.  Its
contents are:

====== ================================
size   field
====== ================================
vuint  type and size
...    code (bytecode or machine code)
vuint  number of constant objects
vuint  number of sub-raw-code elements
...    constant objects
...    sub-raw-code elements
====== ================================

The first vuint in a raw-code element encodes the type of code stored in this
element (the two least-significant bits), and the decompressed length of the code
(the amount of RAM to allocate for it).

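Splitting that first vuint into its two parts can be sketched as below.  The
function names are illustrative, and the sketch assumes the length occupies
all bits above the two type bits, which the text implies but does not state
outright.

```python
def split_type_and_size(first_vuint):
    """Split the decoded first vuint of a raw-code element."""
    code_type = first_vuint & 0x03  # the two least-significant bits
    code_len = first_vuint >> 2     # assumption: remaining bits are the length
    return code_type, code_len


def join_type_and_size(code_type, code_len):
    """Inverse of split_type_and_size, under the same assumption."""
    if not 0 <= code_type <= 3:
        raise ValueError("code type must fit in two bits")
    return (code_len << 2) | code_type


print(split_type_and_size(0b101001))
print(join_type_and_size(1, 10))
```
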
Following the vuint comes the code itself.  In the case of bytecode it also contains
compressed qstr values.

Following the code comes a vuint counting the number of constant objects, and
another vuint counting the number of sub-raw-code elements.

The constant objects are then stored next.

Finally any sub-raw-code elements are stored, recursively.