Commit Graph

25 Commits

Author SHA1 Message Date
Damien George f2040bfc7e py: Rework bytecode and .mpy file format to be mostly static data.
Background: .mpy files are precompiled .py files, built using mpy-cross,
that contain compiled bytecode functions (and can also contain machine
code). The benefit of using an .mpy file over a .py file is that they are
faster to import and take less memory when importing.  They are also
smaller on disk.

But the real benefit of .mpy files comes when they are frozen into the
firmware.  This is done by loading the .mpy file during compilation of the
firmware and turning it into a set of big C data structures (the job of
mpy-tool.py), which are then compiled and downloaded into the ROM of a
device.  These C data structures can be executed in-place, ie directly from
ROM.  This makes importing even faster because there is very little to do,
and also means such frozen modules take up much less RAM (because their
bytecode stays in ROM).

The downside of frozen code is that it requires recompiling and reflashing
the entire firmware.  This can be a big barrier to entry, slows down
development time, and makes it harder to do OTA updates of frozen code
(because the whole firmware must be updated).

This commit attempts to solve this problem by providing a solution that
sits between loading .mpy files into RAM and freezing them into the
firmware.  The .mpy file format has been reworked so that it consists of
data and bytecode which is mostly static and ready to run in-place.  If
these new .mpy files are located in flash/ROM which is memory addressable,
the .mpy file can be executed (mostly) in-place.

With this approach there is still a small amount of unpacking and linking
of the .mpy file that needs to be done when it's imported, but it's still
much better than loading an .mpy from disk into RAM (although not as good
as freezing .mpy files into the firmware).

The main trick to make static .mpy files is to adjust the bytecode so any
qstrs that it references now go through a lookup table to convert from
local qstr number in the module to global qstr number in the firmware.
That means the bytecode does not need linking/rewriting of qstrs when it's
loaded.  Instead only a small qstr table needs to be built (and put in RAM)
at import time.  This means the bytecode itself is static/constant and can
be used directly if it's in addressable memory.  Also the qstr string data
in the .mpy file, and some constant object data, can be used directly.
Note that the qstr table is global to the module (ie not per function).

In more detail, in the VM what used to be (schematically):

    qst = DECODE_QSTR_VALUE;

is now (schematically):

    idx = DECODE_QSTR_INDEX;
    qst = qstr_table[idx];

That allows the bytecode to be fixed at compile time and not need
relinking/rewriting of the qstr values.  Only qstr_table needs to be linked
when the .mpy is loaded.

Incidentally, this helps to reduce the size of bytecode because what used
to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices.
If the module uses the same qstr more than two times then the bytecode is
smaller than before.

The following changes are measured for this commit compared to the
previous (the baseline):
- average 7%-9% reduction in size of .mpy files
- frozen code size is reduced by about 5%-7%
- importing .py files uses about 5% less RAM in total
- importing .mpy files uses about 4% less RAM in total
- importing .py and .mpy files takes about the same time as before

The qstr indirection in the bytecode has only a small impact on VM
performance.  For stm32 on PYBv1.0 the performance change of this commit
is:

diff of scores (higher is better)
N=100 M=100             baseline -> this-commit  diff      diff% (error%)
bm_chaos.py               371.07 ->  357.39 :  -13.68 =  -3.687% (+/-0.02%)
bm_fannkuch.py             78.72 ->   77.49 :   -1.23 =  -1.563% (+/-0.01%)
bm_fft.py                2591.73 -> 2539.28 :  -52.45 =  -2.024% (+/-0.00%)
bm_float.py              6034.93 -> 5908.30 : -126.63 =  -2.098% (+/-0.01%)
bm_hexiom.py               48.96 ->   47.93 :   -1.03 =  -2.104% (+/-0.00%)
bm_nqueens.py            4510.63 -> 4459.94 :  -50.69 =  -1.124% (+/-0.00%)
bm_pidigits.py            650.28 ->  644.96 :   -5.32 =  -0.818% (+/-0.23%)
core_import_mpy_multi.py  564.77 ->  581.49 :  +16.72 =  +2.960% (+/-0.01%)
core_import_mpy_single.py  68.67 ->   67.16 :   -1.51 =  -2.199% (+/-0.01%)
core_qstr.py               64.16 ->   64.12 :   -0.04 =  -0.062% (+/-0.00%)
core_yield_from.py        362.58 ->  354.50 :   -8.08 =  -2.228% (+/-0.00%)
misc_aes.py               429.69 ->  405.59 :  -24.10 =  -5.609% (+/-0.01%)
misc_mandel.py           3485.13 -> 3416.51 :  -68.62 =  -1.969% (+/-0.00%)
misc_pystone.py          2496.53 -> 2405.56 :  -90.97 =  -3.644% (+/-0.01%)
misc_raytrace.py          381.47 ->  374.01 :   -7.46 =  -1.956% (+/-0.01%)
viper_call0.py            576.73 ->  572.49 :   -4.24 =  -0.735% (+/-0.04%)
viper_call1a.py           550.37 ->  546.21 :   -4.16 =  -0.756% (+/-0.09%)
viper_call1b.py           438.23 ->  435.68 :   -2.55 =  -0.582% (+/-0.06%)
viper_call1c.py           442.84 ->  440.04 :   -2.80 =  -0.632% (+/-0.08%)
viper_call2a.py           536.31 ->  532.35 :   -3.96 =  -0.738% (+/-0.06%)
viper_call2b.py           382.34 ->  377.07 :   -5.27 =  -1.378% (+/-0.03%)

And for unix on x64:

diff of scores (higher is better)
N=2000 M=2000        baseline -> this-commit     diff      diff% (error%)
bm_chaos.py          13594.20 ->  13073.84 :  -520.36 =  -3.828% (+/-5.44%)
bm_fannkuch.py          60.63 ->     59.58 :    -1.05 =  -1.732% (+/-3.01%)
bm_fft.py           112009.15 -> 111603.32 :  -405.83 =  -0.362% (+/-4.03%)
bm_float.py         246202.55 -> 247923.81 : +1721.26 =  +0.699% (+/-2.79%)
bm_hexiom.py           615.65 ->    617.21 :    +1.56 =  +0.253% (+/-1.64%)
bm_nqueens.py       215807.95 -> 215600.96 :  -206.99 =  -0.096% (+/-3.52%)
bm_pidigits.py        8246.74 ->   8422.82 :  +176.08 =  +2.135% (+/-3.64%)
misc_aes.py          16133.00 ->  16452.74 :  +319.74 =  +1.982% (+/-1.50%)
misc_mandel.py      128146.69 -> 130796.43 : +2649.74 =  +2.068% (+/-3.18%)
misc_pystone.py      83811.49 ->  83124.85 :  -686.64 =  -0.819% (+/-1.03%)
misc_raytrace.py     21688.02 ->  21385.10 :  -302.92 =  -1.397% (+/-3.20%)

The code size change is (firmware with a lot of frozen code benefits the
most):

       bare-arm:  +396 +0.697%
    minimal x86: +1595 +0.979% [incl +32(data)]
       unix x64: +2408 +0.470% [incl +800(data)]
    unix nanbox: +1396 +0.309% [incl -96(data)]
          stm32: -1256 -0.318% PYBV10
         cc3200:  +288 +0.157%
        esp8266:  -260 -0.037% GENERIC
          esp32:  -216 -0.014% GENERIC[incl -1072(data)]
            nrf:  +116 +0.067% pca10040
            rp2:  -664 -0.135% PICO
           samd:  +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS

As part of this change the .mpy file format version is bumped to version 6.
And mpy-tool.py has been improved to provide a good visualisation of the
contents of .mpy files.

In summary: this commit changes the bytecode to use qstr indirection, and
reworks the .mpy file format to be simpler and allow .mpy files to be
executed in-place.  Performance is not impacted too much.  Eventually it
will be possible to store such .mpy files in a linear, read-only, memory-
mappable filesystem so they can be executed from flash/ROM.  This will
essentially be able to replace frozen code for most applications.

Signed-off-by: Damien George <damien@micropython.org>
2022-02-24 18:08:43 +11:00
Damien George de43b500bd py/runtime: Allow initialising sys.path/argv with defaults.
If MICROPY_PY_SYS_PATH_ARGV_DEFAULTS is enabled (which it is by default)
then sys.path and sys.argv will be initialised and populated with default
values.  This keeps all bare-metal ports aligned.

Signed-off-by: Damien George <damien@micropython.org>
2021-12-18 00:08:07 +11:00
Jim Mussared b326edf68c all: Remove MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE.
This commit removes all parts of code associated with the existing
MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE optimisation option, including the
-mcache-lookup-bc option to mpy-cross.

This feature originally provided a significant performance boost for Unix,
but wasn't able to be enabled for MCU targets (due to frozen bytecode), and
added significant extra complexity to generating and distributing .mpy
files.

The equivalent performance gain is now provided by the combination of
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE (which has
been enabled on the unix port in the previous commit).

It's hard to provide precise performance numbers, but tests have been run
on a wide variety of architectures (x86-64, ARM Cortex, Aarch64, RISC-V,
xtensa) and they all generally agree on the qualitative improvements seen
by the combination of MICROPY_OPT_LOAD_ATTR_FAST_PATH and
MICROPY_OPT_MAP_LOOKUP_CACHE.

For example, on a "quiet" Linux x64 environment (i3-5010U @ 2.10GHz) the
change from CACHE_MAP_LOOKUP_IN_BYTECODE, to LOAD_ATTR_FAST_PATH combined
with MAP_LOOKUP_CACHE is:

diff of scores (higher is better)
N=2000 M=2000       bccache -> attrmapcache      diff      diff% (error%)
bm_chaos.py        13742.56 ->   13905.67 :   +163.11 =  +1.187% (+/-3.75%)
bm_fannkuch.py        60.13 ->      61.34 :     +1.21 =  +2.012% (+/-2.11%)
bm_fft.py         113083.20 ->  114793.68 :  +1710.48 =  +1.513% (+/-1.57%)
bm_float.py       256552.80 ->  243908.29 : -12644.51 =  -4.929% (+/-1.90%)
bm_hexiom.py         521.93 ->     625.41 :   +103.48 = +19.826% (+/-0.40%)
bm_nqueens.py     197544.25 ->  217713.12 : +20168.87 = +10.210% (+/-3.01%)
bm_pidigits.py      8072.98 ->    8198.75 :   +125.77 =  +1.558% (+/-3.22%)
misc_aes.py        17283.45 ->   16480.52 :   -802.93 =  -4.646% (+/-0.82%)
misc_mandel.py     99083.99 ->  128939.84 : +29855.85 = +30.132% (+/-5.88%)
misc_pystone.py    83860.10 ->   82592.56 :  -1267.54 =  -1.511% (+/-2.27%)
misc_raytrace.py   21490.40 ->   22227.23 :   +736.83 =  +3.429% (+/-1.88%)

This shows that the new optimisations are at least as good as the existing
inline-bytecode-caching, and are sometimes much better (because the new
ones apply caching to a wider variety of map lookups).

The new optimisations can also benefit code generated by the native
emitter, because they apply to the runtime rather than the generated code.
The improvement for the native emitter when LOAD_ATTR_FAST_PATH and
MAP_LOOKUP_CACHE are enabled is (same Linux environment as above):

diff of scores (higher is better)
N=2000 M=2000        native -> nat-attrmapcache  diff      diff% (error%)
bm_chaos.py        14130.62 ->   15464.68 :  +1334.06 =  +9.441% (+/-7.11%)
bm_fannkuch.py        74.96 ->      76.16 :     +1.20 =  +1.601% (+/-1.80%)
bm_fft.py         166682.99 ->  168221.86 :  +1538.87 =  +0.923% (+/-4.20%)
bm_float.py       233415.23 ->  265524.90 : +32109.67 = +13.756% (+/-2.57%)
bm_hexiom.py         628.59 ->     734.17 :   +105.58 = +16.796% (+/-1.39%)
bm_nqueens.py     225418.44 ->  232926.45 :  +7508.01 =  +3.331% (+/-3.10%)
bm_pidigits.py      6322.00 ->    6379.52 :    +57.52 =  +0.910% (+/-5.62%)
misc_aes.py        20670.10 ->   27223.18 :  +6553.08 = +31.703% (+/-1.56%)
misc_mandel.py    138221.11 ->  152014.01 : +13792.90 =  +9.979% (+/-2.46%)
misc_pystone.py    85032.14 ->  105681.44 : +20649.30 = +24.284% (+/-2.25%)
misc_raytrace.py   19800.01 ->   23350.73 :  +3550.72 = +17.933% (+/-2.79%)

In summary, compared to MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, the new
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE options:
- are simpler;
- take less code size;
- are faster (generally);
- work with code generated by the native emitter;
- can be used on embedded targets with a small and constant RAM overhead;
- allow the same .mpy bytecode to run on all targets.

See #7680 for further discussion.  And see also #7653 for a discussion
about simplifying mpy-cross options.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-09-16 16:04:03 +10:00
Damien George ad4656b861 all: Rename BYTES_PER_WORD to MP_BYTES_PER_OBJ_WORD.
The "word" referred to by BYTES_PER_WORD is actually the size of mp_obj_t
which is not always the same as the size of a pointer on the target
architecture.  So rename this config value to better reflect what it
measures, and also prefix it with MP_.

For uses of BYTES_PER_WORD in setting the stack limit this has been
changed to sizeof(void *), because the stack usually grows with
machine-word sized values (eg an nlr_buf_t has many machine words in it).

Signed-off-by: Damien George <damien@micropython.org>
2021-02-04 22:46:42 +11:00
David Lechner 6943fb60fe mpy-cross/main: Print uncaught nlr jump to stderr.
This is to be consistent with the same change in the unix port,
4ab8bee82f.
2020-04-16 16:22:25 +10:00
David Lechner 803e5eadea mpy-cross/main: Fix stderr_print_strn parameter type.
Change mp_uint_t to size_t to match the mp_print_strn_t function prototype.
This fixes a compiler warning when mp_uint_t and size_t are not the same
size.
2020-04-16 16:18:13 +10:00
Damien George 69661f3343 all: Reformat C and Python source code with tools/codeformat.py.
This is run with uncrustify 0.70.1, and black 19.10b0.
2020-02-28 10:33:03 +11:00
Damien George 82a19cb39f mpy-cross: Support armv7em, armv7emsp, armv7emdp architectures. 2019-12-03 12:32:29 +11:00
Damien George 1d21b4e7d1 mpy-cross: Enable Xtensa-Windowed native emitter.
Selectable via the command line: -march=xtensawin.
2019-10-05 13:45:32 +10:00
Damien George b596638b9b mpy-cross: Set number of registers in nlr_buf_t based on native arch.
Fixes #5059.  Done in collaboration with Jim Mussared.
2019-09-26 15:53:05 +10:00
Damien George af20c2ead3 py: Add global default_emit_opt variable to make emit kind persistent.
mp_compile no longer takes an emit_opt argument, rather this setting is now
provided by the global default_emit_opt variable.

Now, when -X emit=native is passed as a command-line option, the emitter
will be set for all compiled modules (included imports), not just the
top-level script.

In the future there could be a way to also set this variable from a script.

Fixes issue #4267.
2019-08-28 12:47:58 +10:00
Damien George 8e3e05761e mpy-cross/main: Only accept full emit cmdline options if native enabled. 2019-08-28 12:47:58 +10:00
Damien George 7e90e22ea5 mpy-cross: Add --version command line option to print version info.
Prints something like:

MicroPython v1.10-304-g8031b7a25 on 2019-05-02; mpy-cross emitting mpy v4
2019-05-07 13:54:20 +10:00
Damien George e70c438c71 mpy-cross: Automatically select ARMV6 arch when running on such a host. 2019-05-01 15:31:00 +10:00
Damien George 1556af19bf mpy-cross: Support compiling with MICROPY_PY___FILE__ enabled. 2019-03-26 18:19:21 +11:00
Damien George 9c9bc65e74 mpy-cross: Add "-march=<arch>" option to select native emitter. 2019-03-14 12:22:25 +11:00
Damien George a3dc1b1957 all: Remove inclusion of internal py header files.
Header files that are considered internal to the py core and should not
normally be included directly are:
    py/nlr.h - internal nlr configuration and declarations
    py/bc0.h - contains bytecode macro definitions
    py/runtime0.h - contains basic runtime enums

Instead, the top-level header files to include are one of:
    py/obj.h - includes runtime0.h and defines everything to use the
        mp_obj_t type
    py/runtime.h - includes mpstate.h and hence nlr.h, obj.h, runtime0.h,
        and defines everything to use the general runtime support functions

Additional, specific headers (eg py/objlist.h) can be included if needed.
2017-10-04 12:37:50 +11:00
Damien George 4a93801c12 all: Update Makefiles and others to build with new ports/ dir layout.
Also renames "stmhal" to "stm32" in documentation and everywhere else.
2017-09-06 14:09:13 +10:00
Damien George aa7be82a4d all: Don't include system errno.h when it's not needed. 2017-07-24 18:43:14 +10:00
Damien George 97142000f7 mpy-cross/main: Move lexer constructor to within NLR handler block. 2017-03-14 11:52:05 +11:00
Damien George 85ae17c993 mpy-cross: Get compiling after recent persistent code refactors. 2016-11-16 20:25:36 +11:00
stijn 9bdb82ef6b mpy-cross: Use binary file translation mode for creating mpy files on windows
This is a fix for https://github.com/micropython/micropython/issues/2209:
by default a file created using open() uses text translation mode so writing
\n to it will result in the file having \r\n. This is obviously problematic
for binary .mpy files, so provide functions for setting the open mode
and use binary mode in mpy-cross' main().
2016-07-22 21:21:54 +03:00
Damien George 74fb4e795b mpy-cross: Add -s option to specify the embedded source filename.
.mpy files contain the name of the source file that they were compiled
from.  This patch adds a way to change this name to an arbitrary string,
specified on the command line with the -s option.  The default is to use
the full name of the input filename.

This new -s option is useful to strip off a leading directory name so
that mpy-tool.py can freeze packages.
2016-05-23 13:25:54 +01:00
Damien George 3d1d92acfc mpy-cross: Give a more sensible error message when file doesn't exist. 2016-03-02 16:12:00 +00:00
Damien George 56f76b873a mpy-cross: Add new component, a cross compiler for MicroPython bytecode.
This component allows to generate .mpy files (pre compiled bytecode)
which can be executed within any MicroPython runtime/VM.
2016-02-25 10:12:21 +00:00