This allows the compiler to merge strings: e.g. "update",
"difference_update" and "symmetric_difference_update" will all point to the
same memory.
No functional change.
The size reduction depends on the number of qstrs in the build. The change
this commit brings is:
bare-arm: -4 -0.007%
minimal x86: +150 +0.092% [incl +48(data)]
unix x64: -608 -0.118%
unix nanbox: -572 -0.126% [incl +32(data)]
stm32: -1392 -0.352% PYBV10
cc3200: -448 -0.244%
esp8266: -1208 -0.173% GENERIC
esp32: -1028 -0.068% GENERIC[incl -1020(data)]
nrf: -440 -0.252% pca10040
rp2: -1072 -0.217% PICO
samd: -368 -0.264% ADAFRUIT_ITSYBITSY_M4_EXPRESS
Performance is also improved (on bare metal at least) for the
core_import_mpy_multi.py, core_import_mpy_single.py and core_qstr.py
performance benchmarks.
Originally at adafruit#4583
Signed-off-by: Artyom Skrobov <tyomitch@gmail.com>
The mp_sys_path_obj and mp_sys_argv_obj objects are only used by the
runtime and accessible from Python if MICROPY_PY_SYS is enabled. So
exclude them from the runtime state if this option is disabled.
Signed-off-by: Damien George <damien@micropython.org>
If MICROPY_PY_SYS_PATH_ARGV_DEFAULTS is enabled (which it is by default)
then sys.path and sys.argv will be initialised and populated with default
values. This keeps all bare-metal ports aligned.
Signed-off-by: Damien George <damien@micropython.org>
This commit removes all parts of code associated with the existing
MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE optimisation option, including the
-mcache-lookup-bc option to mpy-cross.
This feature originally provided a significant performance boost for Unix,
but wasn't able to be enabled for MCU targets (due to frozen bytecode), and
added significant extra complexity to generating and distributing .mpy
files.
The equivalent performance gain is now provided by the combination of
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE (which has
been enabled on the unix port in the previous commit).
It's hard to provide precise performance numbers, but tests have been run
on a wide variety of architectures (x86-64, ARM Cortex, Aarch64, RISC-V,
xtensa) and they all generally agree on the qualitative improvements seen
by the combination of MICROPY_OPT_LOAD_ATTR_FAST_PATH and
MICROPY_OPT_MAP_LOOKUP_CACHE.
For example, on a "quiet" Linux x64 environment (i3-5010U @ 2.10GHz) the
change from CACHE_MAP_LOOKUP_IN_BYTECODE, to LOAD_ATTR_FAST_PATH combined
with MAP_LOOKUP_CACHE is:
diff of scores (higher is better)
N=2000 M=2000 bccache -> attrmapcache diff diff% (error%)
bm_chaos.py 13742.56 -> 13905.67 : +163.11 = +1.187% (+/-3.75%)
bm_fannkuch.py 60.13 -> 61.34 : +1.21 = +2.012% (+/-2.11%)
bm_fft.py 113083.20 -> 114793.68 : +1710.48 = +1.513% (+/-1.57%)
bm_float.py 256552.80 -> 243908.29 : -12644.51 = -4.929% (+/-1.90%)
bm_hexiom.py 521.93 -> 625.41 : +103.48 = +19.826% (+/-0.40%)
bm_nqueens.py 197544.25 -> 217713.12 : +20168.87 = +10.210% (+/-3.01%)
bm_pidigits.py 8072.98 -> 8198.75 : +125.77 = +1.558% (+/-3.22%)
misc_aes.py 17283.45 -> 16480.52 : -802.93 = -4.646% (+/-0.82%)
misc_mandel.py 99083.99 -> 128939.84 : +29855.85 = +30.132% (+/-5.88%)
misc_pystone.py 83860.10 -> 82592.56 : -1267.54 = -1.511% (+/-2.27%)
misc_raytrace.py 21490.40 -> 22227.23 : +736.83 = +3.429% (+/-1.88%)
This shows that the new optimisations are at least as good as the existing
inline-bytecode-caching, and are sometimes much better (because the new
ones apply caching to a wider variety of map lookups).
The new optimisations can also benefit code generated by the native
emitter, because they apply to the runtime rather than the generated code.
The improvement for the native emitter when LOAD_ATTR_FAST_PATH and
MAP_LOOKUP_CACHE are enabled is (same Linux environment as above):
diff of scores (higher is better)
N=2000 M=2000 native -> nat-attrmapcache diff diff% (error%)
bm_chaos.py 14130.62 -> 15464.68 : +1334.06 = +9.441% (+/-7.11%)
bm_fannkuch.py 74.96 -> 76.16 : +1.20 = +1.601% (+/-1.80%)
bm_fft.py 166682.99 -> 168221.86 : +1538.87 = +0.923% (+/-4.20%)
bm_float.py 233415.23 -> 265524.90 : +32109.67 = +13.756% (+/-2.57%)
bm_hexiom.py 628.59 -> 734.17 : +105.58 = +16.796% (+/-1.39%)
bm_nqueens.py 225418.44 -> 232926.45 : +7508.01 = +3.331% (+/-3.10%)
bm_pidigits.py 6322.00 -> 6379.52 : +57.52 = +0.910% (+/-5.62%)
misc_aes.py 20670.10 -> 27223.18 : +6553.08 = +31.703% (+/-1.56%)
misc_mandel.py 138221.11 -> 152014.01 : +13792.90 = +9.979% (+/-2.46%)
misc_pystone.py 85032.14 -> 105681.44 : +20649.30 = +24.284% (+/-2.25%)
misc_raytrace.py 19800.01 -> 23350.73 : +3550.72 = +17.933% (+/-2.79%)
In summary, compared to MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, the new
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE options:
- are simpler;
- take less code size;
- are faster (generally);
- work with code generated by the native emitter;
- can be used on embedded targets with a small and constant RAM overhead;
- allow the same .mpy bytecode to run on all targets.
See #7680 for further discussion. And see also #7653 for a discussion
about simplifying mpy-cross options.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
The existing inline bytecode caching optimisation, selected by
MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, reserves an extra byte in the
bytecode after certain opcodes, which at runtime stores a map index of the
likely location of this field when looking up the qstr. This scheme is
incompatible with bytecode-in-ROM, and doesn't work with native generated
code. It also stores bytecode in .mpy files which is of a different format
to when the feature is disabled, making generation of .mpy files more
complex.
This commit provides an alternative optimisation via an approach that adds
a global cache for map offsets, then all mp_map_lookup operations use it.
It's less precise than bytecode caching, but allows the cache to be
independent and external to the bytecode that is executing. It also works
for the native emitter and adds a similar performance boost on top of the
gain already provided by the native emitter.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
The MP_OBJ_STOP_ITERATION optimisation is a shortcut for creating a
StopIteration() exception object, and means that heap memory does not need
to be allocated for the exception (in cases where it can be used). This
commit allows this optimised object to take an optional argument (before,
it could only have no argument).
The commit also adds some new tests to cover corner cases with
StopIteration and generators that previously did not work.
Signed-off-by: Damien George <damien@micropython.org>
This introduces a new macro to get the main thread and uses it to ensure
that asynchronous exceptions such as KeyboardInterrupt (CTRL+C) are only
scheduled on the main thread. This is more deterministic than being
scheduled on a random thread and is more in line with CPython that only
allow signal handlers to run on the main thread.
Fixes issue #7026.
Signed-off-by: David Lechner <david@pybricks.com>
This moves mp_pending_exception from mp_state_vm_t to mp_state_thread_t.
This allows exceptions to be scheduled on a specific thread.
Signed-off-by: David Lechner <david@pybricks.com>
This commit makes gc_lock_depth have one counter per thread, instead of one
global counter. This makes threads properly independent with respect to
the GC, in particular threads can now independently lock the GC for
themselves without locking it for other threads. It also means a given
thread can run a hard IRQ without temporarily locking the GC for all other
threads and potentially making them have MemoryError exceptions at random
locations (this really only occurs on MCUs with multiple cores and no GIL,
eg on the rp2 port).
The commit also removes protection of the GC lock/unlock functions, which
is no longer needed when the counter is per thread (and this also fixes the
cas where a hard IRQ calling gc_lock() may stall waiting for the mutex).
It also puts the check for `gc_lock_depth > 0` outside the GC mutex in
gc_alloc, gc_realloc and gc_free, to potentially prevent a hard IRQ from
waiting on a mutex if it does attempt to allocate heap memory (and putting
the check outside the GC mutex is now safe now that there is a
gc_lock_depth per thread).
Signed-off-by: Damien George <damien@micropython.org>
On ports where normal heap memory can contain executable code (eg ARM-based
ports such as stm32), native code loaded from an .mpy file may be reclaimed
by the GC because there's no reference to the very start of the native
machine code block that is reachable from root pointers (only pointers to
internal parts of the machine code block are reachable, but that doesn't
help the GC find the memory).
This commit fixes this issue by maintaining an explicit list of root
pointers pointing to native code that is loaded from an .mpy file. This
is not needed for all ports so is selectable by the new configuration
option MICROPY_PERSISTENT_CODE_TRACK_RELOC_CODE. It's enabled by default
if a port does not specify any special functions to allocate or commit
executable memory.
A test is included to test that native code loaded from an .mpy file does
not get reclaimed by the GC.
Fixes#6045.
Signed-off-by: Damien George <damien@micropython.org>
When threads and the GIL are enabled, then the qstr mutex is not needed.
The qstr_mutex field is never used in this case because of:
#if MICROPY_PY_THREAD && !MICROPY_PY_THREAD_GIL
#define QSTR_ENTER() mp_thread_mutex_lock(&MP_STATE_VM(qstr_mutex), 1)
#define QSTR_EXIT() mp_thread_mutex_unlock(&MP_STATE_VM(qstr_mutex))
#else
#define QSTR_ENTER()
#define QSTR_EXIT()
#endif
So, we can completely remove qstr_mutex everywhere when MICROPY_PY_THREAD
&& !MICROPY_PY_THREAD_GIL.
When threads and the GIL are enabled, then the GC mutex is not needed. The
gc_mutex field is never used in this case because of:
#if MICROPY_PY_THREAD && !MICROPY_PY_THREAD_GIL
#define GC_ENTER() mp_thread_mutex_lock(&MP_STATE_MEM(gc_mutex), 1)
#define GC_EXIT() mp_thread_mutex_unlock(&MP_STATE_MEM(gc_mutex))
#else
#define GC_ENTER()
#define GC_EXIT()
#endif
So, we can completely remove gc_mutex everywhere when MICROPY_PY_THREAD
&& !MICROPY_PY_THREAD_GIL.
This commit adds support for sys.settrace, allowing to install Python
handlers to trace execution of Python code. The interface follows CPython
as closely as possible. The feature is disabled by default and can be
enabled via MICROPY_PY_SYS_SETTRACE.
mp_compile no longer takes an emit_opt argument, rather this setting is now
provided by the global default_emit_opt variable.
Now, when -X emit=native is passed as a command-line option, the emitter
will be set for all compiled modules (included imports), not just the
top-level script.
In the future there could be a way to also set this variable from a script.
Fixes issue #4267.
This patch implements a new sys.atexit function which registers a function
that is later executed when the main script ends. It is configurable via
MICROPY_PY_SYS_ATEXIT, disabled by default.
This is not compliant with CPython, rather it can be used to implement a
CPython compatible "atexit" module if desired (similar to how
sys.print_exception can be used to implement functionality of the
"traceback" module).
Behaviour was changed from stack to queue in
8977c7eb58, and this updates variable names
to match. Also updates other references (docs, error messages).
A new option MICROPY_GC_STACK_ENTRY_TYPE is added to select a custom type
instead of size_t for the gc_stack array items. This can be beneficial for
small devices, especially those that are low on memory anyway. If a device
has 1MB or less of heap (and 16-byte GC blocks) then this type can be
uint16_t, saving 128 bytes of RAM.
This patch changes dupterm to call the native C stream methods on the
connected stream objects, instead of calling the Python readinto/write
methods. This is much more efficient for native stream objects like UART
and webrepl and doesn't require allocating a special dupterm array.
This change is a minor breaking change from the user's perspective because
dupterm no longer accepts pure user stream objects to duplicate on. But
with the recent addition of uio.IOBase it is possible to still create such
classes just by inheriting from uio.IOBase, for example:
import uio, uos
class MyStream(uio.IOBase):
def write(self, buf):
# existing write implementation
def readinto(self, buf):
# existing readinto implementation
uos.dupterm(MyStream())
This patch moves the start of the root pointer section in mp_state_ctx_t
so that it skips entries that are not pointers and don't need scanning.
Previously, the start of the root pointer section was at the very beginning
of the mp_state_ctx_t struct (which is the beginning of mp_state_thread_t).
This was the original assembler version of the NLR code was hard-coded to
have the nlr_top pointer at the start of this state structure. But now
that the NLR code is partially written in C there is no longer this
restriction on the location of nlr_top (and a comment to this effect has
been removed in this patch).
So now the root pointer section starts part way through the
mp_state_thread_t structure, after the entries which are not root pointers.
This patch also moves the non-pointer entries for MICROPY_ENABLE_SCHEDULER
outside the root pointer section.
Moving non-pointer entries out of the root pointer section helps to make
the GC more precise and should help to prevent some cases of collectable
garbage being kept.
This patch also has a measurable improvement in performance of the
pystone.py benchmark: on unix x86-64 and stm32 there was an improvement of
roughly 0.6% (tested with both gcc 7.3 and gcc 8.1).
Without the compiler enabled the mp_optimise_value is unused, and the
micropython.opt_level() function is not useful, so exclude these from the
build to save RAM and code size.
This patch introduces the MICROPY_ENABLE_PYSTACK option (disabled by
default) which enables a "Python stack" that allows to allocate and free
memory in a scoped, or Last-In-First-Out (LIFO) way, similar to alloca().
A new memory allocation API is introduced along with this Py-stack. It
includes both "local" and "nonlocal" LIFO allocation. Local allocation is
intended to be equivalent to using alloca(), whereby the same function must
free the memory. Nonlocal allocation is where another function may free
the memory, so long as it's still LIFO.
Follow-up patches will convert all uses of alloca() and VLA to the new
scoped allocation API. The old behaviour (using alloca()) will still be
available, but when MICROPY_ENABLE_PYSTACK is enabled then alloca() is no
longer required or used.
The benefits of enabling this option are (or will be once subsequent
patches are made to convert alloca()/VLA):
- Toolchains without alloca() can use this feature to obtain correct and
efficient scoped memory allocation (compared to using the heap instead
of alloca(), which is slower).
- Even if alloca() is available, enabling the Py-stack gives slightly more
efficient use of stack space when calling nested Python functions, due to
the way that compilers implement alloca().
- Enabling the Py-stack with the stackless mode allows for even more
efficient stack usage, as well as retaining high performance (because the
heap is no longer used to build and destroy stackless code states).
- With Py-stack and stackless enabled, Python-calling-Python is no longer
recursive in the C mp_execute_bytecode function.
The micropython.pystack_use() function is included to measure usage of the
Python stack.
The uos.dupterm() signature and behaviour is updated to reflect the latest
enhancements in the docs. It has minor backwards incompatibility in that
it no longer accepts zero arguments.
The dupterm_rx helper function is moved from esp8266 to extmod and
generalised to support multiple dupterm slots.
A port can specify multiple slots by defining the MICROPY_PY_OS_DUPTERM
config macro to an integer, being the number of slots it wants to have;
0 means to disable the dupterm feature altogether.
The unix and esp8266 ports are updated to work with the new interface and
are otherwise unchanged with respect to functionality.
The code conventions suggest using header guards, but do not define how
those should look like and instead point to existing files. However, not
all existing files follow the same scheme, sometimes omitting header guards
altogether, sometimes using non-standard names, making it easy to
accidentally pick a "wrong" example.
This commit ensures that all header files of the MicroPython project (that
were not simply copied from somewhere else) follow the same pattern, that
was already present in the majority of files, especially in the py folder.
The rules are as follows.
Naming convention:
* start with the words MICROPY_INCLUDED
* contain the full path to the file
* replace special characters with _
In addition, there are no empty lines before #ifndef, between #ifndef and
one empty line before #endif. #endif is followed by a comment containing
the name of the guard macro.
py/grammar.h cannot use header guards by design, since it has to be
included multiple times in a single C file. Several other files also do not
need header guards as they are only used internally and guaranteed to be
included only once:
* MICROPY_MPHALPORT_H
* mpconfigboard.h
* mpconfigport.h
* mpthreadport.h
* pin_defs_*.h
* qstrdefs*.h
This buffer is used to allocate objects temporarily, and such objects
require that their underlying memory be correctly aligned for their data
type. Aligning for mp_obj_t should be sufficient for emergency exceptions,
but in general the memory buffer should aligned to the maximum alignment of
the machine (eg on a 32-bit machine with mp_obj_t being 4 bytes, a double
may not be correctly aligned).
This patch fixes a bug for certain nan-boxing builds, where mp_obj_t is 8
bytes and must be aligned to 8 bytes (even though the machine is 32 bit).
Each threads needs to have its own private references to its current
locals/globals dicts, otherwise functions running within different
contexts (eg imported from different files) can behave very strangely.
This provides mp_vfs_XXX functions (eg mount, open, listdir) which are
agnostic to the underlying filesystem type, and just require an object with
the relevant filesystem-like methods (eg .mount, .open, .listidr) which can
then be mounted.
These mp_vfs_XXX functions would typically be used by a port to implement
the "uos" module, and mp_vfs_open would be the builtin open function.
This feature is controlled by MICROPY_VFS, disabled by default.
Defining and initialising mp_kbd_exception is boiler-plate code and so the
core runtime can provide it, instead of each port needing to do it
themselves.
The exception object is placed in the VM state rather than on the heap.
Currently, MicroPython runs GC when it could not allocate a block of memory,
which happens when heap is exhausted. However, that policy can't work well
with "inifinity" heaps, e.g. backed by a virtual memory - there will be a
lot of swap thrashing long before VM will be exhausted. Instead, in such
cases "allocation threshold" policy is used: a GC is run after some number of
allocations have been made. Details vary, for example, number or total amount
of allocations can be used, threshold may be self-adjusting based on GC
outcome, etc.
This change implements a simple variant of such policy for MicroPython. Amount
of allocated memory so far is used for threshold, to make it useful to typical
finite-size, and small, heaps as used with MicroPython ports. And such GC policy
is indeed useful for such types of heaps too, as it allows to better control
fragmentation. For example, if a threshold is set to half size of heap, then
for an application which usually makes big number of small allocations, that
will (try to) keep half of heap memory in a nice defragmented state for an
occasional large allocation.
For an application which doesn't exhibit such behavior, there won't be any
visible effects, except for GC running more frequently, which however may
affect performance. To address this, the GC threshold is configurable, and
by default is off so far. It's configured with gc.threshold(amount_in_bytes)
call (can be queries without an argument).
By using a single, global mutex, all memory-related functions (alloc,
free, realloc, collect, etc) are made thread safe. This means that only
one thread can be in such a function at any one time.
This new compile-time option allows to make the bytecode compiler
configurable at runtime by setting the fields in the mp_dynamic_compiler
structure. By using this feature, the compiler can generate bytecode
that targets any MicroPython runtime/VM, regardless of the host and
target compile-time settings.
Options so far that fall under this dynamic setting are:
- maximum number of bits that a small int can hold;
- whether caching of lookups is used in the bytecode;
- whether to use unicode strings or not (lexer behaviour differs, and
therefore generated string constants differ).