Taking the address of a local variable leads to increased stack usage, so
the mp_decode_uint_skip() function is added to reduce the need for taking
addresses. The changes in this patch reduce stack usage of a Python call
by 8 bytes on ARM Thumb, by 16 bytes on non-windowing Xtensa archs, and by
16 bytes on x86-64. Code size is also slightly reduced on most archs by
around 32 bytes.
Instead of caching data that is constant (code_info, const_table and
n_state), store just a pointer to the underlying function object from which
this data can be derived.
This helps reduce stack usage for the case when the mp_code_state_t
structure is stored on the stack, as well as heap usage when it's stored
on the heap.
The downside is that the VM becomes a little more complex because it now
needs to derive the data from the underlying function object. But this
doesn't impact the performance by much (if at all) because most of the
decoding of data is done outside the main opcode loop. Measurements using
pystone show that little to no performance is lost.
This patch also fixes a nasty bug whereby the bytecode can be reclaimed by
the GC during execution. With this patch there is always a pointer to the
function object held by the VM during execution, since it's stored in the
mp_code_state_t structure.
Allows to iterate over the following without allocating on the heap:
- tuple
- list
- string, bytes
- bytearray, array
- dict (not dict.keys, dict.values, dict.items)
- set, frozenset
Allows to call the following without heap memory:
- all, any, min, max, sum
TODO: still need to allocate stack memory in bytecode for iter_buf.
Checks for number of args removes where guaranteed by function descriptor,
self checking is replaced with mp_check_self(). In few cases, exception
is raised instead of assert.
This patch changes the type signature of .make_new and .call object method
slots to use size_t for n_args and n_kw (was mp_uint_t. Makes code more
efficient when mp_uint_t is larger than a machine word. Doesn't affect
ports when size_t and mp_uint_t have the same size.
This allows the mp_obj_t type to be configured to something other than a
pointer-sized primitive type.
This patch also includes additional changes to allow the code to compile
when sizeof(mp_uint_t) != sizeof(void*), such as using size_t instead of
mp_uint_t, and various casts.
Unfortunately, MP_OBJ_STOP_ITERATION doesn't have means to pass an associated
value, so we can't optimize StopIteration exception with (non-None) argument
to MP_OBJ_STOP_ITERATION.
Previous to this patch the printing mechanism was a bit of a tangled
mess. This patch attempts to consolidate printing into one interface.
All (non-debug) printing now uses the mp_print* family of functions,
mainly mp_printf. All these functions take an mp_print_t structure as
their first argument, and this structure defines the printing backend
through the "print_strn" function of said structure.
Printing from the uPy core can reach the platform-defined print code via
two paths: either through mp_sys_stdout_obj (defined pert port) in
conjunction with mp_stream_write; or through the mp_plat_print structure
which uses the MP_PLAT_PRINT_STRN macro to define how string are printed
on the platform. The former is only used when MICROPY_PY_IO is defined.
With this new scheme printing is generally more efficient (less layers
to go through, less arguments to pass), and, given an mp_print_t*
structure, one can call mp_print_str for efficiency instead of
mp_printf("%s", ...). Code size is also reduced by around 200 bytes on
Thumb2 archs.
This patch gets full function argument passing working with native
emitter. Includes named args, keyword args, default args, var args
and var keyword args. Fully Python compliant.
It reuses the bytecode mp_setup_code_state function to do all the hard
work. This function is slightly adjusted to accommodate native calls,
and the native emitter is forced a bit to emit similar prelude and
code-info as bytecode.
This saves a lot of RAM for 2 reasons:
1. For functions that don't have default values, var args or var kw
args (which is a large number of functions in the general case), the
mp_obj_fun_bc_t type now fits in 1 GC block (previously needed 2 because
of the extra pointer to point to the arg_names array). So this saves 16
bytes per function (32 bytes on 64-bit machines).
2. Combining separate memory regions generally saves RAM because the
unused bytes at the end of the GC block are saved for 1 of the blocks
(since that block doesn't exist on its own anymore). So generally this
saves 8 bytes per function.
Tested by importing lots of modules:
- 64-bit Linux gave about an 8% RAM saving for 86k of used RAM.
- pyboard gave about a 6% RAM saving for 31k of used RAM.
Code-info size, block name, source name, n_state and n_exc_stack now use
variable length encoded uints. This saves 7-9 bytes per bytecode
function for most functions.
This improves stack usage in callers to mp_execute_bytecode2, and is step
forward towards unifying execution interface for function and generators
(which is important because generators don't even support full forms
of arguments passing (keywords, etc.)).
Blanket wide to all .c and .h files. Some files originating from ST are
difficult to deal with (license wise) so it was left out of those.
Also merged modpyb.h, modos.h, modstm.h and modtime.h in stmhal/.