The maximum index into mp_fun_table is currently less than 1024 and should
stay that way to keep things efficient for all architectures, so there is
no need to handle loading the pointer directly via a literal in this
function.
All architectures now have a dedicated register to hold the pointer to the
native function table mp_fun_table, and so they all need to load this
register at the start of the native function. This commit makes the
loading of this register uniform across architectures by passing the
pointer in the constant table for the native function, and then loading the
register from the constant table. Doing it this way means that the pointer
is not stored in the assembly code, helping to make the code more portable.
Instead of storing the function pointer directly in the assembly code.
This makes the generated code more independent of the runtime (so easier to
relocate the code), and reduces the generated code size.
Instead of storing the function pointer directly in the assembly code.
This makes the generated code more independent of the runtime (so easier to
relocate the code), and reduces the generated code size.
This commit adds first class support for yield and yield-from in the native
emitter, including send and throw support, and yields enclosed in exception
handlers (which requires pulling down the NLR stack before yielding, then
rebuilding it when resuming).
This has been fully tested and is working on unix x86 and x86-64, and
stm32. Also basic tests have been done with the esp8266 port. Performance
of existing native code is unchanged.
The nlr_buf_t doesn't need to be part of the Python value stack (as it was
before this commit), it's simpler to have it separated as auxiliary state
that lives on the C stack. This will help adding yield support because in
that case the nlr_buf_t and Python value stack live in separate memory
areas (C stack and heap respectively).
This matches how bytecode does it, and matches the signature of
mp_emit_glue_assign_native. Since the native emitter doesn't support
nan-boxing uintptr_t and mp_uint_t are anyway the same bit-width.
This commit changes native code to handle constant objects like bytecode:
instead of storing the pointers inside the native code they are now stored
in a separate constant table (such pointers include objects like bignum,
bytes, and raw code for nested functions). This removes the need for the
GC to scan native code for root pointers, and takes a step towards making
native code independent of the runtime (eg so it can be compiled offline by
mpy-cross).
Note that the changes to the struct scope_t did not increase its size: on a
32-bit architecture it is still 48 bytes, and on a 64-bit architecture it
decreased from 80 to 72 bytes.
Loading a pointer by indexing into the native function table mp_fun_table,
rather than loading an immediate value (via a PC-relative load), uses less
code space.
This commit makes viper functions have the same signature as native
functions, at the level of the emitter/assembler. This means that viper
functions can now be wrapped in the same uPy object as native functions.
Viper functions are now responsible for parsing their arguments (before it
was done by the runtime), and this makes calling them more efficient (in
most cases) because the viper entry code can be custom generated to suit
the signature of the function.
This change also opens the way forward for viper functions to take
arbitrary numbers of arguments, and for them to handle globals correctly,
among other things.
Prior to this commit a function compiled with the native decorator
@micropython.native would not work correctly when accessing global
variables, because the globals dict was not being set upon function entry.
This commit fixes this problem by, upon function entry, setting as the
current globals dict the globals dict context the function was defined
within, as per normal Python semantics, and as bytecode does. Upon
function exit the original globals dict is restored.
In order to restore the globals dict when an exception is raised the native
function must guard its internals with an nlr_push/nlr_pop pair. Because
this push/pop is relatively expensive, in both C stack usage for the
nlr_buf_t and CPU execution time, the implementation here optimises things
as much as possible. First, the compiler keeps track of whether a function
even needs to access global variables. Using this information the native
emitter then generates three different kinds of code:
1. no globals used, no exception handlers: no nlr handling code and no
setting of the globals dict.
2. globals used, no exception handlers: an nlr_buf_t is allocated on the
C stack but it is not used if the globals dict is unchanged, saving
execution time because nlr_push/nlr_pop don't need to run.
3. function has exception handlers, may use globals: an nlr_buf_t is
allocated and nlr_push/nlr_pop are always called.
In the end, native functions that don't access globals and don't have
exception handlers will run more efficiently than those that do.
Fixes issue #1573.
This patch adds full support for unwinding jumps to the native emitter.
This means that return/break/continue can be used in try-except,
try-finally and with statements. For code that doesn't use unwinding jumps
there is almost no overhead added to the generated code.
The native emitter keeps the current exception in a slot in its C stack
(instead of on its Python value stack), so when it catches an exception it
must explicitly clear that slot so the same exception is not reraised later
on.
Prior to this patch, native code would use a full nlr_buf_t for each
exception handler (try-except, try-finally, with). For nested exception
handlers this would use a lot of C stack and be rather inefficient.
This patch changes how exceptions are handled in native code by setting up
only a single nlr_buf_t context for the entire function, and then manages a
state machine (using the PC) to work out which exception handler to run
when an exception is raised by an nlr_jump. This keeps the C stack usage
at a constant level regardless of the depth of Python exception blocks.
The patch also fixes an existing bug when local variables are written to
within an exception handler, then their value was incorrectly restored if
an exception was raised (since the nlr_jump would restore register values,
back to the point of the nlr_push).
And it also gets nested try-finally+with working with the viper emitter.
Broadly speaking, efficiency of executing native code that doesn't use
any exception blocks is unchanged, and emitted code size is only slightly
increased for such function. C stack usage of all native functions is
either equal or less than before. Emitted code size for native functions
that use exception blocks is increased by roughly 10% (due in part to
fixing of above-mentioned bugs).
But, most importantly, this patch allows to implement more Python features
in native code, like unwind jumps and yielding from within nested exception
blocks.
There is no need to have three copies of the exception object on the top of
the native value stack. Instead, the values on the stack should be the
first two items in an nlr_buf_t: the prev pointer and the ret_val pointer.
This is all that is needed and is what the rest of the native emitter
expects is on the stack.
This patch is essentially an optimisation. Behaviour is unchanged,
although the stack layout for native exception handling now makes more
sense.
A native function allocates space on its C stack for mp_code_state_t,
followed by its Python stack, then its locals. This patch makes sure that
the native function actually starts at the start of its Python stack,
rather than at the start of mp_code_state_t (which didn't lead to any
issues so far because the mp_code_state_t is unused after the native
function sets itself up).
On x86 archs (both 32 and 64 bit) a bool return value only sets the 8-bit
al register, and the higher bits of the ax register have an undefined
value. When testing the return value of such cases it is required to just
test al for zero/non-zero. On the other hand, checking for truth or
zero/non-zero on an integer return value requires checking all bits of the
register. These two cases must be distinguished and handled correctly in
generated native code. This patch makes sure of this.
For other supported native archs (ARM, Thumb2, Xtensa) there is no such
distinction and this patch does not change anything for them.
In non-debug mode MP_OBJ_STOP_ITERATION is zero and comparing something to
zero can be done more efficiently in assembler than comparing to a non-zero
value.
Instead of emitnative.c having configuration code for each supported
architecture, and then compiling this file multiple times with different
macros defined, this patch adds a file per architecture with the necessary
code to configure the native emitter. These files then #include the
emitnative.c file.
This simplifies emitnative.c (which is already very large), and simplifies
the build system because emitnative.c no longer needs special handling for
compilation and qstr extraction.
All the asm macro names that convert a particular architecture to a generic
interface now follow the convention whereby the "destination" (usually a
register) is specified first.
Header files that are considered internal to the py core and should not
normally be included directly are:
py/nlr.h - internal nlr configuration and declarations
py/bc0.h - contains bytecode macro definitions
py/runtime0.h - contains basic runtime enums
Instead, the top-level header files to include are one of:
py/obj.h - includes runtime0.h and defines everything to use the
mp_obj_t type
py/runtime.h - includes mpstate.h and hence nlr.h, obj.h, runtime0.h,
and defines everything to use the general runtime support functions
Additional, specific headers (eg py/objlist.h) can be included if needed.
- Changed: ValueError, TypeError, NotImplementedError
- OSError invocations unchanged, because the corresponding utility
function takes ints, not strings like the long form invocation.
- OverflowError, IndexError and RuntimeError etc. not changed for now
until we decide whether to add new utility functions.
This patch allows the following code to run without allocating on the heap:
super().foo(...)
Before this patch such a call would allocate a super object on the heap and
then load the foo method and call it right away. The super object is only
needed to perform the lookup of the method and not needed after that. This
patch makes an optimisation to allocate the super object on the C stack and
discard it right after use.
Changes in code size due to this patch are:
bare-arm: +128
minimal: +232
unix x64: +416
unix nanbox: +364
stmhal: +184
esp8266: +340
cc3200: +128
It improves readability of code and reduces the chance to make a mistake.
This patch also fixes a bug with nan-boxing builds by rounding up the
calculation of the new NSLOTS variable, giving the correct number of slots
(being 4) even if mp_obj_t is larger than the native machine size.
Instead of caching data that is constant (code_info, const_table and
n_state), store just a pointer to the underlying function object from which
this data can be derived.
This helps reduce stack usage for the case when the mp_code_state_t
structure is stored on the stack, as well as heap usage when it's stored
on the heap.
The downside is that the VM becomes a little more complex because it now
needs to derive the data from the underlying function object. But this
doesn't impact the performance by much (if at all) because most of the
decoding of data is done outside the main opcode loop. Measurements using
pystone show that little to no performance is lost.
This patch also fixes a nasty bug whereby the bytecode can be reclaimed by
the GC during execution. With this patch there is always a pointer to the
function object held by the VM during execution, since it's stored in the
mp_code_state_t structure.
Allows to iterate over the following without allocating on the heap:
- tuple
- list
- string, bytes
- bytearray, array
- dict (not dict.keys, dict.values, dict.items)
- set, frozenset
Allows to call the following without heap memory:
- all, any, min, max, sum
TODO: still need to allocate stack memory in bytecode for iter_buf.
The 3 kinds of comprehensions are similar enough that merging their emit
functions reduces code size. Decreases in code size in bytes are:
bare-arm:24, minimal:96, unix(NDEBUG,x86-64):328, stmhal:80, esp8266:76.
Before this patch, the native types for uint and ptr/ptr8/ptr16/ptr32
all overlapped and it was possible to make a mistake in casting. Now,
these types are all separate and any coding mistakes will be raised
as runtime errors.
Fixes#1684 and makes "not" match Python semantics. The code is also
simplified (the separate MP_BC_NOT opcode is removed) and the patch saves
68 bytes for bare-arm/ and 52 bytes for minimal/.
Previously "not x" was implemented as !mp_unary_op(x, MP_UNARY_OP_BOOL),
so any given object only needs to implement MP_UNARY_OP_BOOL (and the VM
had a special opcode to do the ! bit).
With this patch "not x" is implemented as mp_unary_op(x, MP_UNARY_OP_NOT),
but this operation is caught at the start of mp_unary_op and dispatched as
!mp_obj_is_true(x). mp_obj_is_true has special logic to test for
truthness, and is the correct way to handle the not operation.
Main changes when MICROPY_PERSISTENT_CODE is enabled are:
- qstrs are encoded as 2-byte fixed width in the bytecode
- all pointers are removed from bytecode and put in const_table (this
includes const objects and raw code pointers)
Ultimately this option will enable persistence for not just bytecode but
also native code.
ViperTypeError now includes filename and function name where the error
occurred. The line number is the line number of the start of the
function definition, which is the best that can be done without a lot
more work.
Partially addresses issue #1381.
Previous to this patch each time a bytes object was referenced a new
instance (with the same data) was created. With this patch a single
bytes object is created in the compiler and is loaded directly at execute
time as a true constant (similar to loading bignum and float objects).
This saves on allocating RAM and means that bytes objects can now be
used when the memory manager is locked (eg in interrupts).
The MP_BC_LOAD_CONST_BYTES bytecode was removed as part of this.
Generated bytecode is slightly larger due to storing a pointer to the
bytes object instead of the qstr identifier.
Code size is reduced by about 60 bytes on Thumb2 architectures.
This allows to do "ar[i]" and "ar[i] = val" in viper when ar is a Python
object and i and/or val are native viper types (eg ints).
Patch also includes tests for this feature.
The code was apparently broken after 9988618e0e
"py: Implement full func arg passing for native emitter.". This attempts to
propagate those changes to ARM emitter.
This fixes a long standing problem that viper code generation gave
terrible error messages, and actually no errors on pyboard where
assertions are disabled.
Now all compile-time errors are raised as proper Python exceptions, and
are of type ViperTypeError.
Addresses issue #940.