POP_BLOCK and POP_EXCEPT are now the same, and are always followed by a
JUMP. So this optimisation reduces code size, and RAM usage of bytecode by
two bytes for each try-except handler.
This patch fixes a bug in the VM when breaking within a try-finally. The
bug has to do with executing a break within the finally block of a
try-finally statement. For example:
def f():
for x in (1,):
print('a', x)
try:
raise Exception
finally:
print(1)
break
print('b', x)
f()
Currently in uPy the above code will print:
a 1
1
1
segmentation fault (core dumped) micropython
Not only is there a seg fault, but the "1" in the finally block is printed
twice. This is because when the VM executes a finally block it doesn't
really know if that block was executed due to a fall-through of the try (no
exception raised), or because an exception is active. In particular, for
nested finallys the VM has no idea which of the nested ones have active
exceptions and which are just fall-throughs. So when a break (or continue)
is executed it tries to unwind all of the finallys, when in fact only some
may be active.
It's questionable whether break (or return or continue) should be allowed
within a finally block, because they implicitly swallow any active
exception, but nevertheless it's allowed by CPython (although almost never
used in the standard library). And uPy should at least not crash in such a
case.
The solution here relies on the fact that exception and finally handlers
always appear in the bytecode after the try body.
Note: there was a similar bug with a return in a finally block, but that
was previously fixed in b735208403
This optimisation eliminates the need to create a temporary normal dict.
The optimisation is enabled via MICROPY_COMP_CONST_LITERAL which is enabled
by default (although only has an effect if OrderdDict is enabled).
Thanks to @pfalcon for the initial idea and implementation.
All exceptions that unwind through the async-with must be caught and
BaseException is the top-level class, which includes Exception and others.
Fixes issue #4552.
These macros could in principle be (inline) functions so it makes sense to
have them lower case, to match the other C API functions.
The remaining macros that are upper case are:
- MP_OBJ_TO_PTR, MP_OBJ_FROM_PTR
- MP_OBJ_NEW_SMALL_INT, MP_OBJ_SMALL_INT_VALUE
- MP_OBJ_NEW_QSTR, MP_OBJ_QSTR_VALUE
- MP_OBJ_FUN_MAKE_SIG
- MP_DECLARE_CONST_xxx
- MP_DEFINE_CONST_xxx
These must remain macros because they are used when defining const data (at
least, MP_OBJ_NEW_SMALL_INT is so it makes sense to have
MP_OBJ_SMALL_INT_VALUE also a macro).
For those macros that have been made lower case, compatibility macros are
provided for the old names so that users do not need to change their code
immediately.
Taking the address of a local variable is mildly expensive, in code size
and stack usage. So optimise scope_find_or_add_id() to not need to take a
pointer to the "added" variable, and instead take the kind to use for newly
added identifiers.
This ensures that implicit variables are only converted to implicit
closed-over variables (nonlocals) at the very end of the function scope.
If variables are closed-over when first used (read from, as was done prior
to this commit) then this can be incorrect because the variable may be
assigned to later on in the function which means they are just a plain
local, not closed over.
Fixes issue #4272.
This commit adds first class support for yield and yield-from in the native
emitter, including send and throw support, and yields enclosed in exception
handlers (which requires pulling down the NLR stack before yielding, then
rebuilding it when resuming).
This has been fully tested and is working on unix x86 and x86-64, and
stm32. Also basic tests have been done with the esp8266 port. Performance
of existing native code is unchanged.
This commit changes native code to handle constant objects like bytecode:
instead of storing the pointers inside the native code they are now stored
in a separate constant table (such pointers include objects like bignum,
bytes, and raw code for nested functions). This removes the need for the
GC to scan native code for root pointers, and takes a step towards making
native code independent of the runtime (eg so it can be compiled offline by
mpy-cross).
Note that the changes to the struct scope_t did not increase its size: on a
32-bit architecture it is still 48 bytes, and on a 64-bit architecture it
decreased from 80 to 72 bytes.
This commit makes viper functions have the same signature as native
functions, at the level of the emitter/assembler. This means that viper
functions can now be wrapped in the same uPy object as native functions.
Viper functions are now responsible for parsing their arguments (before it
was done by the runtime), and this makes calling them more efficient (in
most cases) because the viper entry code can be custom generated to suit
the signature of the function.
This change also opens the way forward for viper functions to take
arbitrary numbers of arguments, and for them to handle globals correctly,
among other things.
Now that the compiler can store the results of the viper types in the
scope, the viper parameter annotation compilation stage can be merged with
the normal parameter compilation stage.
Prior to this commit a function compiled with the native decorator
@micropython.native would not work correctly when accessing global
variables, because the globals dict was not being set upon function entry.
This commit fixes this problem by, upon function entry, setting as the
current globals dict the globals dict context the function was defined
within, as per normal Python semantics, and as bytecode does. Upon
function exit the original globals dict is restored.
In order to restore the globals dict when an exception is raised the native
function must guard its internals with an nlr_push/nlr_pop pair. Because
this push/pop is relatively expensive, in both C stack usage for the
nlr_buf_t and CPU execution time, the implementation here optimises things
as much as possible. First, the compiler keeps track of whether a function
even needs to access global variables. Using this information the native
emitter then generates three different kinds of code:
1. no globals used, no exception handlers: no nlr handling code and no
setting of the globals dict.
2. globals used, no exception handlers: an nlr_buf_t is allocated on the
C stack but it is not used if the globals dict is unchanged, saving
execution time because nlr_push/nlr_pop don't need to run.
3. function has exception handlers, may use globals: an nlr_buf_t is
allocated and nlr_push/nlr_pop are always called.
In the end, native functions that don't access globals and don't have
exception handlers will run more efficiently than those that do.
Fixes issue #1573.
This patch adds full support for unwinding jumps to the native emitter.
This means that return/break/continue can be used in try-except,
try-finally and with statements. For code that doesn't use unwinding jumps
there is almost no overhead added to the generated code.
Prior to this patch, native code would use a full nlr_buf_t for each
exception handler (try-except, try-finally, with). For nested exception
handlers this would use a lot of C stack and be rather inefficient.
This patch changes how exceptions are handled in native code by setting up
only a single nlr_buf_t context for the entire function, and then manages a
state machine (using the PC) to work out which exception handler to run
when an exception is raised by an nlr_jump. This keeps the C stack usage
at a constant level regardless of the depth of Python exception blocks.
The patch also fixes an existing bug when local variables are written to
within an exception handler, then their value was incorrectly restored if
an exception was raised (since the nlr_jump would restore register values,
back to the point of the nlr_push).
And it also gets nested try-finally+with working with the viper emitter.
Broadly speaking, efficiency of executing native code that doesn't use
any exception blocks is unchanged, and emitted code size is only slightly
increased for such function. C stack usage of all native functions is
either equal or less than before. Emitted code size for native functions
that use exception blocks is increased by roughly 10% (due in part to
fixing of above-mentioned bugs).
But, most importantly, this patch allows to implement more Python features
in native code, like unwind jumps and yielding from within nested exception
blocks.
Without this patch, on 64-bit architectures the "1 << (small_int_bits - 1)"
is computed using only 32-bit values (since small_int_bits is a uint8_t)
and so will overflow (and give the wrong result) if small_int_bits is
larger than 32.
Before this patch the context manager's __aexit__() method would not be
executed if a return/break/continue statement was used to exit an async
with block. async with now has the same semantics as normal with.
The fix here applies purely to the compiler, and does not modify the
runtime at all. It might (eventually) be better to define new bytecode(s)
to handle async with (and maybe other async constructs) in a cleaner, more
efficient way.
One minor drawback with addressing this issue purely in the compiler is
that it wasn't possible to get 100% CPython semantics. The thing that is
different here to CPython is that the __aexit__ method is not looked up in
the context manager until it is needed, which is after the body of the
async with statement has executed. So if a context manager doesn't have
__aexit__ then CPython raises an exception before the async with is
executed, whereas uPy will raise it after it is executed. Note that
__aenter__ is looked up at the beginning in uPy because it needs to be
called straightaway, so if the context manager isn't a context manager then
it'll still raise an exception at the same location as CPython. The only
difference is if the context manager has the __aenter__ method but not the
__aexit__ method, then in that case uPy has different behaviour. But this
is a very minor, and acceptable, difference.
This patch combines the compiler optimisation code for double and triple
tuple-to-tuple assignment, taking it from two separate if-blocks to one
combined if-block. This can be done because the code for both of these
optimisations has a lot in common. Combining them together reduces code
size for ports that have the triple-tuple optimisation enabled (and doesn't
change code size for ports that have it disabled).
The technique of using alloca is how dotted import names are composed in
mp_import_from and mp_builtin___import__, so use the same technique in the
compiler. This puts less pressure on the heap (only the stack is used if
the qstr already exists, and if it doesn't exist then the standard qstr
block memory is used for the new qstr rather than a separate chunk of the
heap) and reduces overall code size.