For architectures where size_t is less than 32 bits (eg 16 bits) the args
must be casted to uint32_t so the left shift will work. For architectures
where size_t is greater than 32 bits (eg 64 bits) this new casting will not
lose any bits because the end result must anyway fit in a uint32_t.
Changes to the layout of the bytecode header meant that this debug code was
no longer compiling. This is now fixed and a new compile-time option is
introduced, MICROPY_DEBUG_VM_STACK_OVERFLOW, to turn on this feature (which
is disabled by default). This option is needed because more than one file
needs to cooperate to make this check work.
It's more robust to have the version defined statically in a header file,
rather than dynamically generating it via git using a git tag. In case
git doesn't exist, or a different source control tool is used, it's
important to still have the uPy version number available.
The older "bool has_finaliser" gets recast as GC_ALLOC_FLAG_HAS_FINALISER=1
so this is a backwards compatible change to the signature. Since bool gets
implicitly converted to 1 this patch doesn't include conversion of all
calls.
Both mp_type_array and mp_type_memoryview use the same object structure,
mp_obj_array_t, but for the case of memoryview, some fields, e.g. "free",
have different meaning. As the "free" field is also a bitfield, assume
that (anonymous) union can't be used here (for the concerns of possible
compatibility issues with wide array of toolchains), and just add a field
alias using a #define. As it's a define, it should be a selective
identifier, so use verbose "memview_offset" to avoid any clashes.
All 4 opcodes that can have caching bytes also have qstrs, so the test for
them must go in the qstr part of the code. The reason this incorrect
calculation of the opcode size did not lead to a bug is because the caching
byte is at the end of the opcode (byte, qstr, qstr, cache) and is always
0x00 when saving/loading, so was just treated as a single byte no-op
opcode. Hence these opcodes were being saved/loaded/decoded correctly.
Thanks to @malinah for finding the problem and providing the initial patch.
mp_obj_new_exception_msg() assumes that the message passed to it is in ROM
and so can use its data directly to create the string object for the
argument of the exception, saving RAM. At the same time, this approach
also makes sure that there is no attempt to format the message with printf,
which could lead to faults if the message contained % characters.
Fixes issue #3004.
SHORT, INT, LONG, LONGLONG, and unsigned (U*) variants are being defined.
This is done at compile using GCC-style predefined macros like
__SIZEOF_INT__. If the compiler doesn't have such defines, no such types
will be defined.
Instead of assuming that the method is a bytecode object, and only
supporting load of __name__, make the operation generic by delegating the
load to the method object itself. Saves a bit of code size and fixes the
case of attempting to load __name__ on a native method, see issue #4028.
A new option MICROPY_GC_STACK_ENTRY_TYPE is added to select a custom type
instead of size_t for the gc_stack array items. This can be beneficial for
small devices, especially those that are low on memory anyway. If a device
has 1MB or less of heap (and 16-byte GC blocks) then this type can be
uint16_t, saving 128 bytes of RAM.
There was an assumption that all names in a module dict are qstr's.
However, they can be dynamically generated (by assigning to globals()),
and in case of a long name, it won't be a qstr. Handle this situation
properly, including taking care of not creating superfluous qstr's for
names starting with "_" (which aren't imported by "import *").
Taking the address of a local variable is mildly expensive, in code size
and stack usage. So optimise scope_find_or_add_id() to not need to take a
pointer to the "added" variable, and instead take the kind to use for newly
added identifiers.
This ensures that implicit variables are only converted to implicit
closed-over variables (nonlocals) at the very end of the function scope.
If variables are closed-over when first used (read from, as was done prior
to this commit) then this can be incorrect because the variable may be
assigned to later on in the function which means they are just a plain
local, not closed over.
Fixes issue #4272.
Building axtls gives a lot of warnings with -Wall enabled, and explicitly
disabling all of them cannot be done in a way compatible with gcc and
clang, and likely other compilers. So just use -Wno-all to prevent all of
the extra warnings (in addition to the necessary -Wno-unused-parameter,
-Wno-uninitialized, -Wno-sign-compare and -Wno-old-style-definition).
Fixes issue #4182.
Configurable via MICROPY_MODULE_GETATTR, disabled by default. Among other
things __getattr__ for modules can help to build lazy loading / code
unloading at runtime.
Configurable via MICROPY_PY_BUILTINS_STR_COUNT. Default is enabled.
Disabled for bare-arm, minimal, unix-minimal and zephyr ports. Disabling
it saves 408 bytes on x86.
So these constant objects can be loaded by dereferencing the REG_FUN_TABLE
pointer instead of loading immediate values. This reduces the size of
generated native code (when such constants are used), and means that
pointers to these constants are no longer stored in the assembly code.
The maximum index into mp_fun_table is currently less than 1024 and should
stay that way to keep things efficient for all architectures, so there is
no need to handle loading the pointer directly via a literal in this
function.
All architectures now have a dedicated register to hold the pointer to the
native function table mp_fun_table, and so they all need to load this
register at the start of the native function. This commit makes the
loading of this register uniform across architectures by passing the
pointer in the constant table for the native function, and then loading the
register from the constant table. Doing it this way means that the pointer
is not stored in the assembly code, helping to make the code more portable.
Instead of storing the function pointer directly in the assembly code.
This makes the generated code more independent of the runtime (so easier to
relocate the code), and reduces the generated code size.
The esp register is always a fixed distance below ebp, and using esp to
reference locals on the stack frees up the ebp register for general purpose
use (which is important for an architecture with only 8 user registers).
Instead of storing the function pointer directly in the assembly code.
This makes the generated code more independent of the runtime (so easier to
relocate the code), and reduces the generated code size.
The rsp register is always a fixed distance below rbp, and using rsp to
reference locals on the stack frees up the rbp register for general purpose
use.
This commit adds first class support for yield and yield-from in the native
emitter, including send and throw support, and yields enclosed in exception
handlers (which requires pulling down the NLR stack before yielding, then
rebuilding it when resuming).
This has been fully tested and is working on unix x86 and x86-64, and
stm32. Also basic tests have been done with the esp8266 port. Performance
of existing native code is unchanged.
The nlr_buf_t doesn't need to be part of the Python value stack (as it was
before this commit), it's simpler to have it separated as auxiliary state
that lives on the C stack. This will help adding yield support because in
that case the nlr_buf_t and Python value stack live in separate memory
areas (C stack and heap respectively).
Instead of at end of state, n_state - 1. It was originally (way back in
v1.0) put at the end of the state because the VM didn't have a pointer to
the start. But now that the VM takes a mp_code_state_t pointer it does
have a pointer to the start of the state so can put the exception object
there.
This commit saves about 30 bytes of code on all architectures, and, more
importantly, reduces C-stack usage by a couple of words (8 bytes on Thumb2
and 16 bytes on x86-64) for every (non-generator) call of a bytecode
function because fun_bc_call no longer needs to remember the n_state
variable.
This makes these special methods have the same calling behaviour as other
methods in a class instance (mp_convert_member_lookup() is already called
by mp_obj_class_lookup()).
And remove related comment about needing such protection when calling send.
Reasoning for removal is as follows:
- mp_resume is only called by the VM in YIELD_FROM opcode
- if send_value != MP_OBJ_NULL then throw_value == MP_OBJ_NULL
- so if __next__ or send are called then throw_value == MP_OBJ_NULL
- if __next__ or send raise an exception without nlr protection then the
exception will be handled by the global exception handler of the VM
- this handler already has code to handle exceptions raised in YIELD_FROM,
including correct handling of StopIteration
- this handler doesn't handle the case of injection of GeneratorExit, but
this won't be needed because throw_value == MP_OBJ_NULL
Note that it's already possible for mp_resume() to raise an exception
(including StopIteration) from the unprotected call to type->iternext(), so
that's why the VM already has code to handle the case of exceptions coming
out of mp_resume().
This commit reduces code size by a bit, and significantly reduces C stack
usage when using yield-from, from 88 bytes down to 40 for Thumb2, and 152
down to 72 bytes for x86-64 (better than half). (Note that gcc doesn't
seem to tail-call optimise the call from mp_resume() to mp_obj_gen_resume()
so this saving in C stack usage helps all uses of yield-from.)
mp_make_raise_obj must be used to convert a possible exception type to an
instance object, otherwise the VM may raise a non-exception object.
An existing test is adjusted to test this case, with the original test
already moved to generator_throw.py.
This matches how bytecode does it, and matches the signature of
mp_emit_glue_assign_native. Since the native emitter doesn't support
nan-boxing uintptr_t and mp_uint_t are anyway the same bit-width.
After the previous commit this macro is no longer needed by the native
emitter because live heap pointers are no longer stored in generated native
machine code.
This commit changes native code to handle constant objects like bytecode:
instead of storing the pointers inside the native code they are now stored
in a separate constant table (such pointers include objects like bignum,
bytes, and raw code for nested functions). This removes the need for the
GC to scan native code for root pointers, and takes a step towards making
native code independent of the runtime (eg so it can be compiled offline by
mpy-cross).
Note that the changes to the struct scope_t did not increase its size: on a
32-bit architecture it is still 48 bytes, and on a 64-bit architecture it
decreased from 80 to 72 bytes.
Nan and inf (signed and unsigned) are also handled correctly by using
signbit (they were also handled correctly with "val<0", but that didn't
handle -0.0 correctly). A test case is added for this behaviour.
When obj.h is compiled as C++ code, the cl compiler emits a warning about
possibly unsafe mixing of size_t and bool types in the or operation in
MP_OBJ_FUN_MAKE_SIG. Similarly there's an implicit narrowing integer
conversion in runtime.h. This commit fixes this by being explicit.
This is an improvement over previous behavior when str was returned for
both str and bytes input format. This new behaviour is also consistent
with how the % operator works, as well as many other str/bytes methods.
It should be noted that it's not how current versions of CPython work,
where there's a gap in the functionality and bytes.format() is not
supported.
This commit adds the math.factorial function in two variants:
- squared difference, which is faster than the naive version, relatively
compact, and non-recursive;
- a mildly optimised recursive version, faster than the above one.
There are some more optimisations that could be done, but they tend to take
more code, and more storage space. The recursive version seems like a
sensible compromise.
The new function is disabled by default, and uses the non-optimised version
by default if it is enabled. The options are MICROPY_PY_MATH_FACTORIAL
and MICROPY_OPT_MATH_FACTORIAL.
This patches avoids multiplying with negative powers-of-10 when parsing
floating-point values, when those powers-of-10 can be exactly represented
as a positive power. When represented as a positive power and used to
divide, the resulting float will not have any rounding errors.
The issue is that mp_parse_num_decimal will sometimes not give the closest
floating representation of the input string. Eg for "0.3", which can't be
represented exactly in floating point, mp_parse_num_decimal gives a
slightly high (by 1LSB) result. This is because it computes the answer as
3 * 0.1, and since 0.1 also can't be represented exactly, multiplying by 3
multiplies up the rounding error in the 0.1. Computing it as 3 / 10, as
now done by the change in this commit, gives an answer which is as close to
the true value of "0.3" as possible.
This commit implements PEP479 which disallows raising StopIteration inside
a generator to signal that it should be finished. Instead, the generator
should simply return when it is complete.
See https://www.python.org/dev/peps/pep-0479/ for details.
In 0e80f345f8 the inplace operations __iadd__
and __isub__ were made unconditionally available, so the comment about this
section is changed to reflect that.
Loading a pointer by indexing into the native function table mp_fun_table,
rather than loading an immediate value (via a PC-relative load), uses less
code space.
This commit makes viper functions have the same signature as native
functions, at the level of the emitter/assembler. This means that viper
functions can now be wrapped in the same uPy object as native functions.
Viper functions are now responsible for parsing their arguments (before it
was done by the runtime), and this makes calling them more efficient (in
most cases) because the viper entry code can be custom generated to suit
the signature of the function.
This change also opens the way forward for viper functions to take
arbitrary numbers of arguments, and for them to handle globals correctly,
among other things.
Now that the compiler can store the results of the viper types in the
scope, the viper parameter annotation compilation stage can be merged with
the normal parameter compilation stage.
With 5 arguments to mp_arg_check_num(), some architectures need to pass
values on the stack. So compressing n_args_min, n_args_max, takes_kw into
a single word and passing only 3 arguments makes the call more efficient,
because almost all calls to this function pass in constant values. Code
size is also reduced by a decent amount:
bare-arm: -116
minimal x86: -64
unix x64: -256
unix nanbox: -112
stm32: -324
cc3200: -192
esp8266: -192
esp32: -144
Prior to this commit a function compiled with the native decorator
@micropython.native would not work correctly when accessing global
variables, because the globals dict was not being set upon function entry.
This commit fixes this problem by, upon function entry, setting as the
current globals dict the globals dict context the function was defined
within, as per normal Python semantics, and as bytecode does. Upon
function exit the original globals dict is restored.
In order to restore the globals dict when an exception is raised the native
function must guard its internals with an nlr_push/nlr_pop pair. Because
this push/pop is relatively expensive, in both C stack usage for the
nlr_buf_t and CPU execution time, the implementation here optimises things
as much as possible. First, the compiler keeps track of whether a function
even needs to access global variables. Using this information the native
emitter then generates three different kinds of code:
1. no globals used, no exception handlers: no nlr handling code and no
setting of the globals dict.
2. globals used, no exception handlers: an nlr_buf_t is allocated on the
C stack but it is not used if the globals dict is unchanged, saving
execution time because nlr_push/nlr_pop don't need to run.
3. function has exception handlers, may use globals: an nlr_buf_t is
allocated and nlr_push/nlr_pop are always called.
In the end, native functions that don't access globals and don't have
exception handlers will run more efficiently than those that do.
Fixes issue #1573.
If bytearray is constructed from str, a second argument of encoding is
required (in CPython), and third arg of Unicode error handling is allowed,
e.g.:
bytearray("str", "utf-8", "strict")
This is similar to bytes:
bytes("str", "utf-8", "strict")
This patch just allows to pass 2nd/3rd arguments to bytearray, but
doesn't try to validate them to not impact code size. (This is also
similar to how bytes constructor is handled, though it does a bit
more validation, e.g. check that in case of str arg, encoding argument
is passed.)
This removes the need for a separate axtls build stage, and builds all
axtls object files along with other code. This simplifies and cleans up
the build process, automatically builds axtls when needed, and puts the
axtls object files in the correct $(BUILD) location.
The MicroPython axtls configuration file is provided in
extmod/axtls-include/config.h
This patch adds full support for unwinding jumps to the native emitter.
This means that return/break/continue can be used in try-except,
try-finally and with statements. For code that doesn't use unwinding jumps
there is almost no overhead added to the generated code.