Enabled via MICROPY_PY_URE_DEBUG, disabled by default (but enabled on unix
coverage build). This is a rarely used feature that costs a lot of code
(500-800 bytes flash). Debugging of regular expressions can be done
offline with other tools.
Recent versions of gcc perform optimisations which can lead to the
following code from the MP_NLR_JUMP_HEAD macro being omitted:
top->ret_val = val; \
MP_NLR_RESTORE_PYSTACK(top); \
*_top_ptr = top->prev; \
This is noticeable (at least) in the unix coverage on x86-64 built with gcc
9.1.0. This is because the nlr_jump function is marked as no-return, so
gcc deduces that the above code has no effect.
Adding MP_UNREACHABLE tells the compiler that the asm code may branch
elsewhere, and so it cannot optimise away the code.
As per PEP 485, this function appeared in for Python 3.5. Configured via
MICROPY_PY_MATH_ISCLOSE which is disabled by default, but enabled for the
ports which already have MICROPY_PY_MATH_SPECIAL_FUNCTIONS enabled.
Prior to this patch the amount of free space in an array (including
bytearray) was not being maintained correctly for the case of slice
assignment which changed the size of the array. Under certain cases (as
encoded in the new test) it was possible that the array could grow beyond
its allocated memory block and corrupt the heap.
Fixes issue #4127.
This patch implements a new sys.atexit function which registers a function
that is later executed when the main script ends. It is configurable via
MICROPY_PY_SYS_ATEXIT, disabled by default.
This is not compliant with CPython, rather it can be used to implement a
CPython compatible "atexit" module if desired (similar to how
sys.print_exception can be used to implement functionality of the
"traceback" module).
This patch adds a simple but powerful hook into the import system, in a
CPython compatible way, by allowing to override builtins.__import__.
This does introduce some overhead to all imports but it's minor:
- the dict lookup of __import__ is bypassed if there are no modifications
to the builtins module (which is the case at start up);
- imports are not performance critical, usually done just at the start of a
script;
- compared to how much work is done in an import, looking up a value in a
dict is a relatively small additional piece of work.
JSON requires that keys of objects be strings. CPython will therefore
automatically quote simple types (NoneType, bool, int, float) when they are
used directly as keys in JSON output. To prevent subtle bugs and emit
compliant JSON, MicroPython should at least test for such keys so they
aren't silently let through. Then doing the actual quoting is a similar
cost to raising an exception, so that's what is implemented by this patch.
Fixes issue #4790.
Behaviour was changed from stack to queue in
8977c7eb58, and this updates variable names
to match. Also updates other references (docs, error messages).
__clear_cache causes a compile error when using clang. Instead use
__builtin___clear_cache which is available under both gcc and clang.
Also replace tabs with spaces in this section of code (introduced by a
previous commit).
This fixes compiling for older architectures (e.g. armv5tej).
According to [1], the limit of R0-R7 for the STR and LDR instructions is
tied to the Thumb instruction set and not any specific processor
architectures.
[1]: http://www.keil.com/support/man/docs/armasm/armasm_dom1361289906890.htm
With both MICROPY_PERSISTENT_CODE_SAVE and MICROPY_PERSISTENT_CODE_LOAD
enabled the code fails to compile, due to undeclared 'n_obj'. If
MICROPY_EMIT_NATIVE is disabled there are more errors due to the use of
undefined fields in mp_raw_code_t.
This patch fixes such compilation by avoiding undefined fields.
MICROPY_EMIT_NATIVE was changed to MICROPY_EMIT_MACHINE_CODE in this file
to match the mp_raw_code_t definition.
These s16-s21 registers are used by gcc so need to be saved. Future
versions of gcc (beyond v9.1.0), or other compilers, may eventually need
additional registers saved/restored.
See issue #4844.
Instead of converting to a small-int at runtime this can be done at compile
time, then we only have a simple comparison during runtime. This reduces
code size on some ports (e.g -4 on qemu-arm, -52 on unix nanbox), and for
others at least doesn't increase code size.
mpy-cross uses MICROPY_DYNAMIC_COMPILER and MICROPY_EMIT_NATIVE but does
not actually need to execute native functions, and does not need
mp_fun_table. This commit makes it so mp_fun_table and all its entries are
not built when MICROPY_DYNAMIC_COMPILER is enabled, significantly reducing
the size of the mpy-cross executable and allowing it to be built on more
machines/OS's.
This ; make Windows compilation fail with GNU makefile 4.2.1. It was added
in 0dc85c9f86 as part of a shell if-
statement, but this if-statement was subsequently removed in
23a693ec2d so the semicolon is not needed.
The variable $(TOUCH) is initialized with the "touch" value in mkenv.mk
like for the other command line tools (rm, echo, cp, mkdir etc). With
this, for example, Windows users can specify the path of touch.exe.
The variable $(CAT) is initialised with the "cat" value in mkenv.mk like
for the other command line tools (rm, echo, cp, mkdir etc). With this,
for example, Windows users can specify the path of cat.exe.
Reuse the implementation for bytes since it works the same way regardless
of the underlying type. This method gets added for CPython compatibility
of bytearray, but to keep the code simple and small array.array now also
has a working decode method, which is non-standard but doesn't hurt.
This allows figuring out the number of bytes in the memoryview object as
len(memview) * memview.itemsize.
The feature is enabled via MICROPY_PY_BUILTINS_MEMORYVIEW_ITEMSIZE and is
disabled by default.
Prior to this commit, building the unix port with `DEBUG=1` and
`-finstrument-functions` the compilation would fail with an error like
"control reaches end of non-void function". This change fixes this by
removing the problematic "if (0)" branches. Not all branches affect
compilation, but they are all removed for consistency.
With this change, @micropython.asm_thumb functions will work on standard
ARM processors (that are in ARM state by default), in scripts and
precompiled .mpy files.
Addresses issue #4675.
This system makes it a lot easier to include external libraries as static,
native modules in MicroPython. Simply pass USER_C_MODULES (like
FROZEN_MPY_DIR) as a make parameter.
During make, makemoduledefs.py parses the current builds c files for
MP_REGISTER_MODULE(module_name, obj_module, enabled_define)
These are used to generate a header with the required entries for
"mp_rom_map_elem_t mp_builtin_module_table[]" in py/objmodule.c
This commit adds support for saving and loading .mpy files that contain
native code (native, viper and inline-asm). A lot of the ground work was
already done for this in the form of removing pointers from generated
native code. The changes here are mainly to link in qstr values to the
native code, and change the format of .mpy files to contain native code
blocks (possibly mixed with bytecode).
A top-level summary:
- @micropython.native, @micropython.viper and @micropython.asm_thumb/
asm_xtensa are now allowed in .py files when compiling to .mpy, and they
work transparently to the user.
- Entire .py files can be compiled to native via mpy-cross -X emit=native
and for the most part the generated .mpy files should work the same as
their bytecode version.
- The .mpy file format is changed to 1) specify in the header if the file
contains native code and if so the architecture (eg x86, ARMV7M, Xtensa);
2) for each function block the kind of code is specified (bytecode,
native, viper, asm).
- When native code is loaded from a .mpy file the native code must be
modified (in place) to link qstr values in, just like bytecode (see
py/persistentcode.c:arch_link_qstr() function).
In addition, this now defines a public, native ABI for dynamically loadable
native code generated by other languages, like C.
The new compile-time option is MICROPY_DEBUG_MP_OBJ_SENTINELS, disabled by
default. This is to allow finer control of whether this debugging feature
is enabled or not (because, for example, this setting must be the same for
mpy-cross and the MicroPython main code when using native code generation).
When encoded in the mpy file, if qstr <= QSTR_LAST_STATIC then store two
bytes: 0, static_qstr_id. Otherwise encode the qstr as usual (either with
string data or a reference into the qstr window).
Reduces mpy file size by about 5%.
Instead of emitting two bytes in the bytecode for where the linked qstr
should be written to, it is now replaced by the actual qstr data, or a
reference into the qstr window.
Reduces mpy file size by about 10%.
This is an implementation of a sliding qstr window used to reduce the
number of qstrs stored in a .mpy file. The window size is configured to 32
entries which takes a fixed 64 bytes (16-bits each) on the C stack when
loading/saving a .mpy file. It allows to remember the most recent 32 qstrs
so they don't need to be stored again in the .mpy file. The qstr window
uses a simple least-recently-used mechanism to discard the least recently
used qstr when the window overflows (similar to dictionary compression).
This scheme only needs a single pass to save/load the .mpy file.
Reduces mpy file size by about 25% with a window size of 32.
POP_BLOCK and POP_EXCEPT are now the same, and are always followed by a
JUMP. So this optimisation reduces code size, and RAM usage of bytecode by
two bytes for each try-except handler.
This patch fixes a bug in the VM when breaking within a try-finally. The
bug has to do with executing a break within the finally block of a
try-finally statement. For example:
def f():
for x in (1,):
print('a', x)
try:
raise Exception
finally:
print(1)
break
print('b', x)
f()
Currently in uPy the above code will print:
a 1
1
1
segmentation fault (core dumped) micropython
Not only is there a seg fault, but the "1" in the finally block is printed
twice. This is because when the VM executes a finally block it doesn't
really know if that block was executed due to a fall-through of the try (no
exception raised), or because an exception is active. In particular, for
nested finallys the VM has no idea which of the nested ones have active
exceptions and which are just fall-throughs. So when a break (or continue)
is executed it tries to unwind all of the finallys, when in fact only some
may be active.
It's questionable whether break (or return or continue) should be allowed
within a finally block, because they implicitly swallow any active
exception, but nevertheless it's allowed by CPython (although almost never
used in the standard library). And uPy should at least not crash in such a
case.
The solution here relies on the fact that exception and finally handlers
always appear in the bytecode after the try body.
Note: there was a similar bug with a return in a finally block, but that
was previously fixed in b735208403
Also, to make it possible for ports to provide their own lwipopts.h, the
default include directory of extmod/lwip-include is no longer added and
instead a port should now make sure the correct include directory is
included in the list (can still use extmod/lwip-include).
This optimisation eliminates the need to create a temporary normal dict.
The optimisation is enabled via MICROPY_COMP_CONST_LITERAL which is enabled
by default (although only has an effect if OrderdDict is enabled).
Thanks to @pfalcon for the initial idea and implementation.
All exceptions that unwind through the async-with must be caught and
BaseException is the top-level class, which includes Exception and others.
Fixes issue #4552.
As mentioned in #4450, `websocket` was experimental with a single intended
user, `webrepl`. Therefore, we'll make this change without a weak
link `websocket` -> `uwebsocket`.
This change makes it so that python3 is required by default to build
MicroPython. Python 2 can be used by specifying make PYTHON=python2.
This comes about due to a recent-ish change to PEP 394 that makes the
python command more optional than before (even with Python 2 installed);
see cd59ec03c8 (diff-1d22f7bd72cbc900670f058b1107d426)
Since the command python is no longer required to be provided by a
distribution we need to use either python2 or python3 as commands. And
python3 seems the obvious choice.
These macros could in principle be (inline) functions so it makes sense to
have them lower case, to match the other C API functions.
The remaining macros that are upper case are:
- MP_OBJ_TO_PTR, MP_OBJ_FROM_PTR
- MP_OBJ_NEW_SMALL_INT, MP_OBJ_SMALL_INT_VALUE
- MP_OBJ_NEW_QSTR, MP_OBJ_QSTR_VALUE
- MP_OBJ_FUN_MAKE_SIG
- MP_DECLARE_CONST_xxx
- MP_DEFINE_CONST_xxx
These must remain macros because they are used when defining const data (at
least, MP_OBJ_NEW_SMALL_INT is so it makes sense to have
MP_OBJ_SMALL_INT_VALUE also a macro).
For those macros that have been made lower case, compatibility macros are
provided for the old names so that users do not need to change their code
immediately.
Python defines warnings as belonging to categories, where category is a
warning type (descending from exception type). This is useful, as e.g.
allows to disable warnings selectively and provide user-defined warning
types. So, implement this in MicroPython, except that categories are
represented just with strings. However, enough hooks are left to implement
categories differently per-port (e.g. as types), without need to patch each
and every usage.
If MICROPY_PERSISTENT_CODE_LOAD or MICROPY_ENABLE_COMPILER are enabled then
code gets enabled that calls file reading functions which may be disabled
if no readers have been implemented.
To fix this, introduce a MICROPY_HAS_FILE_READER variable, which is
automatically set if MICROPY_READER_POSIX or MICROPY_READER_VFS is set but
can also be manually set if a custom reader is being implemented. Then
disable the file reading calls if this is not set.
For architectures where size_t is less than 32 bits (eg 16 bits) the args
must be casted to uint32_t so the left shift will work. For architectures
where size_t is greater than 32 bits (eg 64 bits) this new casting will not
lose any bits because the end result must anyway fit in a uint32_t.
Changes to the layout of the bytecode header meant that this debug code was
no longer compiling. This is now fixed and a new compile-time option is
introduced, MICROPY_DEBUG_VM_STACK_OVERFLOW, to turn on this feature (which
is disabled by default). This option is needed because more than one file
needs to cooperate to make this check work.
It's more robust to have the version defined statically in a header file,
rather than dynamically generating it via git using a git tag. In case
git doesn't exist, or a different source control tool is used, it's
important to still have the uPy version number available.
The older "bool has_finaliser" gets recast as GC_ALLOC_FLAG_HAS_FINALISER=1
so this is a backwards compatible change to the signature. Since bool gets
implicitly converted to 1 this patch doesn't include conversion of all
calls.
Both mp_type_array and mp_type_memoryview use the same object structure,
mp_obj_array_t, but for the case of memoryview, some fields, e.g. "free",
have different meaning. As the "free" field is also a bitfield, assume
that (anonymous) union can't be used here (for the concerns of possible
compatibility issues with wide array of toolchains), and just add a field
alias using a #define. As it's a define, it should be a selective
identifier, so use verbose "memview_offset" to avoid any clashes.
All 4 opcodes that can have caching bytes also have qstrs, so the test for
them must go in the qstr part of the code. The reason this incorrect
calculation of the opcode size did not lead to a bug is because the caching
byte is at the end of the opcode (byte, qstr, qstr, cache) and is always
0x00 when saving/loading, so was just treated as a single byte no-op
opcode. Hence these opcodes were being saved/loaded/decoded correctly.
Thanks to @malinah for finding the problem and providing the initial patch.
mp_obj_new_exception_msg() assumes that the message passed to it is in ROM
and so can use its data directly to create the string object for the
argument of the exception, saving RAM. At the same time, this approach
also makes sure that there is no attempt to format the message with printf,
which could lead to faults if the message contained % characters.
Fixes issue #3004.
SHORT, INT, LONG, LONGLONG, and unsigned (U*) variants are being defined.
This is done at compile using GCC-style predefined macros like
__SIZEOF_INT__. If the compiler doesn't have such defines, no such types
will be defined.
Instead of assuming that the method is a bytecode object, and only
supporting load of __name__, make the operation generic by delegating the
load to the method object itself. Saves a bit of code size and fixes the
case of attempting to load __name__ on a native method, see issue #4028.
A new option MICROPY_GC_STACK_ENTRY_TYPE is added to select a custom type
instead of size_t for the gc_stack array items. This can be beneficial for
small devices, especially those that are low on memory anyway. If a device
has 1MB or less of heap (and 16-byte GC blocks) then this type can be
uint16_t, saving 128 bytes of RAM.
There was an assumption that all names in a module dict are qstr's.
However, they can be dynamically generated (by assigning to globals()),
and in case of a long name, it won't be a qstr. Handle this situation
properly, including taking care of not creating superfluous qstr's for
names starting with "_" (which aren't imported by "import *").
Taking the address of a local variable is mildly expensive, in code size
and stack usage. So optimise scope_find_or_add_id() to not need to take a
pointer to the "added" variable, and instead take the kind to use for newly
added identifiers.
This ensures that implicit variables are only converted to implicit
closed-over variables (nonlocals) at the very end of the function scope.
If variables are closed-over when first used (read from, as was done prior
to this commit) then this can be incorrect because the variable may be
assigned to later on in the function which means they are just a plain
local, not closed over.
Fixes issue #4272.
Building axtls gives a lot of warnings with -Wall enabled, and explicitly
disabling all of them cannot be done in a way compatible with gcc and
clang, and likely other compilers. So just use -Wno-all to prevent all of
the extra warnings (in addition to the necessary -Wno-unused-parameter,
-Wno-uninitialized, -Wno-sign-compare and -Wno-old-style-definition).
Fixes issue #4182.
Configurable via MICROPY_MODULE_GETATTR, disabled by default. Among other
things __getattr__ for modules can help to build lazy loading / code
unloading at runtime.
Configurable via MICROPY_PY_BUILTINS_STR_COUNT. Default is enabled.
Disabled for bare-arm, minimal, unix-minimal and zephyr ports. Disabling
it saves 408 bytes on x86.
So these constant objects can be loaded by dereferencing the REG_FUN_TABLE
pointer instead of loading immediate values. This reduces the size of
generated native code (when such constants are used), and means that
pointers to these constants are no longer stored in the assembly code.
The maximum index into mp_fun_table is currently less than 1024 and should
stay that way to keep things efficient for all architectures, so there is
no need to handle loading the pointer directly via a literal in this
function.
All architectures now have a dedicated register to hold the pointer to the
native function table mp_fun_table, and so they all need to load this
register at the start of the native function. This commit makes the
loading of this register uniform across architectures by passing the
pointer in the constant table for the native function, and then loading the
register from the constant table. Doing it this way means that the pointer
is not stored in the assembly code, helping to make the code more portable.
Instead of storing the function pointer directly in the assembly code.
This makes the generated code more independent of the runtime (so easier to
relocate the code), and reduces the generated code size.
The esp register is always a fixed distance below ebp, and using esp to
reference locals on the stack frees up the ebp register for general purpose
use (which is important for an architecture with only 8 user registers).
Instead of storing the function pointer directly in the assembly code.
This makes the generated code more independent of the runtime (so easier to
relocate the code), and reduces the generated code size.
The rsp register is always a fixed distance below rbp, and using rsp to
reference locals on the stack frees up the rbp register for general purpose
use.
This commit adds first class support for yield and yield-from in the native
emitter, including send and throw support, and yields enclosed in exception
handlers (which requires pulling down the NLR stack before yielding, then
rebuilding it when resuming).
This has been fully tested and is working on unix x86 and x86-64, and
stm32. Also basic tests have been done with the esp8266 port. Performance
of existing native code is unchanged.
The nlr_buf_t doesn't need to be part of the Python value stack (as it was
before this commit), it's simpler to have it separated as auxiliary state
that lives on the C stack. This will help adding yield support because in
that case the nlr_buf_t and Python value stack live in separate memory
areas (C stack and heap respectively).
Instead of at end of state, n_state - 1. It was originally (way back in
v1.0) put at the end of the state because the VM didn't have a pointer to
the start. But now that the VM takes a mp_code_state_t pointer it does
have a pointer to the start of the state so can put the exception object
there.
This commit saves about 30 bytes of code on all architectures, and, more
importantly, reduces C-stack usage by a couple of words (8 bytes on Thumb2
and 16 bytes on x86-64) for every (non-generator) call of a bytecode
function because fun_bc_call no longer needs to remember the n_state
variable.
This makes these special methods have the same calling behaviour as other
methods in a class instance (mp_convert_member_lookup() is already called
by mp_obj_class_lookup()).
And remove related comment about needing such protection when calling send.
Reasoning for removal is as follows:
- mp_resume is only called by the VM in YIELD_FROM opcode
- if send_value != MP_OBJ_NULL then throw_value == MP_OBJ_NULL
- so if __next__ or send are called then throw_value == MP_OBJ_NULL
- if __next__ or send raise an exception without nlr protection then the
exception will be handled by the global exception handler of the VM
- this handler already has code to handle exceptions raised in YIELD_FROM,
including correct handling of StopIteration
- this handler doesn't handle the case of injection of GeneratorExit, but
this won't be needed because throw_value == MP_OBJ_NULL
Note that it's already possible for mp_resume() to raise an exception
(including StopIteration) from the unprotected call to type->iternext(), so
that's why the VM already has code to handle the case of exceptions coming
out of mp_resume().
This commit reduces code size by a bit, and significantly reduces C stack
usage when using yield-from, from 88 bytes down to 40 for Thumb2, and 152
down to 72 bytes for x86-64 (better than half). (Note that gcc doesn't
seem to tail-call optimise the call from mp_resume() to mp_obj_gen_resume()
so this saving in C stack usage helps all uses of yield-from.)
mp_make_raise_obj must be used to convert a possible exception type to an
instance object, otherwise the VM may raise a non-exception object.
An existing test is adjusted to test this case, with the original test
already moved to generator_throw.py.
This matches how bytecode does it, and matches the signature of
mp_emit_glue_assign_native. Since the native emitter doesn't support
nan-boxing uintptr_t and mp_uint_t are anyway the same bit-width.
After the previous commit this macro is no longer needed by the native
emitter because live heap pointers are no longer stored in generated native
machine code.
This commit changes native code to handle constant objects like bytecode:
instead of storing the pointers inside the native code they are now stored
in a separate constant table (such pointers include objects like bignum,
bytes, and raw code for nested functions). This removes the need for the
GC to scan native code for root pointers, and takes a step towards making
native code independent of the runtime (eg so it can be compiled offline by
mpy-cross).
Note that the changes to the struct scope_t did not increase its size: on a
32-bit architecture it is still 48 bytes, and on a 64-bit architecture it
decreased from 80 to 72 bytes.
Nan and inf (signed and unsigned) are also handled correctly by using
signbit (they were also handled correctly with "val<0", but that didn't
handle -0.0 correctly). A test case is added for this behaviour.
When obj.h is compiled as C++ code, the cl compiler emits a warning about
possibly unsafe mixing of size_t and bool types in the or operation in
MP_OBJ_FUN_MAKE_SIG. Similarly there's an implicit narrowing integer
conversion in runtime.h. This commit fixes this by being explicit.
This is an improvement over previous behavior when str was returned for
both str and bytes input format. This new behaviour is also consistent
with how the % operator works, as well as many other str/bytes methods.
It should be noted that it's not how current versions of CPython work,
where there's a gap in the functionality and bytes.format() is not
supported.
This commit adds the math.factorial function in two variants:
- squared difference, which is faster than the naive version, relatively
compact, and non-recursive;
- a mildly optimised recursive version, faster than the above one.
There are some more optimisations that could be done, but they tend to take
more code, and more storage space. The recursive version seems like a
sensible compromise.
The new function is disabled by default, and uses the non-optimised version
by default if it is enabled. The options are MICROPY_PY_MATH_FACTORIAL
and MICROPY_OPT_MATH_FACTORIAL.
This patches avoids multiplying with negative powers-of-10 when parsing
floating-point values, when those powers-of-10 can be exactly represented
as a positive power. When represented as a positive power and used to
divide, the resulting float will not have any rounding errors.
The issue is that mp_parse_num_decimal will sometimes not give the closest
floating representation of the input string. Eg for "0.3", which can't be
represented exactly in floating point, mp_parse_num_decimal gives a
slightly high (by 1LSB) result. This is because it computes the answer as
3 * 0.1, and since 0.1 also can't be represented exactly, multiplying by 3
multiplies up the rounding error in the 0.1. Computing it as 3 / 10, as
now done by the change in this commit, gives an answer which is as close to
the true value of "0.3" as possible.
This commit implements PEP479 which disallows raising StopIteration inside
a generator to signal that it should be finished. Instead, the generator
should simply return when it is complete.
See https://www.python.org/dev/peps/pep-0479/ for details.
In 0e80f345f8 the inplace operations __iadd__
and __isub__ were made unconditionally available, so the comment about this
section is changed to reflect that.
Loading a pointer by indexing into the native function table mp_fun_table,
rather than loading an immediate value (via a PC-relative load), uses less
code space.
This commit makes viper functions have the same signature as native
functions, at the level of the emitter/assembler. This means that viper
functions can now be wrapped in the same uPy object as native functions.
Viper functions are now responsible for parsing their arguments (before it
was done by the runtime), and this makes calling them more efficient (in
most cases) because the viper entry code can be custom generated to suit
the signature of the function.
This change also opens the way forward for viper functions to take
arbitrary numbers of arguments, and for them to handle globals correctly,
among other things.
Now that the compiler can store the results of the viper types in the
scope, the viper parameter annotation compilation stage can be merged with
the normal parameter compilation stage.
With 5 arguments to mp_arg_check_num(), some architectures need to pass
values on the stack. So compressing n_args_min, n_args_max, takes_kw into
a single word and passing only 3 arguments makes the call more efficient,
because almost all calls to this function pass in constant values. Code
size is also reduced by a decent amount:
bare-arm: -116
minimal x86: -64
unix x64: -256
unix nanbox: -112
stm32: -324
cc3200: -192
esp8266: -192
esp32: -144
Prior to this commit a function compiled with the native decorator
@micropython.native would not work correctly when accessing global
variables, because the globals dict was not being set upon function entry.
This commit fixes this problem by, upon function entry, setting as the
current globals dict the globals dict context the function was defined
within, as per normal Python semantics, and as bytecode does. Upon
function exit the original globals dict is restored.
In order to restore the globals dict when an exception is raised the native
function must guard its internals with an nlr_push/nlr_pop pair. Because
this push/pop is relatively expensive, in both C stack usage for the
nlr_buf_t and CPU execution time, the implementation here optimises things
as much as possible. First, the compiler keeps track of whether a function
even needs to access global variables. Using this information the native
emitter then generates three different kinds of code:
1. no globals used, no exception handlers: no nlr handling code and no
setting of the globals dict.
2. globals used, no exception handlers: an nlr_buf_t is allocated on the
C stack but it is not used if the globals dict is unchanged, saving
execution time because nlr_push/nlr_pop don't need to run.
3. function has exception handlers, may use globals: an nlr_buf_t is
allocated and nlr_push/nlr_pop are always called.
In the end, native functions that don't access globals and don't have
exception handlers will run more efficiently than those that do.
Fixes issue #1573.
If bytearray is constructed from str, a second argument of encoding is
required (in CPython), and third arg of Unicode error handling is allowed,
e.g.:
bytearray("str", "utf-8", "strict")
This is similar to bytes:
bytes("str", "utf-8", "strict")
This patch just allows to pass 2nd/3rd arguments to bytearray, but
doesn't try to validate them to not impact code size. (This is also
similar to how bytes constructor is handled, though it does a bit
more validation, e.g. check that in case of str arg, encoding argument
is passed.)
This removes the need for a separate axtls build stage, and builds all
axtls object files along with other code. This simplifies and cleans up
the build process, automatically builds axtls when needed, and puts the
axtls object files in the correct $(BUILD) location.
The MicroPython axtls configuration file is provided in
extmod/axtls-include/config.h
This patch adds full support for unwinding jumps to the native emitter.
This means that return/break/continue can be used in try-except,
try-finally and with statements. For code that doesn't use unwinding jumps
there is almost no overhead added to the generated code.
The native emitter keeps the current exception in a slot in its C stack
(instead of on its Python value stack), so when it catches an exception it
must explicitly clear that slot so the same exception is not reraised later
on.
Back in 8047340d75 basic support was added in
the VM to handle return statements within a finally block. But it didn't
cover all cases, in particular when some finally's were active and others
inactive when the "return" was executed.
This patch adds further support for return-within-finally by correctly
managing the currently_in_except_block flag, and should fix all cases. The
main point is that finally handlers remain on the exception stack even if
they are active (currently being executed), and the unwind return code
should only execute those finally's which are inactive.
New tests are added for the cases which now pass.
Prior to this patch, native code would use a full nlr_buf_t for each
exception handler (try-except, try-finally, with). For nested exception
handlers this would use a lot of C stack and be rather inefficient.
This patch changes how exceptions are handled in native code by setting up
only a single nlr_buf_t context for the entire function, and then manages a
state machine (using the PC) to work out which exception handler to run
when an exception is raised by an nlr_jump. This keeps the C stack usage
at a constant level regardless of the depth of Python exception blocks.
The patch also fixes an existing bug when local variables are written to
within an exception handler, then their value was incorrectly restored if
an exception was raised (since the nlr_jump would restore register values,
back to the point of the nlr_push).
And it also gets nested try-finally+with working with the viper emitter.
Broadly speaking, efficiency of executing native code that doesn't use
any exception blocks is unchanged, and emitted code size is only slightly
increased for such function. C stack usage of all native functions is
either equal or less than before. Emitted code size for native functions
that use exception blocks is increased by roughly 10% (due in part to
fixing of above-mentioned bugs).
But, most importantly, this patch allows to implement more Python features
in native code, like unwind jumps and yielding from within nested exception
blocks.
These POSIX wrappers are assumed to be passed a concrete stream object so
it is more efficient (eg on nan-boxing builds) to pass in the pointer
rather than mp_obj_t, because then the users of these functions only need
to store a void* (and mp_obj_t may be wider than a pointer). And things
would be further improved if the stream protocol functions eventually took
a pointer as their first argument (instead of an mp_obj_t).
This patch is a step to getting ussl/axtls compiling on nan-boxing builds.
See issue #3085.
Otherwise there is the possibility that n_free starts out non-zero from the
previous iteration, which may have found a few (but not enough) free blocks
at the end of the heap. If this is the case, and if the very first blocks
that are scanned the second time around (starting at
gc_last_free_atb_index) are found to give enough memory (including the
blocks at the end of the heap from the previous iteration that left n_free
non-zero) then memory will be allocated starting before the location that
gc_last_free_atb_index points to, most likely leading to corruption.
This serious bug did not manifest itself in the past because a gc_collect
always resets gc_last_free_atb_index to point to the start of the GC heap,
and the first block there is almost always allocated to a long-lived
object (eg entries from sys.path, or mounted filesystem objects), which
means that n_free would be reset at the start of the search loop.
But with threading enabled with the GIL disabled it is possible to trigger
the bug via the following sequence of events:
1. Thread A runs gc_alloc, fails to find enough memory, and has a non-zero
n_free at the end of the search.
2. Thread A calls gc_collect and frees a bunch of blocks on the GC heap.
3. Just after gc_collect finishes in thread A, thread B takes gc_mutex and
does an allocation, moving gc_last_free_atb_index to point to the
interior of the heap, to a place where there is most likely a run of
available blocks.
4. Thread A regains gc_mutex and does its second search for free memory,
starting with a non-zero n_free. Since it's likely that the first block
it searches is available it will allocate memory which overlaps with the
memory before gc_last_free_atb_index.
Without this patch, on 64-bit architectures the "1 << (small_int_bits - 1)"
is computed using only 32-bit values (since small_int_bits is a uint8_t)
and so will overflow (and give the wrong result) if small_int_bits is
larger than 32.
There is no need to have three copies of the exception object on the top of
the native value stack. Instead, the values on the stack should be the
first two items in an nlr_buf_t: the prev pointer and the ret_val pointer.
This is all that is needed and is what the rest of the native emitter
expects is on the stack.
This patch is essentially an optimisation. Behaviour is unchanged,
although the stack layout for native exception handling now makes more
sense.
A native function allocates space on its C stack for mp_code_state_t,
followed by its Python stack, then its locals. This patch makes sure that
the native function actually starts at the start of its Python stack,
rather than at the start of mp_code_state_t (which didn't lead to any
issues so far because the mp_code_state_t is unused after the native
function sets itself up).
On x86 archs (both 32 and 64 bit) a bool return value only sets the 8-bit
al register, and the higher bits of the ax register have an undefined
value. When testing the return value of such cases it is required to just
test al for zero/non-zero. On the other hand, checking for truth or
zero/non-zero on an integer return value requires checking all bits of the
register. These two cases must be distinguished and handled correctly in
generated native code. This patch makes sure of this.
For other supported native archs (ARM, Thumb2, Xtensa) there is no such
distinction and this patch does not change anything for them.
DEBUG_printf and MICROPY_DEBUG_PRINTER is now used instead of normal
printf, and a fault is fixed in mp_obj_class_lookup with debugging enabled;
see issue #3999. Debugging can now be enabled on all ports including when
nan-boxing is used.
This patch in effect renames MICROPY_DEBUG_PRINTER_DEST to
MICROPY_DEBUG_PRINTER, moving its default definition from
lib/utils/printf.c to py/mpconfig.h to make it official and documented, and
makes this macro a pointer rather than the actual mp_print_t struct. This
is done to get consistency with MICROPY_ERROR_PRINTER, and provide this
macro for use outside just lib/utils/printf.c.
Ports are updated to use the new macro name.
This patch makes the Thumb-2 native emitter use wide ldr instructions to
call into the runtime, when the index into the native glue function table
is 32 or greater. This reduces the generated assembler code from 10 bytes
to 6 bytes, saving RAM and making native code run about 0.8% faster.
This error message did not consume all of its variable args, a bug
introduced long ago in baf6f14deb. By fixing
it to use %s (instead of keeping the string as-is and deleting the last
arg) the same error message string is now reused three times in this format
function and gives a code size reduction of around 130 bytes. It also now
gives a better error message when a non-string is passed in as an argument
to format, eg '{:d}'.format([]).
There's no need to call mp_obj_new_int() which will just fail the check for
small int and call mp_obj_new_int_from_ll() anyway.
Thanks to @Jongy for prompting this change.
In non-debug mode MP_OBJ_STOP_ITERATION is zero and comparing something to
zero can be done more efficiently in assembler than comparing to a non-zero
value.
With the recent change b488a4a848, a
generating function now has the same layout in memory as a normal bytecode
function, and so can reuse the latter's attribute accessor code to
implement __name__.
Because this function is simple it saves code size to have it inlined.
Being an auxiliary helper function (and only used in the py/ core) the
argument should always be an mp_obj_module_t*, so there's no need for the
assert (and having it would require including assert.h in obj.h).
It's a very simple function and saves code, and improves efficiency, by
being inline. Note that this is an auxiliary helper function and so
doesn't need mp_check_self -- that's used for functions that can be
accessed directly from Python code (eg from a method table).
mp_obj_module_get_globals() returns a mp_obj_dict_t*, and type->locals_dict
is a mp_obj_dict_t*, so access the map entry of the dict directly instead
of needing to cast this mp_obj_dict_t* up to an object and then calling the
mp_obj_dict_get_map() helper function.
For generating functions there is no need to wrap the bytecode function in
a generator wrapper instance. Instead the type of the bytecode function
can be changed to mp_type_gen_wrap. This reduces code size and saves a
block of GC heap RAM for each generator.
This feature is controlled at compile time by MICROPY_PY_URE_SUB, disabled
by default.
Thanks to @dmazzella for the original patch for this feature; see #3770.
This feature is controlled at compile time by
MICROPY_PY_URE_MATCH_SPAN_START_END, disabled by default.
Thanks to @dmazzella for the original patch for this feature; see #3770.
This feature is controlled at compile time by MICROPY_PY_URE_MATCH_GROUPS,
disabled by default.
Thanks to @dmazzella for the original patch for this feature; see #3770.
Before this patch the context manager's __aexit__() method would not be
executed if a return/break/continue statement was used to exit an async
with block. async with now has the same semantics as normal with.
The fix here applies purely to the compiler, and does not modify the
runtime at all. It might (eventually) be better to define new bytecode(s)
to handle async with (and maybe other async constructs) in a cleaner, more
efficient way.
One minor drawback with addressing this issue purely in the compiler is
that it wasn't possible to get 100% CPython semantics. The thing that is
different here to CPython is that the __aexit__ method is not looked up in
the context manager until it is needed, which is after the body of the
async with statement has executed. So if a context manager doesn't have
__aexit__ then CPython raises an exception before the async with is
executed, whereas uPy will raise it after it is executed. Note that
__aenter__ is looked up at the beginning in uPy because it needs to be
called straightaway, so if the context manager isn't a context manager then
it'll still raise an exception at the same location as CPython. The only
difference is if the context manager has the __aenter__ method but not the
__aexit__ method, then in that case uPy has different behaviour. But this
is a very minor, and acceptable, difference.
Allow including crypto consts based on compilation settings. Disabled by
default to reduce code size; if one wants extra code readability, can
enable them.
The API follows guidelines of https://www.python.org/dev/peps/pep-0272/,
but is optimized for code size, with the idea that full PEP 0272
compatibility can be added with a simple Python wrapper mode.
The naming of the module follows (u)hashlib pattern.
At the bare minimum, this module is expected to provide:
* AES128, ECB (i.e. "null") mode, encrypt only
Implementation in this commit is based on axTLS routines, and implements
following:
* AES 128 and 256
* ECB and CBC modes
* encrypt and decrypt
The existing mp_get_stream_raise() helper does explicit checks that the
input object is a real pointer object, has a non-NULL stream protocol, and
has the desired stream C method (read/write/ioctl). In most cases it is
not necessary to do these checks because it is guaranteed that the input
object has the stream protocol and desired C methods. For example, native
objects that use the stream wrappers (eg mp_stream_readinto_obj) in their
locals dict always have the stream protocol (or else they shouldn't have
these wrappers in their locals dict).
This patch introduces an efficient mp_get_stream() which doesn't do any
checks and just extracts the stream protocol struct. This should be used
in all cases where the argument object is known to be a stream. The
existing mp_get_stream_raise() should be used primarily to verify that an
object does have the correct stream protocol methods.
All uses of mp_get_stream_raise() in py/stream.c have been converted to use
mp_get_stream() because the argument is guaranteed to be a proper stream
object.
This patch improves efficiency of stream operations and reduces code size.
This patch changes dupterm to call the native C stream methods on the
connected stream objects, instead of calling the Python readinto/write
methods. This is much more efficient for native stream objects like UART
and webrepl and doesn't require allocating a special dupterm array.
This change is a minor breaking change from the user's perspective because
dupterm no longer accepts pure user stream objects to duplicate on. But
with the recent addition of uio.IOBase it is possible to still create such
classes just by inheriting from uio.IOBase, for example:
import uio, uos
class MyStream(uio.IOBase):
def write(self, buf):
# existing write implementation
def readinto(self, buf):
# existing readinto implementation
uos.dupterm(MyStream())
Via the config value MICROPY_PY_UHASHLIB_SHA256. Default to enabled to
keep backwards compatibility.
Also add default value for the sha1 class, to at least document its
existence.
A user class derived from IOBase and implementing readinto/write/ioctl can
now be used anywhere a native stream object is accepted.
The mapping from C to Python is:
stream_p->read --> readinto(buf)
stream_p->write --> write(buf)
stream_p->ioctl --> ioctl(request, arg)
Among other things it allows the user to:
- create an object which can be passed as the file argument to print:
print(..., file=myobj), and then print will pass all the data to the
object via the objects write method (same as CPython)
- pass a user object to uio.BufferedWriter to buffer the writes (same as
CPython)
- use select.select on a user object
- register user objects with select.poll, in particular so user objects can
be used with uasyncio
- create user files that can be returned from user filesystems, and import
can import scripts from these user files
For example:
class MyOut(io.IOBase):
def write(self, buf):
print('write', repr(buf))
return len(buf)
print('hello', file=MyOut())
The feature is enabled via MICROPY_PY_IO_IOBASE which is disabled by
default.
This patch adds the gc_sweep_all() function which does a garbage collection
without tracing any root pointers, so frees all the memory, and most
importantly runs any remaining finalisers.
This helps primarily for soft reset: it will close any open files, any open
sockets, and help to get the system back to a clean state upon soft reset.
This patch is a code optimisation, trading text bytes for speed. On
pyboard it's an increase of 0.06% in code size for a gain (in pystone
performance) of roughly 6.5%.
The patch optimises load/store/delete of attributes in user defined classes
by not looking up special accessors (@property, __get__, __delete__,
__set__, __setattr__ and __getattr_) if they are guaranteed not to exist in
the class.
Currently, if you do my_obj.foo() then the runtime has to do a few checks
to see if foo is a property or has __get__, and if so delegate the call.
And for stores things like my_obj.foo = 1 has to first check if foo is a
property or has __set__ defined on it.
Doing all those checks each and every time the attribute is accessed has a
performance penalty. This patch eliminates all those checks for cases when
it's guaranteed that the checks will always fail, ie no attributes are
properties nor have any special accessor methods defined on them.
To make this guarantee it checks all attributes of a user-defined class
when it is first created. If any of the attributes of the user class are
properties or have special accessors, or any of the base classes of the
user class have them, then it sets a flag in the class to indicate that
special accessors must be checked for. Then in the load/store/delete code
it checks this flag to see if it can take the shortcut and optimise the
lookup.
It's an optimisation that's pretty widely applicable because it improves
lookup performance for all methods of user defined classes, and stores of
attributes, at least for those that don't have special accessors. And, it
allows to enable descriptors with minimal additional runtime overhead if
they are not used for a particular user class.
There is one restriction on dynamic class creation that has been introduced
by this patch: a user-defined class cannot go from zero special accessors
to one special accessor (or more) after that class has been subclassed. If
the script attempts this an AttributeError is raised (see addition to
tests/misc/non_compliant.py for an example of this case).
The cost in code space bytes for the optimisation in this patch is:
unix x64: +528
unix nanbox: +508
stm32: +192
cc3200: +200
esp8266: +332
esp32: +244
Performance tests that were done:
- on unix x86-64, pystone improved by about 5%
- on pyboard, pystone improved by about 6.5%, from 1683 up to 1794
- on pyboard, bm_chaos (from CPython benchmark suite) improved by about 5%
- on esp32, pystone improved by about 30% (but there are caching effects)
- on esp32, bm_chaos improved by about 11%
This VFS component allows to mount a host POSIX filesystem within the uPy
VFS sub-system. All traditional POSIX file access then goes through the
VFS, allowing to sandbox a uPy process to a certain sub-dir of the host
system, as well as mount other filesystem types alongside the host
filesystem.
Since a long time now, mp_obj_type_t no longer refers explicitly to
mp_stream_p_t but rather to an abstract "const void *protocol". So there's
no longer any need to define mp_stream_p_t in obj.h and it can go with all
its associated definitions in stream.h. Pretty much all users of this type
will already include the stream header.
The code_state.old_globals variable is there to save the globals state so
should be used for this purpose, to avoid the need for additional local
variables on the C stack.
Without this, if GC threshold is hit and there is not enough memory left to
satisfy the request, gc_collect() will run a second time and the search for
memory will happen again and will fail again.
Thanks to @adritium for pointing out this issue, see #3786.
Under ubsan, when evaluating hash(-0.) the following diagnostic occurs:
../../py/objfloat.c:102:15: runtime error: negation of
-9223372036854775808 cannot be represented in type 'mp_int_t' (aka
'long'); cast to an unsigned type to negate this value to itself
So do just that, to tell the compiler that we want to perform this
operation using modulo arithmetic rules.
Before this, ubsan would detect a problem when executing
hash(006699999999999999999999999999999999999999999999999999999999999999999999)
../../py/mpz.c:1539:20: runtime error: left shift of 1067371580458 by
32 places cannot be represented in type 'mp_int_t' (aka 'long')
When the overflow does occur it now happens as defined by the rules of
unsigned arithmetic.
When computing e.g. hash(0.4e3) with ubsan enabled, a diagnostic like the
following would occur:
../../py/objfloat.c:91:30: runtime error: shift exponent 44 is too
large for 32-bit type 'int'
By casting constant "1" to the right type the intended value is preserved.
Fuzz testing combined with the undefined behavior sanitizer found that
parsing unreasonable float literals like 1e+9999999999999 resulted in
undefined behavior due to overflow in signed integer arithmetic, and a
wrong result being returned.
There is no need to use the mp_int_t type which may be 64-bits wide, there
is enough bit-width in a normal int to parse reasonable exponents. Using
int helps to reduce code size for 64-bit ports, especially nan-boxing
builds. (Similarly for the "dig" variable which is now an unsigned int.)
Calling memset(NULL, value, 0) is not standards compliant so we must add an
explicit check that emit->label_offsets is indeed not NULL before calling
memset (this pointer will be NULL on the first pass of the parse tree and
it's more logical / safer to check this pointer rather than check that the
pass is not the first one).
Code sanitizers will warn if NULL is passed as the first value to memset,
and compilers may optimise the code based on the knowledge that any pointer
passed to memset is guaranteed not to be NULL.
Before this patch:
>>> print(')
... ')
Traceback (most recent call last):
File "<stdin>", line 1
SyntaxError: invalid syntax
After this patch:
>>> print(')
Traceback (most recent call last):
File "<stdin>", line 1
SyntaxError: invalid syntax
This matches CPython and prevents getting stuck in REPL continuation when a
1-quote is unmatched.
Before this patch, when using the switch statement for dispatch in the VM
(not computed goto) a pending exception check was done after each opcode.
This is not necessary and this patch makes the pending exception check only
happen when explicitly requested by certain opcodes, like jump. This
improves performance of the VM by about 2.5% when using the switch.
This patch fixes the macro so you can pass any name in, and the macro will
make more sense if you're reading it on its own. It worked previously
because n_state is always passed in as n_state_out_var.
gcc 8.0 supports the naked attribute for x86 systems so it can now be used
here. And in fact it is necessary to use this for nlr_push because gcc 8.0
no longer generates a prelude for this function (even without the naked
attribute).
This patch moves the start of the root pointer section in mp_state_ctx_t
so that it skips entries that are not pointers and don't need scanning.
Previously, the start of the root pointer section was at the very beginning
of the mp_state_ctx_t struct (which is the beginning of mp_state_thread_t).
This was the original assembler version of the NLR code was hard-coded to
have the nlr_top pointer at the start of this state structure. But now
that the NLR code is partially written in C there is no longer this
restriction on the location of nlr_top (and a comment to this effect has
been removed in this patch).
So now the root pointer section starts part way through the
mp_state_thread_t structure, after the entries which are not root pointers.
This patch also moves the non-pointer entries for MICROPY_ENABLE_SCHEDULER
outside the root pointer section.
Moving non-pointer entries out of the root pointer section helps to make
the GC more precise and should help to prevent some cases of collectable
garbage being kept.
This patch also has a measurable improvement in performance of the
pystone.py benchmark: on unix x86-64 and stm32 there was an improvement of
roughly 0.6% (tested with both gcc 7.3 and gcc 8.1).