The qst value is always small enough to fit in 31-bits (even less) and
using a 32-bit shift rather than a 64-bit shift reduces code size by about
300 bytes.
A user-defined type that defines __iter__ doesn't need any memory to be
pre-allocated for its iterator (because it can't use such memory). So
optimise for this case by not allocating the iter-buf.
In commit 71a3d6ec3b mp_setup_code_state was
changed from a 5-arg function to a 4-arg function, and at that point 5-arg
calls in native code were no longer needed. See also commit
4f9842ad80.
Allows assigning attributes on class instances that implement their own
__setattr__. Both object.__setattr__ and super(A, b).__setattr__ will work
with this commit.
This makes the loading of viper-code-with-relocations a bit neater and
easier to understand, by treating the rodata/bss like a special object to
be loaded into the constant table (which is how it behaves).
We don't want to add a feature flag to .mpy files that indicate float
support because it will get complex and difficult to use. Instead the .mpy
is built using whatever precision it chooses (float or double) and the
native glue API will convert between this choice and what the host runtime
actually uses.
This commit adds a new tool called mpy_ld.py which is essentially a linker
that builds .mpy files directly from .o files. A new header file
(dynruntime.h) and makefile fragment (dynruntime.mk) are also included
which allow building .mpy files from C source code. Such .mpy files can
then be dynamically imported as though they were a normal Python module,
even though they are implemented in C.
Converting .o files directly (rather than pre-linked .elf files) allows the
resulting .mpy to be more efficient because it has more control over the
relocations; for example it can skip PLT indirection. Doing it this way
also allows supporting more architectures, such as Xtensa which has
specific needs for position-independent code and the GOT.
The tool supports targets of x86, x86-64, ARM Thumb and Xtensa (windowed
and non-windowed). BSS, text and rodata sections are supported, with
relocations to all internal sections and symbols, as well as relocations to
some external symbols (defined by dynruntime.h), and linking of qstrs.
Implements text, rodata and bss generalised relocations, as well as generic
qstr-object linking. This allows importing dynamic native modules on all
supported architectures in a unified way.
With the memcpy() call placed last it avoids the effects of registers
clobbering. It's definitely effective in non-inlined functions, but even
here it is still making a small difference. For example, on stm32, this
saves an extra `ldr` instruction to load `o->vstr` after the memcpy()
returns.
The string length being longer than the allowed qstr length can happen in
many locations, for example in the parser with very long variable names.
Without an explicit check that the length is within range (as done in this
patch) the code would exhibit crashes and strange behaviour with truncated
strings.
This commit adds a sys.implementation.mpy entry when the system supports
importing .mpy files. This entry is a 16-bit integer which encodes two
bytes of information from the header of .mpy files that are supported by
the system being run: the second and third bytes, .mpy version, and flags
and native architecture. This allows determining the supported .mpy file
dynamically by code, and also for the user to find it out by inspecting
this value. It's further possible to dynamically detect if the system
supports importing .mpy files by `hasattr(sys.implementation, 'mpy')`.
Replace the is_running field with a tri-state variable to indicate
running/not-running/pending-exception.
Update tests to cover the various cases.
This allows cancellation in uasyncio even if the coroutine hasn't been
executed yet. Fixes#5242
This wasn't necessary as the wrapped function already has a reference to
its globals. But it had a dual purpose of tracking whether the function
was currently running, so replace it with a bool.
runtime0.h is part of the MicroPython ABI so it's simpler if it's
independent of config options, like MICROPY_PY_REVERSE_SPECIAL_METHODS.
What's effectively done here is to move MP_BINARY_OP_DIVMOD and
MP_BINARY_OP_CONTAINS up in the enum, then remove the #if
MICROPY_PY_REVERSE_SPECIAL_METHODS conditional.
Without this change .mpy files would need to have a feature flag for
MICROPY_PY_REVERSE_SPECIAL_METHODS (when embedding native code that uses
this enum).
This commit has no effect when MICROPY_PY_REVERSE_SPECIAL_METHODS is
disabled. With this option enabled this commit reduces code size by about
60 bytes.
For consistency with "umachine". Now that weak links are enabled
by default for built-in modules, this should be a no-op, but allows
extension of the bluetooth module by user code.
Also move registration of ubluetooth to objmodule rather than
port-specific.
This commit implements automatic module weak links for all built-in
modules, by searching for "ufoo" in the built-in module list if "foo"
cannot be found. This means that all modules named "ufoo" are always
available as "foo". Also, a port can no longer add any other weak links,
which makes strict the definition of a weak link.
It saves some code size (about 100-200 bytes) on ports that previously had
lots of weak links.
Some changes from the previous behaviour:
- It doesn't intern the non-u module names (eg "foo" is not interned),
which saves code size, but will mean that "import foo" creates a new qstr
(namely "foo") in RAM (unless the importing module is frozen).
- help('modules') no longer lists non-u module names, only the u-variants;
this reduces duplication in the help listing.
Weak links are effectively the same as having a set of symbolic links on
the filesystem that is searched last. So an "import foo" will search
built-in modules first, then all paths in sys.path, then weak links last,
importing "ufoo" if it exists. Thus a file called "foo.py" somewhere in
sys.path will still have precedence over the weak link of "foo" to "ufoo".
See issues: #1740, #4449, #5229, #5241.
When loading a manifest file, e.g. by include(), it will chdir first to the
directory of that manifest. This means that all file operations within a
manifest are relative to that manifest's location.
As a consequence of this, additional environment variables are needed to
find absolute paths, so the following are added: $(MPY_LIB_DIR),
$(PORT_DIR), $(BOARD_DIR). And rename $(MPY) to $(MPY_DIR) to be
consistent.
Existing manifests are updated to match.
This introduces a new build variable FROZEN_MANIFEST which can be set to a
manifest listing (written in Python) that describes the set of files to be
frozen in to the firmware.
Instead of encoding 4 zero bytes as placeholders for the simple_name and
source_file qstrs, and storing the qstrs after the bytecode, store the
qstrs at the location of these 4 bytes. This saves 4 bytes per bytecode
function stored in a .mpy file (for example lcd160cr.mpy drops by 232
bytes, 4x 58 functions). And resulting code size is slightly reduced on
ports that use this feature.
In which case place the native function prelude in a bytes object, linked
from the const_table of that function. An architecture should define
N_PRELUDE_AS_BYTES_OBJ to 1 before including py/emitnative.c to emit
correct machine code, then enable MICROPY_EMIT_NATIVE_PRELUDE_AS_BYTES_OBJ
so the runtime can correctly handle the prelude being in a bytes object.
Such that args/return regs for the parent are different to args/return regs
for child calls. For an architecture to use this feature it should define
the REG_PARENT_xxx macros before including py/emitnative.c.
Prior to this commit, when unwinding through an active finally the stack
was not being correctly popped/folded, which resulting in the VM crashing
for complicated unwinding of nested finallys.
This should be fixed with this commit, and more tests for return/break/
continue within a finally have been added to exercise this.
As of 7d58a197cf, `NULL` should no longer be
here because it's allowed (MP_QSTRnull took its place). This entry was
preventing the use of MP_QSTR_NULL to mean "NULL" (although this is not
currently used).
A blacklist should not be needed because it should be possible to intern
all strings.
Fixes issue #5140.
This check follows CPython's behaviour, because 'import *' always populates
the globals with the imported names, not locals.
Since it's safe to do this (doesn't lead to a crash or undefined behaviour)
the check is only enabled for MICROPY_CPYTHON_COMPAT.
Fixes issue #5121.
This patch compresses the second part of the bytecode prelude which
contains the source file name, function name, source-line-number mapping
and cell closure information. This part of the prelude now begins with a
single varible length unsigned integer which encodes 2 numbers, being the
byte-size of the following 2 sections in the header: the "source info
section" and the "closure section". After decoding this variable unsigned
integer it's possible to skip over one or both of these sections very
easily.
This scheme saves about 2 bytes for most functions compared to the original
format: one in the case that there are no closure cells, and one because
padding was eliminated.
The start of the bytecode prelude contains 6 numbers telling the amount of
stack needed for the Python values and exceptions, and the signature of the
function. Prior to this patch these numbers were all encoded one after the
other (2x variable unsigned integers, then 4x bytes), but using so many
bytes is unnecessary.
An entropy analysis of around 150,000 bytecode functions from the CPython
standard library showed that the optimal Shannon coding would need about
7.1 bits on average to encode these 6 numbers, compared to the existing 48
bits.
This patch attempts to get close to this optimal value by packing the 6
numbers into a single, varible-length unsigned integer via bit-wise
interleaving. The interleaving scheme is chosen to minimise the average
number of bytes needed, and at the same time keep the scheme simple enough
so it can be implemented without too much overhead in code size or speed.
The scheme requires about 10.5 bits on average to store the 6 numbers.
As a result most functions which originally took 6 bytes to encode these 6
numbers now need only 1 byte (in 80% of cases).
From the beginning of this project the RAISE_VARARGS opcode was named and
implemented following CPython, where it has an argument (to the opcode)
counting how many args the raise takes:
raise # 0 args (re-raise previous exception)
raise exc # 1 arg
raise exc from exc2 # 2 args (chained raise)
In the bytecode this operation therefore takes 2 bytes, one for
RAISE_VARARGS and one for the number of args.
This patch splits this opcode into 3, where each is now a single byte.
This reduces bytecode size by 1 byte for each use of raise. Every byte
counts! It also has the benefit of reducing code size (on all ports except
nanbox).
To make progress towards MicroPython supporting Python 3.5, adding the
matmul operator is important because it's a really "low level" part of the
language, being a new token and modifications to the grammar.
It doesn't make sense to make it configurable because 1) it would make the
grammar and lexer complicated/messy; 2) no other operators are
configurable; 3) it's not a feature that can be "dynamically plugged in"
via an import.
And matmul can be useful as a general purpose user-defined operator, it
doesn't have to be just for numpy use.
Based on work done by Jim Mussared.
Prior to this patch mp_opcode_format would calculate the incorrect size of
the MP_BC_UNWIND_JUMP opcode, missing the additional byte. But, because
opcodes below 0x10 are unused and treated as bytes in the .mpy load/save
and freezing code, this bug did not show any symptoms, since nested unwind
jumps would rarely (if ever) reach a depth of 16 (so the extra byte of this
opcode would be between 0x01 and 0x0f and be correctly loaded/saved/frozen
simply as an undefined opcode).
This patch fixes this bug by correctly accounting for the additional byte.
.
With this patch alignment is done relative to the start of the buffer that
is being unpacked, not the raw pointer value, as per CPython.
Fixes issue #3314.
This commit adds support for sys.settrace, allowing to install Python
handlers to trace execution of Python code. The interface follows CPython
as closely as possible. The feature is disabled by default and can be
enabled via MICROPY_PY_SYS_SETTRACE.
Prior to this patch the line number for a lambda would be "line 1" if the
body of the lambda contained only a simple expression (with no line number
stored in the parse node). Now the line number is always reported
correctly.
mp_compile no longer takes an emit_opt argument, rather this setting is now
provided by the global default_emit_opt variable.
Now, when -X emit=native is passed as a command-line option, the emitter
will be set for all compiled modules (included imports), not just the
top-level script.
In the future there could be a way to also set this variable from a script.
Fixes issue #4267.
With this patch exceptions that are re-raised have improved tracebacks
(less confusing, match CPython), and it makes re-raise slightly more
efficient (in time and RAM) because they no longer need to add a traceback.
Also general VM performance is not measurably affected.
Partially fixes issue #2928.
With this patch exception tracebacks that go through a finally are improved
(less confusing, match CPython), and it makes finally's slightly more
efficient (in time and RAM) because they no longer need to add a traceback.
Partially fixes issue #2928.
It's really an opcode that's not implemented, so use "opcode" instead of
"byte code". And remove the redundant "not implemented" text because that
is already implied by the exception type. There's no need to have a long
error message for an exception that is almost never encountered. Saves
about 20 bytes of code size on most ports.
Enabled via MICROPY_PY_URE_DEBUG, disabled by default (but enabled on unix
coverage build). This is a rarely used feature that costs a lot of code
(500-800 bytes flash). Debugging of regular expressions can be done
offline with other tools.
Recent versions of gcc perform optimisations which can lead to the
following code from the MP_NLR_JUMP_HEAD macro being omitted:
top->ret_val = val; \
MP_NLR_RESTORE_PYSTACK(top); \
*_top_ptr = top->prev; \
This is noticeable (at least) in the unix coverage on x86-64 built with gcc
9.1.0. This is because the nlr_jump function is marked as no-return, so
gcc deduces that the above code has no effect.
Adding MP_UNREACHABLE tells the compiler that the asm code may branch
elsewhere, and so it cannot optimise away the code.
As per PEP 485, this function appeared in for Python 3.5. Configured via
MICROPY_PY_MATH_ISCLOSE which is disabled by default, but enabled for the
ports which already have MICROPY_PY_MATH_SPECIAL_FUNCTIONS enabled.
Prior to this patch the amount of free space in an array (including
bytearray) was not being maintained correctly for the case of slice
assignment which changed the size of the array. Under certain cases (as
encoded in the new test) it was possible that the array could grow beyond
its allocated memory block and corrupt the heap.
Fixes issue #4127.
This patch implements a new sys.atexit function which registers a function
that is later executed when the main script ends. It is configurable via
MICROPY_PY_SYS_ATEXIT, disabled by default.
This is not compliant with CPython, rather it can be used to implement a
CPython compatible "atexit" module if desired (similar to how
sys.print_exception can be used to implement functionality of the
"traceback" module).
This patch adds a simple but powerful hook into the import system, in a
CPython compatible way, by allowing to override builtins.__import__.
This does introduce some overhead to all imports but it's minor:
- the dict lookup of __import__ is bypassed if there are no modifications
to the builtins module (which is the case at start up);
- imports are not performance critical, usually done just at the start of a
script;
- compared to how much work is done in an import, looking up a value in a
dict is a relatively small additional piece of work.
JSON requires that keys of objects be strings. CPython will therefore
automatically quote simple types (NoneType, bool, int, float) when they are
used directly as keys in JSON output. To prevent subtle bugs and emit
compliant JSON, MicroPython should at least test for such keys so they
aren't silently let through. Then doing the actual quoting is a similar
cost to raising an exception, so that's what is implemented by this patch.
Fixes issue #4790.
Behaviour was changed from stack to queue in
8977c7eb58, and this updates variable names
to match. Also updates other references (docs, error messages).
__clear_cache causes a compile error when using clang. Instead use
__builtin___clear_cache which is available under both gcc and clang.
Also replace tabs with spaces in this section of code (introduced by a
previous commit).
This fixes compiling for older architectures (e.g. armv5tej).
According to [1], the limit of R0-R7 for the STR and LDR instructions is
tied to the Thumb instruction set and not any specific processor
architectures.
[1]: http://www.keil.com/support/man/docs/armasm/armasm_dom1361289906890.htm
With both MICROPY_PERSISTENT_CODE_SAVE and MICROPY_PERSISTENT_CODE_LOAD
enabled the code fails to compile, due to undeclared 'n_obj'. If
MICROPY_EMIT_NATIVE is disabled there are more errors due to the use of
undefined fields in mp_raw_code_t.
This patch fixes such compilation by avoiding undefined fields.
MICROPY_EMIT_NATIVE was changed to MICROPY_EMIT_MACHINE_CODE in this file
to match the mp_raw_code_t definition.
These s16-s21 registers are used by gcc so need to be saved. Future
versions of gcc (beyond v9.1.0), or other compilers, may eventually need
additional registers saved/restored.
See issue #4844.
Instead of converting to a small-int at runtime this can be done at compile
time, then we only have a simple comparison during runtime. This reduces
code size on some ports (e.g -4 on qemu-arm, -52 on unix nanbox), and for
others at least doesn't increase code size.
mpy-cross uses MICROPY_DYNAMIC_COMPILER and MICROPY_EMIT_NATIVE but does
not actually need to execute native functions, and does not need
mp_fun_table. This commit makes it so mp_fun_table and all its entries are
not built when MICROPY_DYNAMIC_COMPILER is enabled, significantly reducing
the size of the mpy-cross executable and allowing it to be built on more
machines/OS's.
This ; make Windows compilation fail with GNU makefile 4.2.1. It was added
in 0dc85c9f86 as part of a shell if-
statement, but this if-statement was subsequently removed in
23a693ec2d so the semicolon is not needed.
The variable $(TOUCH) is initialized with the "touch" value in mkenv.mk
like for the other command line tools (rm, echo, cp, mkdir etc). With
this, for example, Windows users can specify the path of touch.exe.
The variable $(CAT) is initialised with the "cat" value in mkenv.mk like
for the other command line tools (rm, echo, cp, mkdir etc). With this,
for example, Windows users can specify the path of cat.exe.
Reuse the implementation for bytes since it works the same way regardless
of the underlying type. This method gets added for CPython compatibility
of bytearray, but to keep the code simple and small array.array now also
has a working decode method, which is non-standard but doesn't hurt.
This allows figuring out the number of bytes in the memoryview object as
len(memview) * memview.itemsize.
The feature is enabled via MICROPY_PY_BUILTINS_MEMORYVIEW_ITEMSIZE and is
disabled by default.
Prior to this commit, building the unix port with `DEBUG=1` and
`-finstrument-functions` the compilation would fail with an error like
"control reaches end of non-void function". This change fixes this by
removing the problematic "if (0)" branches. Not all branches affect
compilation, but they are all removed for consistency.
With this change, @micropython.asm_thumb functions will work on standard
ARM processors (that are in ARM state by default), in scripts and
precompiled .mpy files.
Addresses issue #4675.
This system makes it a lot easier to include external libraries as static,
native modules in MicroPython. Simply pass USER_C_MODULES (like
FROZEN_MPY_DIR) as a make parameter.
During make, makemoduledefs.py parses the current builds c files for
MP_REGISTER_MODULE(module_name, obj_module, enabled_define)
These are used to generate a header with the required entries for
"mp_rom_map_elem_t mp_builtin_module_table[]" in py/objmodule.c
This commit adds support for saving and loading .mpy files that contain
native code (native, viper and inline-asm). A lot of the ground work was
already done for this in the form of removing pointers from generated
native code. The changes here are mainly to link in qstr values to the
native code, and change the format of .mpy files to contain native code
blocks (possibly mixed with bytecode).
A top-level summary:
- @micropython.native, @micropython.viper and @micropython.asm_thumb/
asm_xtensa are now allowed in .py files when compiling to .mpy, and they
work transparently to the user.
- Entire .py files can be compiled to native via mpy-cross -X emit=native
and for the most part the generated .mpy files should work the same as
their bytecode version.
- The .mpy file format is changed to 1) specify in the header if the file
contains native code and if so the architecture (eg x86, ARMV7M, Xtensa);
2) for each function block the kind of code is specified (bytecode,
native, viper, asm).
- When native code is loaded from a .mpy file the native code must be
modified (in place) to link qstr values in, just like bytecode (see
py/persistentcode.c:arch_link_qstr() function).
In addition, this now defines a public, native ABI for dynamically loadable
native code generated by other languages, like C.
The new compile-time option is MICROPY_DEBUG_MP_OBJ_SENTINELS, disabled by
default. This is to allow finer control of whether this debugging feature
is enabled or not (because, for example, this setting must be the same for
mpy-cross and the MicroPython main code when using native code generation).
When encoded in the mpy file, if qstr <= QSTR_LAST_STATIC then store two
bytes: 0, static_qstr_id. Otherwise encode the qstr as usual (either with
string data or a reference into the qstr window).
Reduces mpy file size by about 5%.
Instead of emitting two bytes in the bytecode for where the linked qstr
should be written to, it is now replaced by the actual qstr data, or a
reference into the qstr window.
Reduces mpy file size by about 10%.
This is an implementation of a sliding qstr window used to reduce the
number of qstrs stored in a .mpy file. The window size is configured to 32
entries which takes a fixed 64 bytes (16-bits each) on the C stack when
loading/saving a .mpy file. It allows to remember the most recent 32 qstrs
so they don't need to be stored again in the .mpy file. The qstr window
uses a simple least-recently-used mechanism to discard the least recently
used qstr when the window overflows (similar to dictionary compression).
This scheme only needs a single pass to save/load the .mpy file.
Reduces mpy file size by about 25% with a window size of 32.
POP_BLOCK and POP_EXCEPT are now the same, and are always followed by a
JUMP. So this optimisation reduces code size, and RAM usage of bytecode by
two bytes for each try-except handler.
This patch fixes a bug in the VM when breaking within a try-finally. The
bug has to do with executing a break within the finally block of a
try-finally statement. For example:
def f():
for x in (1,):
print('a', x)
try:
raise Exception
finally:
print(1)
break
print('b', x)
f()
Currently in uPy the above code will print:
a 1
1
1
segmentation fault (core dumped) micropython
Not only is there a seg fault, but the "1" in the finally block is printed
twice. This is because when the VM executes a finally block it doesn't
really know if that block was executed due to a fall-through of the try (no
exception raised), or because an exception is active. In particular, for
nested finallys the VM has no idea which of the nested ones have active
exceptions and which are just fall-throughs. So when a break (or continue)
is executed it tries to unwind all of the finallys, when in fact only some
may be active.
It's questionable whether break (or return or continue) should be allowed
within a finally block, because they implicitly swallow any active
exception, but nevertheless it's allowed by CPython (although almost never
used in the standard library). And uPy should at least not crash in such a
case.
The solution here relies on the fact that exception and finally handlers
always appear in the bytecode after the try body.
Note: there was a similar bug with a return in a finally block, but that
was previously fixed in b735208403
Also, to make it possible for ports to provide their own lwipopts.h, the
default include directory of extmod/lwip-include is no longer added and
instead a port should now make sure the correct include directory is
included in the list (can still use extmod/lwip-include).
This optimisation eliminates the need to create a temporary normal dict.
The optimisation is enabled via MICROPY_COMP_CONST_LITERAL which is enabled
by default (although only has an effect if OrderdDict is enabled).
Thanks to @pfalcon for the initial idea and implementation.
All exceptions that unwind through the async-with must be caught and
BaseException is the top-level class, which includes Exception and others.
Fixes issue #4552.
As mentioned in #4450, `websocket` was experimental with a single intended
user, `webrepl`. Therefore, we'll make this change without a weak
link `websocket` -> `uwebsocket`.
This change makes it so that python3 is required by default to build
MicroPython. Python 2 can be used by specifying make PYTHON=python2.
This comes about due to a recent-ish change to PEP 394 that makes the
python command more optional than before (even with Python 2 installed);
see cd59ec03c8 (diff-1d22f7bd72cbc900670f058b1107d426)
Since the command python is no longer required to be provided by a
distribution we need to use either python2 or python3 as commands. And
python3 seems the obvious choice.
These macros could in principle be (inline) functions so it makes sense to
have them lower case, to match the other C API functions.
The remaining macros that are upper case are:
- MP_OBJ_TO_PTR, MP_OBJ_FROM_PTR
- MP_OBJ_NEW_SMALL_INT, MP_OBJ_SMALL_INT_VALUE
- MP_OBJ_NEW_QSTR, MP_OBJ_QSTR_VALUE
- MP_OBJ_FUN_MAKE_SIG
- MP_DECLARE_CONST_xxx
- MP_DEFINE_CONST_xxx
These must remain macros because they are used when defining const data (at
least, MP_OBJ_NEW_SMALL_INT is so it makes sense to have
MP_OBJ_SMALL_INT_VALUE also a macro).
For those macros that have been made lower case, compatibility macros are
provided for the old names so that users do not need to change their code
immediately.
Python defines warnings as belonging to categories, where category is a
warning type (descending from exception type). This is useful, as e.g.
allows to disable warnings selectively and provide user-defined warning
types. So, implement this in MicroPython, except that categories are
represented just with strings. However, enough hooks are left to implement
categories differently per-port (e.g. as types), without need to patch each
and every usage.
If MICROPY_PERSISTENT_CODE_LOAD or MICROPY_ENABLE_COMPILER are enabled then
code gets enabled that calls file reading functions which may be disabled
if no readers have been implemented.
To fix this, introduce a MICROPY_HAS_FILE_READER variable, which is
automatically set if MICROPY_READER_POSIX or MICROPY_READER_VFS is set but
can also be manually set if a custom reader is being implemented. Then
disable the file reading calls if this is not set.
For architectures where size_t is less than 32 bits (eg 16 bits) the args
must be casted to uint32_t so the left shift will work. For architectures
where size_t is greater than 32 bits (eg 64 bits) this new casting will not
lose any bits because the end result must anyway fit in a uint32_t.
Changes to the layout of the bytecode header meant that this debug code was
no longer compiling. This is now fixed and a new compile-time option is
introduced, MICROPY_DEBUG_VM_STACK_OVERFLOW, to turn on this feature (which
is disabled by default). This option is needed because more than one file
needs to cooperate to make this check work.
It's more robust to have the version defined statically in a header file,
rather than dynamically generating it via git using a git tag. In case
git doesn't exist, or a different source control tool is used, it's
important to still have the uPy version number available.
The older "bool has_finaliser" gets recast as GC_ALLOC_FLAG_HAS_FINALISER=1
so this is a backwards compatible change to the signature. Since bool gets
implicitly converted to 1 this patch doesn't include conversion of all
calls.
Both mp_type_array and mp_type_memoryview use the same object structure,
mp_obj_array_t, but for the case of memoryview, some fields, e.g. "free",
have different meaning. As the "free" field is also a bitfield, assume
that (anonymous) union can't be used here (for the concerns of possible
compatibility issues with wide array of toolchains), and just add a field
alias using a #define. As it's a define, it should be a selective
identifier, so use verbose "memview_offset" to avoid any clashes.
All 4 opcodes that can have caching bytes also have qstrs, so the test for
them must go in the qstr part of the code. The reason this incorrect
calculation of the opcode size did not lead to a bug is because the caching
byte is at the end of the opcode (byte, qstr, qstr, cache) and is always
0x00 when saving/loading, so was just treated as a single byte no-op
opcode. Hence these opcodes were being saved/loaded/decoded correctly.
Thanks to @malinah for finding the problem and providing the initial patch.
mp_obj_new_exception_msg() assumes that the message passed to it is in ROM
and so can use its data directly to create the string object for the
argument of the exception, saving RAM. At the same time, this approach
also makes sure that there is no attempt to format the message with printf,
which could lead to faults if the message contained % characters.
Fixes issue #3004.
SHORT, INT, LONG, LONGLONG, and unsigned (U*) variants are being defined.
This is done at compile using GCC-style predefined macros like
__SIZEOF_INT__. If the compiler doesn't have such defines, no such types
will be defined.
Instead of assuming that the method is a bytecode object, and only
supporting load of __name__, make the operation generic by delegating the
load to the method object itself. Saves a bit of code size and fixes the
case of attempting to load __name__ on a native method, see issue #4028.
A new option MICROPY_GC_STACK_ENTRY_TYPE is added to select a custom type
instead of size_t for the gc_stack array items. This can be beneficial for
small devices, especially those that are low on memory anyway. If a device
has 1MB or less of heap (and 16-byte GC blocks) then this type can be
uint16_t, saving 128 bytes of RAM.
There was an assumption that all names in a module dict are qstr's.
However, they can be dynamically generated (by assigning to globals()),
and in case of a long name, it won't be a qstr. Handle this situation
properly, including taking care of not creating superfluous qstr's for
names starting with "_" (which aren't imported by "import *").
Taking the address of a local variable is mildly expensive, in code size
and stack usage. So optimise scope_find_or_add_id() to not need to take a
pointer to the "added" variable, and instead take the kind to use for newly
added identifiers.
This ensures that implicit variables are only converted to implicit
closed-over variables (nonlocals) at the very end of the function scope.
If variables are closed-over when first used (read from, as was done prior
to this commit) then this can be incorrect because the variable may be
assigned to later on in the function which means they are just a plain
local, not closed over.
Fixes issue #4272.
Building axtls gives a lot of warnings with -Wall enabled, and explicitly
disabling all of them cannot be done in a way compatible with gcc and
clang, and likely other compilers. So just use -Wno-all to prevent all of
the extra warnings (in addition to the necessary -Wno-unused-parameter,
-Wno-uninitialized, -Wno-sign-compare and -Wno-old-style-definition).
Fixes issue #4182.
Configurable via MICROPY_MODULE_GETATTR, disabled by default. Among other
things __getattr__ for modules can help to build lazy loading / code
unloading at runtime.
Configurable via MICROPY_PY_BUILTINS_STR_COUNT. Default is enabled.
Disabled for bare-arm, minimal, unix-minimal and zephyr ports. Disabling
it saves 408 bytes on x86.
So these constant objects can be loaded by dereferencing the REG_FUN_TABLE
pointer instead of loading immediate values. This reduces the size of
generated native code (when such constants are used), and means that
pointers to these constants are no longer stored in the assembly code.
The maximum index into mp_fun_table is currently less than 1024 and should
stay that way to keep things efficient for all architectures, so there is
no need to handle loading the pointer directly via a literal in this
function.
All architectures now have a dedicated register to hold the pointer to the
native function table mp_fun_table, and so they all need to load this
register at the start of the native function. This commit makes the
loading of this register uniform across architectures by passing the
pointer in the constant table for the native function, and then loading the
register from the constant table. Doing it this way means that the pointer
is not stored in the assembly code, helping to make the code more portable.
Instead of storing the function pointer directly in the assembly code.
This makes the generated code more independent of the runtime (so easier to
relocate the code), and reduces the generated code size.
The esp register is always a fixed distance below ebp, and using esp to
reference locals on the stack frees up the ebp register for general purpose
use (which is important for an architecture with only 8 user registers).
Instead of storing the function pointer directly in the assembly code.
This makes the generated code more independent of the runtime (so easier to
relocate the code), and reduces the generated code size.
The rsp register is always a fixed distance below rbp, and using rsp to
reference locals on the stack frees up the rbp register for general purpose
use.
This commit adds first class support for yield and yield-from in the native
emitter, including send and throw support, and yields enclosed in exception
handlers (which requires pulling down the NLR stack before yielding, then
rebuilding it when resuming).
This has been fully tested and is working on unix x86 and x86-64, and
stm32. Also basic tests have been done with the esp8266 port. Performance
of existing native code is unchanged.
The nlr_buf_t doesn't need to be part of the Python value stack (as it was
before this commit), it's simpler to have it separated as auxiliary state
that lives on the C stack. This will help adding yield support because in
that case the nlr_buf_t and Python value stack live in separate memory
areas (C stack and heap respectively).
Instead of at end of state, n_state - 1. It was originally (way back in
v1.0) put at the end of the state because the VM didn't have a pointer to
the start. But now that the VM takes a mp_code_state_t pointer it does
have a pointer to the start of the state so can put the exception object
there.
This commit saves about 30 bytes of code on all architectures, and, more
importantly, reduces C-stack usage by a couple of words (8 bytes on Thumb2
and 16 bytes on x86-64) for every (non-generator) call of a bytecode
function because fun_bc_call no longer needs to remember the n_state
variable.
This makes these special methods have the same calling behaviour as other
methods in a class instance (mp_convert_member_lookup() is already called
by mp_obj_class_lookup()).
And remove related comment about needing such protection when calling send.
Reasoning for removal is as follows:
- mp_resume is only called by the VM in YIELD_FROM opcode
- if send_value != MP_OBJ_NULL then throw_value == MP_OBJ_NULL
- so if __next__ or send are called then throw_value == MP_OBJ_NULL
- if __next__ or send raise an exception without nlr protection then the
exception will be handled by the global exception handler of the VM
- this handler already has code to handle exceptions raised in YIELD_FROM,
including correct handling of StopIteration
- this handler doesn't handle the case of injection of GeneratorExit, but
this won't be needed because throw_value == MP_OBJ_NULL
Note that it's already possible for mp_resume() to raise an exception
(including StopIteration) from the unprotected call to type->iternext(), so
that's why the VM already has code to handle the case of exceptions coming
out of mp_resume().
This commit reduces code size by a bit, and significantly reduces C stack
usage when using yield-from, from 88 bytes down to 40 for Thumb2, and 152
down to 72 bytes for x86-64 (better than half). (Note that gcc doesn't
seem to tail-call optimise the call from mp_resume() to mp_obj_gen_resume()
so this saving in C stack usage helps all uses of yield-from.)
mp_make_raise_obj must be used to convert a possible exception type to an
instance object, otherwise the VM may raise a non-exception object.
An existing test is adjusted to test this case, with the original test
already moved to generator_throw.py.
This matches how bytecode does it, and matches the signature of
mp_emit_glue_assign_native. Since the native emitter doesn't support
nan-boxing uintptr_t and mp_uint_t are anyway the same bit-width.
After the previous commit this macro is no longer needed by the native
emitter because live heap pointers are no longer stored in generated native
machine code.
This commit changes native code to handle constant objects like bytecode:
instead of storing the pointers inside the native code they are now stored
in a separate constant table (such pointers include objects like bignum,
bytes, and raw code for nested functions). This removes the need for the
GC to scan native code for root pointers, and takes a step towards making
native code independent of the runtime (eg so it can be compiled offline by
mpy-cross).
Note that the changes to the struct scope_t did not increase its size: on a
32-bit architecture it is still 48 bytes, and on a 64-bit architecture it
decreased from 80 to 72 bytes.
Nan and inf (signed and unsigned) are also handled correctly by using
signbit (they were also handled correctly with "val<0", but that didn't
handle -0.0 correctly). A test case is added for this behaviour.
When obj.h is compiled as C++ code, the cl compiler emits a warning about
possibly unsafe mixing of size_t and bool types in the or operation in
MP_OBJ_FUN_MAKE_SIG. Similarly there's an implicit narrowing integer
conversion in runtime.h. This commit fixes this by being explicit.
This is an improvement over previous behavior when str was returned for
both str and bytes input format. This new behaviour is also consistent
with how the % operator works, as well as many other str/bytes methods.
It should be noted that it's not how current versions of CPython work,
where there's a gap in the functionality and bytes.format() is not
supported.
This commit adds the math.factorial function in two variants:
- squared difference, which is faster than the naive version, relatively
compact, and non-recursive;
- a mildly optimised recursive version, faster than the above one.
There are some more optimisations that could be done, but they tend to take
more code, and more storage space. The recursive version seems like a
sensible compromise.
The new function is disabled by default, and uses the non-optimised version
by default if it is enabled. The options are MICROPY_PY_MATH_FACTORIAL
and MICROPY_OPT_MATH_FACTORIAL.
This patches avoids multiplying with negative powers-of-10 when parsing
floating-point values, when those powers-of-10 can be exactly represented
as a positive power. When represented as a positive power and used to
divide, the resulting float will not have any rounding errors.
The issue is that mp_parse_num_decimal will sometimes not give the closest
floating representation of the input string. Eg for "0.3", which can't be
represented exactly in floating point, mp_parse_num_decimal gives a
slightly high (by 1LSB) result. This is because it computes the answer as
3 * 0.1, and since 0.1 also can't be represented exactly, multiplying by 3
multiplies up the rounding error in the 0.1. Computing it as 3 / 10, as
now done by the change in this commit, gives an answer which is as close to
the true value of "0.3" as possible.
This commit implements PEP479 which disallows raising StopIteration inside
a generator to signal that it should be finished. Instead, the generator
should simply return when it is complete.
See https://www.python.org/dev/peps/pep-0479/ for details.
In 0e80f345f8 the inplace operations __iadd__
and __isub__ were made unconditionally available, so the comment about this
section is changed to reflect that.
Loading a pointer by indexing into the native function table mp_fun_table,
rather than loading an immediate value (via a PC-relative load), uses less
code space.
This commit makes viper functions have the same signature as native
functions, at the level of the emitter/assembler. This means that viper
functions can now be wrapped in the same uPy object as native functions.
Viper functions are now responsible for parsing their arguments (before it
was done by the runtime), and this makes calling them more efficient (in
most cases) because the viper entry code can be custom generated to suit
the signature of the function.
This change also opens the way forward for viper functions to take
arbitrary numbers of arguments, and for them to handle globals correctly,
among other things.
Now that the compiler can store the results of the viper types in the
scope, the viper parameter annotation compilation stage can be merged with
the normal parameter compilation stage.
With 5 arguments to mp_arg_check_num(), some architectures need to pass
values on the stack. So compressing n_args_min, n_args_max, takes_kw into
a single word and passing only 3 arguments makes the call more efficient,
because almost all calls to this function pass in constant values. Code
size is also reduced by a decent amount:
bare-arm: -116
minimal x86: -64
unix x64: -256
unix nanbox: -112
stm32: -324
cc3200: -192
esp8266: -192
esp32: -144
Prior to this commit a function compiled with the native decorator
@micropython.native would not work correctly when accessing global
variables, because the globals dict was not being set upon function entry.
This commit fixes this problem by, upon function entry, setting as the
current globals dict the globals dict context the function was defined
within, as per normal Python semantics, and as bytecode does. Upon
function exit the original globals dict is restored.
In order to restore the globals dict when an exception is raised the native
function must guard its internals with an nlr_push/nlr_pop pair. Because
this push/pop is relatively expensive, in both C stack usage for the
nlr_buf_t and CPU execution time, the implementation here optimises things
as much as possible. First, the compiler keeps track of whether a function
even needs to access global variables. Using this information the native
emitter then generates three different kinds of code:
1. no globals used, no exception handlers: no nlr handling code and no
setting of the globals dict.
2. globals used, no exception handlers: an nlr_buf_t is allocated on the
C stack but it is not used if the globals dict is unchanged, saving
execution time because nlr_push/nlr_pop don't need to run.
3. function has exception handlers, may use globals: an nlr_buf_t is
allocated and nlr_push/nlr_pop are always called.
In the end, native functions that don't access globals and don't have
exception handlers will run more efficiently than those that do.
Fixes issue #1573.
If bytearray is constructed from str, a second argument of encoding is
required (in CPython), and third arg of Unicode error handling is allowed,
e.g.:
bytearray("str", "utf-8", "strict")
This is similar to bytes:
bytes("str", "utf-8", "strict")
This patch just allows to pass 2nd/3rd arguments to bytearray, but
doesn't try to validate them to not impact code size. (This is also
similar to how bytes constructor is handled, though it does a bit
more validation, e.g. check that in case of str arg, encoding argument
is passed.)
This removes the need for a separate axtls build stage, and builds all
axtls object files along with other code. This simplifies and cleans up
the build process, automatically builds axtls when needed, and puts the
axtls object files in the correct $(BUILD) location.
The MicroPython axtls configuration file is provided in
extmod/axtls-include/config.h
This patch adds full support for unwinding jumps to the native emitter.
This means that return/break/continue can be used in try-except,
try-finally and with statements. For code that doesn't use unwinding jumps
there is almost no overhead added to the generated code.
The native emitter keeps the current exception in a slot in its C stack
(instead of on its Python value stack), so when it catches an exception it
must explicitly clear that slot so the same exception is not reraised later
on.
Back in 8047340d75 basic support was added in
the VM to handle return statements within a finally block. But it didn't
cover all cases, in particular when some finally's were active and others
inactive when the "return" was executed.
This patch adds further support for return-within-finally by correctly
managing the currently_in_except_block flag, and should fix all cases. The
main point is that finally handlers remain on the exception stack even if
they are active (currently being executed), and the unwind return code
should only execute those finally's which are inactive.
New tests are added for the cases which now pass.
Prior to this patch, native code would use a full nlr_buf_t for each
exception handler (try-except, try-finally, with). For nested exception
handlers this would use a lot of C stack and be rather inefficient.
This patch changes how exceptions are handled in native code by setting up
only a single nlr_buf_t context for the entire function, and then manages a
state machine (using the PC) to work out which exception handler to run
when an exception is raised by an nlr_jump. This keeps the C stack usage
at a constant level regardless of the depth of Python exception blocks.
The patch also fixes an existing bug when local variables are written to
within an exception handler, then their value was incorrectly restored if
an exception was raised (since the nlr_jump would restore register values,
back to the point of the nlr_push).
And it also gets nested try-finally+with working with the viper emitter.
Broadly speaking, efficiency of executing native code that doesn't use
any exception blocks is unchanged, and emitted code size is only slightly
increased for such function. C stack usage of all native functions is
either equal or less than before. Emitted code size for native functions
that use exception blocks is increased by roughly 10% (due in part to
fixing of above-mentioned bugs).
But, most importantly, this patch allows to implement more Python features
in native code, like unwind jumps and yielding from within nested exception
blocks.
These POSIX wrappers are assumed to be passed a concrete stream object so
it is more efficient (eg on nan-boxing builds) to pass in the pointer
rather than mp_obj_t, because then the users of these functions only need
to store a void* (and mp_obj_t may be wider than a pointer). And things
would be further improved if the stream protocol functions eventually took
a pointer as their first argument (instead of an mp_obj_t).
This patch is a step to getting ussl/axtls compiling on nan-boxing builds.
See issue #3085.
Otherwise there is the possibility that n_free starts out non-zero from the
previous iteration, which may have found a few (but not enough) free blocks
at the end of the heap. If this is the case, and if the very first blocks
that are scanned the second time around (starting at
gc_last_free_atb_index) are found to give enough memory (including the
blocks at the end of the heap from the previous iteration that left n_free
non-zero) then memory will be allocated starting before the location that
gc_last_free_atb_index points to, most likely leading to corruption.
This serious bug did not manifest itself in the past because a gc_collect
always resets gc_last_free_atb_index to point to the start of the GC heap,
and the first block there is almost always allocated to a long-lived
object (eg entries from sys.path, or mounted filesystem objects), which
means that n_free would be reset at the start of the search loop.
But with threading enabled with the GIL disabled it is possible to trigger
the bug via the following sequence of events:
1. Thread A runs gc_alloc, fails to find enough memory, and has a non-zero
n_free at the end of the search.
2. Thread A calls gc_collect and frees a bunch of blocks on the GC heap.
3. Just after gc_collect finishes in thread A, thread B takes gc_mutex and
does an allocation, moving gc_last_free_atb_index to point to the
interior of the heap, to a place where there is most likely a run of
available blocks.
4. Thread A regains gc_mutex and does its second search for free memory,
starting with a non-zero n_free. Since it's likely that the first block
it searches is available it will allocate memory which overlaps with the
memory before gc_last_free_atb_index.
Without this patch, on 64-bit architectures the "1 << (small_int_bits - 1)"
is computed using only 32-bit values (since small_int_bits is a uint8_t)
and so will overflow (and give the wrong result) if small_int_bits is
larger than 32.
There is no need to have three copies of the exception object on the top of
the native value stack. Instead, the values on the stack should be the
first two items in an nlr_buf_t: the prev pointer and the ret_val pointer.
This is all that is needed and is what the rest of the native emitter
expects is on the stack.
This patch is essentially an optimisation. Behaviour is unchanged,
although the stack layout for native exception handling now makes more
sense.
A native function allocates space on its C stack for mp_code_state_t,
followed by its Python stack, then its locals. This patch makes sure that
the native function actually starts at the start of its Python stack,
rather than at the start of mp_code_state_t (which didn't lead to any
issues so far because the mp_code_state_t is unused after the native
function sets itself up).
On x86 archs (both 32 and 64 bit) a bool return value only sets the 8-bit
al register, and the higher bits of the ax register have an undefined
value. When testing the return value of such cases it is required to just
test al for zero/non-zero. On the other hand, checking for truth or
zero/non-zero on an integer return value requires checking all bits of the
register. These two cases must be distinguished and handled correctly in
generated native code. This patch makes sure of this.
For other supported native archs (ARM, Thumb2, Xtensa) there is no such
distinction and this patch does not change anything for them.
DEBUG_printf and MICROPY_DEBUG_PRINTER is now used instead of normal
printf, and a fault is fixed in mp_obj_class_lookup with debugging enabled;
see issue #3999. Debugging can now be enabled on all ports including when
nan-boxing is used.
This patch in effect renames MICROPY_DEBUG_PRINTER_DEST to
MICROPY_DEBUG_PRINTER, moving its default definition from
lib/utils/printf.c to py/mpconfig.h to make it official and documented, and
makes this macro a pointer rather than the actual mp_print_t struct. This
is done to get consistency with MICROPY_ERROR_PRINTER, and provide this
macro for use outside just lib/utils/printf.c.
Ports are updated to use the new macro name.
This patch makes the Thumb-2 native emitter use wide ldr instructions to
call into the runtime, when the index into the native glue function table
is 32 or greater. This reduces the generated assembler code from 10 bytes
to 6 bytes, saving RAM and making native code run about 0.8% faster.
This error message did not consume all of its variable args, a bug
introduced long ago in baf6f14deb. By fixing
it to use %s (instead of keeping the string as-is and deleting the last
arg) the same error message string is now reused three times in this format
function and gives a code size reduction of around 130 bytes. It also now
gives a better error message when a non-string is passed in as an argument
to format, eg '{:d}'.format([]).
There's no need to call mp_obj_new_int() which will just fail the check for
small int and call mp_obj_new_int_from_ll() anyway.
Thanks to @Jongy for prompting this change.
In non-debug mode MP_OBJ_STOP_ITERATION is zero and comparing something to
zero can be done more efficiently in assembler than comparing to a non-zero
value.
With the recent change b488a4a848, a
generating function now has the same layout in memory as a normal bytecode
function, and so can reuse the latter's attribute accessor code to
implement __name__.
Because this function is simple it saves code size to have it inlined.
Being an auxiliary helper function (and only used in the py/ core) the
argument should always be an mp_obj_module_t*, so there's no need for the
assert (and having it would require including assert.h in obj.h).
It's a very simple function and saves code, and improves efficiency, by
being inline. Note that this is an auxiliary helper function and so
doesn't need mp_check_self -- that's used for functions that can be
accessed directly from Python code (eg from a method table).
mp_obj_module_get_globals() returns a mp_obj_dict_t*, and type->locals_dict
is a mp_obj_dict_t*, so access the map entry of the dict directly instead
of needing to cast this mp_obj_dict_t* up to an object and then calling the
mp_obj_dict_get_map() helper function.