This should be enabled when the mp_raw_code_save_file function is needed.
It is enabled for mpy-cross, and a check for defined(__APPLE__) is added to
cover Mac M1 systems.
On ports where normal heap memory can contain executable code (eg ARM-based
ports such as stm32), native code loaded from an .mpy file may be reclaimed
by the GC because there's no reference to the very start of the native
machine code block that is reachable from root pointers (only pointers to
internal parts of the machine code block are reachable, but that doesn't
help the GC find the memory).
This commit fixes this issue by maintaining an explicit list of root
pointers pointing to native code that is loaded from an .mpy file. This
is not needed for all ports so is selectable by the new configuration
option MICROPY_PERSISTENT_CODE_TRACK_RELOC_CODE. It's enabled by default
if a port does not specify any special functions to allocate or commit
executable memory.
A test is included to test that native code loaded from an .mpy file does
not get reclaimed by the GC.
Fixes#6045.
Signed-off-by: Damien George <damien@micropython.org>
This makes the loading of viper-code-with-relocations a bit neater and
easier to understand, by treating the rodata/bss like a special object to
be loaded into the constant table (which is how it behaves).
Implements text, rodata and bss generalised relocations, as well as generic
qstr-object linking. This allows importing dynamic native modules on all
supported architectures in a unified way.
Instead of encoding 4 zero bytes as placeholders for the simple_name and
source_file qstrs, and storing the qstrs after the bytecode, store the
qstrs at the location of these 4 bytes. This saves 4 bytes per bytecode
function stored in a .mpy file (for example lcd160cr.mpy drops by 232
bytes, 4x 58 functions). And resulting code size is slightly reduced on
ports that use this feature.
This patch compresses the second part of the bytecode prelude which
contains the source file name, function name, source-line-number mapping
and cell closure information. This part of the prelude now begins with a
single varible length unsigned integer which encodes 2 numbers, being the
byte-size of the following 2 sections in the header: the "source info
section" and the "closure section". After decoding this variable unsigned
integer it's possible to skip over one or both of these sections very
easily.
This scheme saves about 2 bytes for most functions compared to the original
format: one in the case that there are no closure cells, and one because
padding was eliminated.
The start of the bytecode prelude contains 6 numbers telling the amount of
stack needed for the Python values and exceptions, and the signature of the
function. Prior to this patch these numbers were all encoded one after the
other (2x variable unsigned integers, then 4x bytes), but using so many
bytes is unnecessary.
An entropy analysis of around 150,000 bytecode functions from the CPython
standard library showed that the optimal Shannon coding would need about
7.1 bits on average to encode these 6 numbers, compared to the existing 48
bits.
This patch attempts to get close to this optimal value by packing the 6
numbers into a single, varible-length unsigned integer via bit-wise
interleaving. The interleaving scheme is chosen to minimise the average
number of bytes needed, and at the same time keep the scheme simple enough
so it can be implemented without too much overhead in code size or speed.
The scheme requires about 10.5 bits on average to store the 6 numbers.
As a result most functions which originally took 6 bytes to encode these 6
numbers now need only 1 byte (in 80% of cases).
With both MICROPY_PERSISTENT_CODE_SAVE and MICROPY_PERSISTENT_CODE_LOAD
enabled the code fails to compile, due to undeclared 'n_obj'. If
MICROPY_EMIT_NATIVE is disabled there are more errors due to the use of
undefined fields in mp_raw_code_t.
This patch fixes such compilation by avoiding undefined fields.
MICROPY_EMIT_NATIVE was changed to MICROPY_EMIT_MACHINE_CODE in this file
to match the mp_raw_code_t definition.
This commit adds support for saving and loading .mpy files that contain
native code (native, viper and inline-asm). A lot of the ground work was
already done for this in the form of removing pointers from generated
native code. The changes here are mainly to link in qstr values to the
native code, and change the format of .mpy files to contain native code
blocks (possibly mixed with bytecode).
A top-level summary:
- @micropython.native, @micropython.viper and @micropython.asm_thumb/
asm_xtensa are now allowed in .py files when compiling to .mpy, and they
work transparently to the user.
- Entire .py files can be compiled to native via mpy-cross -X emit=native
and for the most part the generated .mpy files should work the same as
their bytecode version.
- The .mpy file format is changed to 1) specify in the header if the file
contains native code and if so the architecture (eg x86, ARMV7M, Xtensa);
2) for each function block the kind of code is specified (bytecode,
native, viper, asm).
- When native code is loaded from a .mpy file the native code must be
modified (in place) to link qstr values in, just like bytecode (see
py/persistentcode.c:arch_link_qstr() function).
In addition, this now defines a public, native ABI for dynamically loadable
native code generated by other languages, like C.
When encoded in the mpy file, if qstr <= QSTR_LAST_STATIC then store two
bytes: 0, static_qstr_id. Otherwise encode the qstr as usual (either with
string data or a reference into the qstr window).
Reduces mpy file size by about 5%.
Instead of emitting two bytes in the bytecode for where the linked qstr
should be written to, it is now replaced by the actual qstr data, or a
reference into the qstr window.
Reduces mpy file size by about 10%.
This is an implementation of a sliding qstr window used to reduce the
number of qstrs stored in a .mpy file. The window size is configured to 32
entries which takes a fixed 64 bytes (16-bits each) on the C stack when
loading/saving a .mpy file. It allows to remember the most recent 32 qstrs
so they don't need to be stored again in the .mpy file. The qstr window
uses a simple least-recently-used mechanism to discard the least recently
used qstr when the window overflows (similar to dictionary compression).
This scheme only needs a single pass to save/load the .mpy file.
Reduces mpy file size by about 25% with a window size of 32.
These macros could in principle be (inline) functions so it makes sense to
have them lower case, to match the other C API functions.
The remaining macros that are upper case are:
- MP_OBJ_TO_PTR, MP_OBJ_FROM_PTR
- MP_OBJ_NEW_SMALL_INT, MP_OBJ_SMALL_INT_VALUE
- MP_OBJ_NEW_QSTR, MP_OBJ_QSTR_VALUE
- MP_OBJ_FUN_MAKE_SIG
- MP_DECLARE_CONST_xxx
- MP_DEFINE_CONST_xxx
These must remain macros because they are used when defining const data (at
least, MP_OBJ_NEW_SMALL_INT is so it makes sense to have
MP_OBJ_SMALL_INT_VALUE also a macro).
For those macros that have been made lower case, compatibility macros are
provided for the old names so that users do not need to change their code
immediately.
If MICROPY_PERSISTENT_CODE_LOAD or MICROPY_ENABLE_COMPILER are enabled then
code gets enabled that calls file reading functions which may be disabled
if no readers have been implemented.
To fix this, introduce a MICROPY_HAS_FILE_READER variable, which is
automatically set if MICROPY_READER_POSIX or MICROPY_READER_VFS is set but
can also be manually set if a custom reader is being implemented. Then
disable the file reading calls if this is not set.
Header files that are considered internal to the py core and should not
normally be included directly are:
py/nlr.h - internal nlr configuration and declarations
py/bc0.h - contains bytecode macro definitions
py/runtime0.h - contains basic runtime enums
Instead, the top-level header files to include are one of:
py/obj.h - includes runtime0.h and defines everything to use the
mp_obj_t type
py/runtime.h - includes mpstate.h and hence nlr.h, obj.h, runtime0.h,
and defines everything to use the general runtime support functions
Additional, specific headers (eg py/objlist.h) can be included if needed.
This patch allows the following code to run without allocating on the heap:
super().foo(...)
Before this patch such a call would allocate a super object on the heap and
then load the foo method and call it right away. The super object is only
needed to perform the lookup of the method and not needed after that. This
patch makes an optimisation to allocate the super object on the C stack and
discard it right after use.
Changes in code size due to this patch are:
bare-arm: +128
minimal: +232
unix x64: +416
unix nanbox: +364
stmhal: +184
esp8266: +340
cc3200: +128
This patch refactors the error handling in the lexer, to simplify it (ie
reduce code size).
A long time ago, when the lexer/parser/compiler were first written, the
lexer and parser were designed so they didn't use exceptions (ie nlr) to
report errors but rather returned an error code. Over time that has
gradually changed, the parser in particular has more and more ways of
raising exceptions. Also, the lexer never really handled all errors without
raising, eg there were some memory errors which could raise an exception
(and in these rare cases one would get a fatal nlr-not-handled fault).
This patch accepts the fact that the lexer can raise exceptions in some
cases and allows it to raise exceptions to handle all its errors, which are
for the most part just out-of-memory errors during construction of the
lexer. This makes the lexer a bit simpler, and also the persistent code
stuff is simplified.
What this means for users of the lexer is that calls to it must be wrapped
in a nlr handler. But all uses of the lexer already have such an nlr
handler for the parser (and compiler) so that doesn't put any extra burden
on the callers.
Implementations of persistent-code reader are provided for POSIX systems
and systems using FatFS. Macros to use these are MICROPY_READER_POSIX and
MICROPY_READER_FATFS respectively. If an alternative implementation is
needed then a port can define the function mp_reader_new_file.