This allows calls to `allocate_memory()` while the VM is running, it will then allocate from the GC heap (unless there is a suitable hole among the supervisor allocations), and when the VM exits and the GC heap is freed, the allocation will be moved to the bottom of the former GC heap and transformed into a proper supervisor allocation. Existing movable allocations will also be moved to defragment the supervisor heap and ensure that the next VM run gets as much memory as possible for the GC heap.
By itself this breaks terminalio because it violates the assumption that supervisor_display_move_memory() still has access to an undisturbed heap to copy the tilegrid from. It will work in many cases, but if you're unlucky you will get garbled terminal contents after exiting from the vm run that created the display. This will be fixed in the following commit, which is separate to simplify review.
`pow(a, b, c)` can compute `(a ** b) % c` efficiently (in time and memory).
This can be useful for extremely specific applications, like implementing
the RSA cryptosystem. For typical uses of CircuitPython, this is not an
important feature. A survey of the bundle and learn system didn't find
any uses.
Disable it on M0 builds so that we can fit in needed upgrades to the USB
stack.
Disable certain classes of diagnostic when building ulab. We should
submit patches upstream to (A) fix these errors and (B) upgrade their
CI so that the problems are caught before we want to integrate with
CircuitPython, but not right now.
I like to use local makefile overrides, in the file GNUmakefile
(or, on case-sensitive systems, makefile) to set compilation choices.
However, writing
TRANSLATION := de_DE
include Makefile
did not work, because py.mk would override the TRANSLATION := specified
in an earlier part of the makefiles (but not from the commandline).
By using ?= instead of := the local makefile override works, but when
TRANSLATION is not specified it continues to work as before.
This ensures that only the translate("") alternative that will be used
is seen after preprocessing. Improves the quality of the Huffman encoding
and reduces binary size slightly.
Also makes one "enhanced" error message only occur when ERROR_REPORTING_DETAILED:
Instead of the word-for-word python3 error message
"Type object has no attribute '%q'", the message will be
"'type' object has no attribute '%q'". Also reduces binary size.
(that's rolled into this commit as it was right next to a change to
use the preprocessor for MICROPY_ERROR_REPORTING)
Note that the odd semicolon after "value_error:" in parsenum.c is necessary
due to a detail of the C grammar, in which a declaration cannot follow
a label directly.
This reclaims over 1kB of flash space by simplifying certain exception
messages. e.g., it will no longer display the requested/actual length
when a fixed list/tuple of N items is needed:
if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
mp_raise_ValueError(translate("tuple/list has wrong length"));
} else {
mp_raise_ValueError_varg(translate("requested length %d but object has length %d"),
(int)len, (int)seq_len);
Other chip families including samd51 keep their current error reporting
capabilities.
* No weak link for modules. It only impacts _os and _time and is
already disabled for non-full builds.
* Turn off PA00 and PA01 because they are the crystal on the Metro
M0 Express.
* Change ejected default to false to move it to BSS. It is set on
USB connection anyway.
* Set sinc_filter to const. Doesn't help flash but keeps it out of
RAM.
This gets a further speedup of about 2s (12s -> 9.5s elapsed build time)
for stm32f405_feather
For what are probably historical reasons, the qstr process involves
preprocessing a large number of source files into a single "qstr.i.last"
file, then reading this and splitting it into one "qstr" file for each
original source ("*.c") file.
By eliminating the step of writing qstr.i.last as well as making the
regular-expression-matching part be parallelized, build speed is further
improved.
Because the step to build QSTR_DEFS_COLLECTED does not access
qstr.i.last, the path is replaced with "-" in the Makefile.
Rather than simply invoking gcc in preprocessor mode with a list of files, use
a Python script with the (python3) ThreadPoolExecutor to invoke the
preprocessor in parallel.
The amount of concurrency is the number of system CPUs, not the makefile "-j"
parallelism setting, because there is no simple and correct way for a Python
program to correctly work together with make's idea of parallelism.
This reduces the build time of stm32f405 feather (a non-LTO build) from 16s to
12s on my 16-thread Ryzen machine.