Background: .mpy files are precompiled .py files, built using mpy-cross,
that contain compiled bytecode functions (and can also contain machine
code). The benefit of using an .mpy file over a .py file is that they are
faster to import and take less memory when importing. They are also
smaller on disk.
But the real benefit of .mpy files comes when they are frozen into the
firmware. This is done by loading the .mpy file during compilation of the
firmware and turning it into a set of big C data structures (the job of
mpy-tool.py), which are then compiled and downloaded into the ROM of a
device. These C data structures can be executed in-place, ie directly from
ROM. This makes importing even faster because there is very little to do,
and also means such frozen modules take up much less RAM (because their
bytecode stays in ROM).
The downside of frozen code is that it requires recompiling and reflashing
the entire firmware. This can be a big barrier to entry, slows down
development time, and makes it harder to do OTA updates of frozen code
(because the whole firmware must be updated).
This commit attempts to solve this problem by providing a solution that
sits between loading .mpy files into RAM and freezing them into the
firmware. The .mpy file format has been reworked so that it consists of
data and bytecode which is mostly static and ready to run in-place. If
these new .mpy files are located in flash/ROM which is memory addressable,
the .mpy file can be executed (mostly) in-place.
With this approach there is still a small amount of unpacking and linking
of the .mpy file that needs to be done when it's imported, but it's still
much better than loading an .mpy from disk into RAM (although not as good
as freezing .mpy files into the firmware).
The main trick to make static .mpy files is to adjust the bytecode so any
qstrs that it references now go through a lookup table to convert from
local qstr number in the module to global qstr number in the firmware.
That means the bytecode does not need linking/rewriting of qstrs when it's
loaded. Instead only a small qstr table needs to be built (and put in RAM)
at import time. This means the bytecode itself is static/constant and can
be used directly if it's in addressable memory. Also the qstr string data
in the .mpy file, and some constant object data, can be used directly.
Note that the qstr table is global to the module (ie not per function).
In more detail, in the VM what used to be (schematically):
qst = DECODE_QSTR_VALUE;
is now (schematically):
idx = DECODE_QSTR_INDEX;
qst = qstr_table[idx];
That allows the bytecode to be fixed at compile time and not need
relinking/rewriting of the qstr values. Only qstr_table needs to be linked
when the .mpy is loaded.
Incidentally, this helps to reduce the size of bytecode because what used
to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices.
If the module uses the same qstr more than two times then the bytecode is
smaller than before.
The following changes are measured for this commit compared to the
previous (the baseline):
- average 7%-9% reduction in size of .mpy files
- frozen code size is reduced by about 5%-7%
- importing .py files uses about 5% less RAM in total
- importing .mpy files uses about 4% less RAM in total
- importing .py and .mpy files takes about the same time as before
The qstr indirection in the bytecode has only a small impact on VM
performance. For stm32 on PYBv1.0 the performance change of this commit
is:
diff of scores (higher is better)
N=100 M=100 baseline -> this-commit diff diff% (error%)
bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%)
bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%)
bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%)
bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%)
bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%)
bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%)
bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%)
core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%)
core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%)
core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%)
core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%)
misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%)
misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%)
misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%)
misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%)
viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%)
viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%)
viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%)
viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%)
viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%)
viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%)
And for unix on x64:
diff of scores (higher is better)
N=2000 M=2000 baseline -> this-commit diff diff% (error%)
bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%)
bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%)
bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%)
bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%)
bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%)
bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%)
bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%)
misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%)
misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%)
misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%)
misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%)
The code size change is (firmware with a lot of frozen code benefits the
most):
bare-arm: +396 +0.697%
minimal x86: +1595 +0.979% [incl +32(data)]
unix x64: +2408 +0.470% [incl +800(data)]
unix nanbox: +1396 +0.309% [incl -96(data)]
stm32: -1256 -0.318% PYBV10
cc3200: +288 +0.157%
esp8266: -260 -0.037% GENERIC
esp32: -216 -0.014% GENERIC[incl -1072(data)]
nrf: +116 +0.067% pca10040
rp2: -664 -0.135% PICO
samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS
As part of this change the .mpy file format version is bumped to version 6.
And mpy-tool.py has been improved to provide a good visualisation of the
contents of .mpy files.
In summary: this commit changes the bytecode to use qstr indirection, and
reworks the .mpy file format to be simpler and allow .mpy files to be
executed in-place. Performance is not impacted too much. Eventually it
will be possible to store such .mpy files in a linear, read-only, memory-
mappable filesystem so they can be executed from flash/ROM. This will
essentially be able to replace frozen code for most applications.
Signed-off-by: Damien George <damien@micropython.org>
The unix port's main.c gets used by unix and windows ports, and with a
variety of compilers, so it's convenient to see which version is actually
being used immediately when starting micropython. This is similar to what
CPython does.
If MBOOT_BOARD_ENTRY_INIT is defined by a board then that function must now
make sure system clocks are configured, eg by calling mboot_entry_init().
Signed-off-by: Damien George <damien@micropython.org>
If a board wants to customise the clocks it can define the following:
MBOOT_CLK_PLLM
MBOOT_CLK_PLLN
MBOOT_CLK_PLLP
MBOOT_CLK_PLLQ
MBOOT_CLK_PLLR (only needed on STM32H7)
MBOOT_FLASH_LATENCY
MBOOT_CLK_AHB_DIV
MBOOT_CLK_APB1_DIV
MBOOT_CLK_APB2_DIV
MBOOT_CLK_APB3_DIV (only needed on STM32H7)
MBOOT_CLK_APB4_DIV (only needed on STM32H7)
Signed-off-by: Damien George <damien@micropython.org>
In the `after_test` section, the current directory is `ports/windows` when
tests are run, so running `run-tests.py` without changing the directory or
specifying a path causes a file not found error.
This commit fixes the problem by changing the directory before calling
`run-tests.py`.
Signed-off-by: David Lechner <david@pybricks.com>
The UART hardware flow control was not working correctly, the receive FIFO
was always fetched and RTS was never deasserted. This is not a problem
when hardware flow control is not used: normally, if the receive FIFO is
full, the UART receiver won't receive data into the FIFO anymore, but the
current implementation fetches from the FIFO and discards it instead.
The problem is that data is discarded even when RTS is enabled.
This commit fixes the issue by only taking from the FIFO if there is room
in the ring buffer to put the character.
Signed-off-by: YoungJoon Chun <yjchun@mac.com>
Prior to this fix, if the ADC atten value was not explicitly given then
adc1_config_channel_atten() would never be called.
Fixes issue #8275.
Signed-off-by: Damien George <damien@micropython.org>
The correct day-of-week is stored in the RTC (0=Monday, 6=Sunday) so there
is no need to adjust it for the return value of time.localtime().
Fixes issue #7889.
Signed-off-by: Damien George <damien@micropython.org>
The default stm32lib remains lib/stm32lib, but it can now be easily
overriden at build time by specifying STM32LIB_DIR, or STM32LIB_CMSIS_DIR
and STM32LIB_HAL_DIR.
Signed-off-by: Damien George <damien@micropython.org>
This also fixes a possible race condition when exiting initialisation mode:
reading then writing to ISR (via ISR &= ~RTC_ISR_INIT) will clear any flags
that were set by the hardware between the read and the write. The correct
way to clear just the INIT bit is to just do a single write via ISR =
~RTC_ISR_INIT, which will not clear any other flags (they must be written
to 0 to clear), and that is exactly what LL_RTC_DisableInitMode does.
Signed-off-by: Damien George <damien@micropython.org>
And don't assert on the sector number in sector_erase, so it can support
erasing arbitrary sectors.
Signed-off-by: Damien George <damien@micropython.org>
The inclusion of `umachine` in the list of built-in modules is now done
centrally in py/objmodule.c. Enabling MICROPY_PY_MACHINE will include this
module.
As part of this, all ports now have `umachine` as the core module name
(previously some had only `machine` as the name).
Signed-off-by: Damien George <damien@micropython.org>
This change allows the same heap allocation rules to be used when using
malloc regardless if the board has SPRAM or normal RAM.
Integrating with the esp32-camera for example requires that ESP32 SPRAM be
allocatable using the esp-idf capabilities aware allocation functions. In
the case of esp32-camera it's for the framebuffer.
Detect when CONFIG_SPIRAM_USE_MALLOC is in use and use the standard
automatic configuration of leaving 1/2 of the SPRAM available to other
FreeRTOS tasks.
For example the ESP32-C3 has 2 TX channels and 2 RX channels in total, and
in this case channel 1 must be the default for bitstream.
Signed-off-by: Damien George <damien@micropython.org>
This commit adds support for the STM32G4 series of MCUs, and a board
definition for NUCLEO_G474RE. This board has the REPL on LPUART1 which is
connected to the on-board ST-link USB-UART.
It's needed at least on F4 because this file overrides the weak function
HAL_RCC_DeInit() from hal_rcc.c.
Signed-off-by: Damien George <damien@micropython.org>
PLL3-Q is more reliable than PLL1-Q for the USB clock source when entering
mboot from various reset states (eg power on vs MCU reset). (It was found
that if the main application used PLL3-Q then sometimes the USB clock
source would stay stuck on PLL3-Q and not switch to PLL1-Q after a reset.)
Other related changes:
- SystemCoreClockUpdate() should be called on H7 because the calculation
can be involved in some cases.
- __set_PRIMASK(0) should be called because on H7 the built-in ST DFU
bootloader exits with IRQs disabled.
Signed-off-by: Damien George <damien@micropython.org>
H7 MCUs have ECC and writes do not go through to SRAM until 64-bits have
been written (on another location is written). So use 64-bit writes for
the bootloader-state variable so it is committed before the system reset.
As part of this change, the lower byte of the bootloader address in
BL_STATE must now be the magic number 0x5a5 for the state to be valid
(previously this was 0x000 which is not as robust).
Signed-off-by: Damien George <damien@micropython.org>