This commit enables some significant optimisations for esp32:
- move the VM to iRAM
- move hot parts of the runtime to iRAM (map lookup, load global/name,
mp_obj_get_type)
- enable MICROPY_OPT_LOAD_ATTR_FAST_PATH
- enable MICROPY_OPT_MAP_LOOKUP_CACHE
- disable assertions
- change from -Os to -O2 for compilation
It's hard to measure performance on esp32 due to external flash and
hardware caching. But this set of changes improves performance compared to
master by (on a TinyPICO with the GENERIC build, using IDF 4.2.2, running
at 160MHz):
diff of scores (higher is better)
N=100 M=100 esp32-master -> esp32-perf diff diff% (error%)
bm_chaos.py 71.28 -> 268.08 : +196.80 = +276.094% (+/-0.04%)
bm_fannkuch.py 44.10 -> 69.31 : +25.21 = +57.166% (+/-0.01%)
bm_fft.py 1385.27 -> 2538.23 : +1152.96 = +83.230% (+/-0.01%)
bm_float.py 1060.94 -> 3900.62 : +2839.68 = +267.657% (+/-0.03%)
bm_hexiom.py 10.90 -> 32.79 : +21.89 = +200.826% (+/-0.02%)
bm_nqueens.py 1000.83 -> 2372.87 : +1372.04 = +137.090% (+/-0.01%)
bm_pidigits.py 288.13 -> 664.40 : +376.27 = +130.590% (+/-0.46%)
misc_aes.py 102.45 -> 345.69 : +243.24 = +237.423% (+/-0.01%)
misc_mandel.py 1016.58 -> 2121.92 : +1105.34 = +108.731% (+/-0.01%)
misc_pystone.py 632.91 -> 1801.87 : +1168.96 = +184.696% (+/-0.08%)
misc_raytrace.py 76.66 -> 281.78 : +205.12 = +267.571% (+/-0.05%)
viper_call0.py 210.63 -> 273.17 : +62.54 = +29.692% (+/-0.01%)
viper_call1a.py 208.45 -> 269.51 : +61.06 = +29.292% (+/-0.00%)
viper_call1b.py 185.44 -> 228.25 : +42.81 = +23.086% (+/-0.01%)
viper_call1c.py 185.86 -> 228.90 : +43.04 = +23.157% (+/-0.01%)
viper_call2a.py 207.10 -> 267.25 : +60.15 = +29.044% (+/-0.00%)
viper_call2b.py 173.76 -> 209.42 : +35.66 = +20.523% (+/-0.00%)
Five tests have more than 3x speed up (200%+).
The performance of the tests bm_fft, bm_pidigits and misc_aes now scale
with CPU frequency (eg changing frequency to 240MHz boosts the performance
of these by 50%), which means they are no longer influenced by timing of
external flash access. (The viper_call* tests did previously scale with
CPU frequency, and they still do.)
Turning off assertions reduces code size by about 80k, and going from -Os
to -O2 costs about 100k, so the net change in code size (for the GENERIC
board) is about +20k.
If a board wants to enable assertions, or use -Os instead of -O2, that's
still possible by overriding the sdkconfig parameters.
Signed-off-by: Damien George <damien@micropython.org>
Ensures consistent behaviour and resolves the D-Cache bug (the "exhaustive"
argument being lost due to cache being turned off) when O0 is used.
The changes in this commit are:
- Change -O0 to -Os because "gcc is considered broken at -O0" according to
https://github.com/ARM-software/CMSIS_5/issues/620#issuecomment-550235656
- Use volatile for mem_base so the compiler doesn't optimise away reads or
writes to the SDRAM, which is being tested.
- Use DSB to prevent any other compiler optimisations that would change the
testing logic.
- Use alternating pattern/antipattern in exhaustive test to catch more
hardware/configuration errors.
Implementation adapted by @andrewleech, taken directly from investigation
by @iabdalkader and @dpgeorge.
See #7841 and #7869 for further discussion.
To match network_lan.c and network_ppp.c, and make it clear what code is
specifically for WLAN support.
Also provide a configuration option MICROPY_PY_NETWORK_WLAN which can be
used to fully disable network.WLAN (it's enabled by default).
Signed-off-by: Damien George <damien@micropython.org>
To do this the board must define MICROPY_BOARD_STARTUP, set
MICROPY_SOURCE_BOARD then define the new start-up code.
For example, in mpconfigboard.h:
#define MICROPY_BOARD_STARTUP board_startup
void board_startup(void);
in mpconfigboard.cmake:
set(MICROPY_SOURCE_BOARD
${MICROPY_BOARD_DIR}/board.c
)
and in a new board.c file in the board directory:
#include "py/mpconfig.h"
void board_startup(void) {
boardctrl_startup();
// extra custom startup
}
This follows stm32's boardctrl facilities.
Signed-off-by: Damien George <damien@micropython.org>
Because vPortCleanUpTCB is called by the FreeRTOS idle task, and it checks
thread, but didn't check the thread_mutex.
And if thread is not NULL, but thread_mutex not ready then it will crash
with an error when calling mp_thread_mutex_lock(&thread_mutex, 1).
As suggested by @dpgeorge, move the thread = &thread_entry0 line to the end
of mp_thread_init().
Signed-off-by: leo chung <gewalalb@gmail.com>
This callback allows detecting if there is a USB host connected to the CDC
or not, in which case the stdout_tx should skip CDC TX writing and
flushing or the system will block.
Fixes issue #7820.
This commit allows using all the available PWM timers (up to 8) and
channels (up to 16), without affecting the PWM API.
If a new frequency is set, first it checks if another timer is using the
same frequency. If yes, then it uses this timer, otherwise, it creates a
new one. If all timers are used, the user should set an already used
frequency, or de-init a channel.
This work is based on #6276 and #3608.
The H743 has equal sized pages of 128k, which means the filesystem doesn't
need to be near the beginning. This commit moves the filesystem to the
very end of flash, and extends it to 512k (4 pages).
Signed-off-by: Damien George <damien@micropython.org>
This change adds the OLIMEX H407 support to the STM32 port. The H407
(https://www.olimex.com/Products/ARM/ST/STM32-H407/) is simliar to the
already existing E407
(https://www.olimex.com/Products/ARM/ST/STM32-E407) but does not support
Ethernet and has a full-size USB-A port instead of a Mini-USB socket.
Both boards use the STM32F407ZGT6 CPU.
This port is basically a copy of the E407 but with changed pinmux:
* Removed Ethernet pin definition
* Removed UART1 (pins are used for other functions)
* Removed UART3 flow control pins (pins are used for other functions)
* Removed SD-Card detect pin (since it is not connected on the H407)
A REPL on UART3 is connected to the U3BOOT-header, a 3-pin header with RX,
TX and GND that is intended for the serial terminal.
Tested:
* Micro-SD Card is detected when inserted on RESET
* REPL on UART3 works
* Serial port on the mini USB socket
Signed-off-by: Chris Fiege <cfi@pengutronix.de>
This commit removes all parts of code associated with the existing
MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE optimisation option, including the
-mcache-lookup-bc option to mpy-cross.
This feature originally provided a significant performance boost for Unix,
but wasn't able to be enabled for MCU targets (due to frozen bytecode), and
added significant extra complexity to generating and distributing .mpy
files.
The equivalent performance gain is now provided by the combination of
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE (which has
been enabled on the unix port in the previous commit).
It's hard to provide precise performance numbers, but tests have been run
on a wide variety of architectures (x86-64, ARM Cortex, Aarch64, RISC-V,
xtensa) and they all generally agree on the qualitative improvements seen
by the combination of MICROPY_OPT_LOAD_ATTR_FAST_PATH and
MICROPY_OPT_MAP_LOOKUP_CACHE.
For example, on a "quiet" Linux x64 environment (i3-5010U @ 2.10GHz) the
change from CACHE_MAP_LOOKUP_IN_BYTECODE, to LOAD_ATTR_FAST_PATH combined
with MAP_LOOKUP_CACHE is:
diff of scores (higher is better)
N=2000 M=2000 bccache -> attrmapcache diff diff% (error%)
bm_chaos.py 13742.56 -> 13905.67 : +163.11 = +1.187% (+/-3.75%)
bm_fannkuch.py 60.13 -> 61.34 : +1.21 = +2.012% (+/-2.11%)
bm_fft.py 113083.20 -> 114793.68 : +1710.48 = +1.513% (+/-1.57%)
bm_float.py 256552.80 -> 243908.29 : -12644.51 = -4.929% (+/-1.90%)
bm_hexiom.py 521.93 -> 625.41 : +103.48 = +19.826% (+/-0.40%)
bm_nqueens.py 197544.25 -> 217713.12 : +20168.87 = +10.210% (+/-3.01%)
bm_pidigits.py 8072.98 -> 8198.75 : +125.77 = +1.558% (+/-3.22%)
misc_aes.py 17283.45 -> 16480.52 : -802.93 = -4.646% (+/-0.82%)
misc_mandel.py 99083.99 -> 128939.84 : +29855.85 = +30.132% (+/-5.88%)
misc_pystone.py 83860.10 -> 82592.56 : -1267.54 = -1.511% (+/-2.27%)
misc_raytrace.py 21490.40 -> 22227.23 : +736.83 = +3.429% (+/-1.88%)
This shows that the new optimisations are at least as good as the existing
inline-bytecode-caching, and are sometimes much better (because the new
ones apply caching to a wider variety of map lookups).
The new optimisations can also benefit code generated by the native
emitter, because they apply to the runtime rather than the generated code.
The improvement for the native emitter when LOAD_ATTR_FAST_PATH and
MAP_LOOKUP_CACHE are enabled is (same Linux environment as above):
diff of scores (higher is better)
N=2000 M=2000 native -> nat-attrmapcache diff diff% (error%)
bm_chaos.py 14130.62 -> 15464.68 : +1334.06 = +9.441% (+/-7.11%)
bm_fannkuch.py 74.96 -> 76.16 : +1.20 = +1.601% (+/-1.80%)
bm_fft.py 166682.99 -> 168221.86 : +1538.87 = +0.923% (+/-4.20%)
bm_float.py 233415.23 -> 265524.90 : +32109.67 = +13.756% (+/-2.57%)
bm_hexiom.py 628.59 -> 734.17 : +105.58 = +16.796% (+/-1.39%)
bm_nqueens.py 225418.44 -> 232926.45 : +7508.01 = +3.331% (+/-3.10%)
bm_pidigits.py 6322.00 -> 6379.52 : +57.52 = +0.910% (+/-5.62%)
misc_aes.py 20670.10 -> 27223.18 : +6553.08 = +31.703% (+/-1.56%)
misc_mandel.py 138221.11 -> 152014.01 : +13792.90 = +9.979% (+/-2.46%)
misc_pystone.py 85032.14 -> 105681.44 : +20649.30 = +24.284% (+/-2.25%)
misc_raytrace.py 19800.01 -> 23350.73 : +3550.72 = +17.933% (+/-2.79%)
In summary, compared to MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, the new
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE options:
- are simpler;
- take less code size;
- are faster (generally);
- work with code generated by the native emitter;
- can be used on embedded targets with a small and constant RAM overhead;
- allow the same .mpy bytecode to run on all targets.
See #7680 for further discussion. And see also #7653 for a discussion
about simplifying mpy-cross options.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Update minimal port to use the new "minimal" rom level config (this is a
no-op change, the binary is the same size and contains the exact same
symbols).
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Any external user of DMA (eg a board with a custom DMA driver) must call
dma_external_acquire() for their DMA controller/stream to ensure that the
DMA clock is not automatically turned off while it's still being used
externally.
Signed-off-by: Damien George <damien@micropython.org>
This change allows a CPU pin to be hidden from the user by prefixing it
with a "-" in the pins.csv file for a board. It will still be available in
C code, just not exposed to Python.
Signed-off-by: Damien George <damien@micropython.org>
Don't want users to accidentally use boot.py (because recovering requires
knowing how to activate safe mode).
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This commit is based upon prior work of @dpgeorge and @koendv.
MCU support for the STM32H7A3 and B3 families MCUs:
- STM32H7A3xx
- STM32H7A3xxQ (SMPS)
- STM32H7B3xx
- STM32H7B3xxQ (SMPS)
Support has been added for the STM32H7B3I_DK board.
Signed-off-by: Jan Staal <info@janstaal.com>
To simplify the socket state.
The CC3K driver (see drivers/cc3000/inc/socket.h and src/socket.c) has
socket() returning an INT16 so there is now enough room to store it
directly in the fileno member.
- Moves definition of BOARD_FLASH_SIZE and other header files related to
flash configuration into the Makefile.
- Adds board specific clock_config.h.
- Adds board.h, pin_mux.h, and peripherals.h as they are
required by NXP MCU SDK in order to use our own clock_config.h.
- Renames board specific FlexSPI configuration files.
- Updates flash frequency of MIMXRT1020_EVK
- Creates separated flash_config files for QSPI NOR and
QSPI Hyper flash.
- Unifies VFS start address to be @ 1M for 1010 and 1020 boards.
- Unifies 1050EVK boards
- Adds support to both NOR and HyperFlash on boards with
both capabilities.
- Adds automatic FlexRAM initialization to start-up code based on
linker script and NXP HAL.
- Applies code formatting to all files in mimxrt port.
With this change the flash configuration is restructured and
organized. This simplifies the configuration process and
provides a better overview of each board's settings. With the integration
of clock_config.h, board.h, pin_mux.h, and peripherals.h we gain better
control of the settings and clock configurations. Furthermore the
implementation of an explicit FlexRAM setup improves the system
performance and allows for performance tuning.
Signed-off-by: Philipp Ebensberger
It's not possible anymore to build MicroPython on Cygwin using a
standard Windows installation of Python so don't advertise that.
Specifically: preprocessing in makeqstrdefs.py fails on the subprocess
call with 'gcc: fatal error: no input files' because one of the flags
contains double quotes and that somehow messes up the commandline.
Following the code example for ESP32 of Jim Mussard.
As a side effect:
- mp_hal_ticks_cpu() was implemented,
- mp_hal_get_cpu_freq() and mp_hal_ticks_cpu_init() were added and used.
- mp_hal_pin_high() and mp_hal_pin_low() were changed for symmetry
- Configures `PLL2->PFD0` with **198MHz** as base clock of
`USDHCx` peripheral.
- Adds guards for SDCard related files via `MICROPY_PY_MACHINE_SDCARD`
- Adds creation of pin defines for SDCard to make-pins.py
- Adds new configuration option for SDCard peripheral pinout
to mpconfigport.h
- Adds interrupt handling support instead of polling
- Adds support for `ADMA2` powered data transfer
- Configures SDCard to run in HS (high-speed mode) with **50MHz** only!
SDCard support is optional and requires `USDHC` peripheral.
Thus this driver is not available on `MIMXRT1010_EVK`.
SDCard support is enabled by setting `MICROPY_PY_MACHINE_SDCARD = 1`
in mpconfigboard.mk.
Signed-off-by: Philipp Ebensberger
This commit refactors machine.PWM and creates extmod/machine_pwm.c. The
esp8266, esp32 and rp2 ports all use this and provide implementations of
the required PWM functionality. This helps to reduce code duplication and
keep the same Python API across ports.
This commit does not make any functional changes.
Signed-off-by: Damien George <damien@micropython.org>