Commit Graph

12740 Commits

Author SHA1 Message Date
Damien George
549448e8bb esp32: Enable optimisations and move code to iRAM to boost performance.
This commit enables some significant optimisations for esp32:
- move the VM to iRAM
- move hot parts of the runtime to iRAM (map lookup, load global/name,
  mp_obj_get_type)
- enable MICROPY_OPT_LOAD_ATTR_FAST_PATH
- enable MICROPY_OPT_MAP_LOOKUP_CACHE
- disable assertions
- change from -Os to -O2 for compilation

It's hard to measure performance on esp32 due to external flash and
hardware caching.  But this set of changes improves performance compared to
master by (on a TinyPICO with the GENERIC build, using IDF 4.2.2, running
at 160MHz):

diff of scores (higher is better)
N=100 M=100    esp32-master -> esp32-perf       diff      diff% (error%)
bm_chaos.py           71.28 ->     268.08 :  +196.80 = +276.094% (+/-0.04%)
bm_fannkuch.py        44.10 ->      69.31 :   +25.21 = +57.166% (+/-0.01%)
bm_fft.py           1385.27 ->    2538.23 : +1152.96 = +83.230% (+/-0.01%)
bm_float.py         1060.94 ->    3900.62 : +2839.68 = +267.657% (+/-0.03%)
bm_hexiom.py          10.90 ->      32.79 :   +21.89 = +200.826% (+/-0.02%)
bm_nqueens.py       1000.83 ->    2372.87 : +1372.04 = +137.090% (+/-0.01%)
bm_pidigits.py       288.13 ->     664.40 :  +376.27 = +130.590% (+/-0.46%)
misc_aes.py          102.45 ->     345.69 :  +243.24 = +237.423% (+/-0.01%)
misc_mandel.py      1016.58 ->    2121.92 : +1105.34 = +108.731% (+/-0.01%)
misc_pystone.py      632.91 ->    1801.87 : +1168.96 = +184.696% (+/-0.08%)
misc_raytrace.py      76.66 ->     281.78 :  +205.12 = +267.571% (+/-0.05%)
viper_call0.py       210.63 ->     273.17 :   +62.54 = +29.692% (+/-0.01%)
viper_call1a.py      208.45 ->     269.51 :   +61.06 = +29.292% (+/-0.00%)
viper_call1b.py      185.44 ->     228.25 :   +42.81 = +23.086% (+/-0.01%)
viper_call1c.py      185.86 ->     228.90 :   +43.04 = +23.157% (+/-0.01%)
viper_call2a.py      207.10 ->     267.25 :   +60.15 = +29.044% (+/-0.00%)
viper_call2b.py      173.76 ->     209.42 :   +35.66 = +20.523% (+/-0.00%)

Five tests have more than 3x speed up (200%+).

The performance of the tests bm_fft, bm_pidigits and misc_aes now scale
with CPU frequency (eg changing frequency to 240MHz boosts the performance
of these by 50%), which means they are no longer influenced by timing of
external flash access.  (The viper_call* tests did previously scale with
CPU frequency, and they still do.)

Turning off assertions reduces code size by about 80k, and going from -Os
to -O2 costs about 100k, so the net change in code size (for the GENERIC
board) is about +20k.

If a board wants to enable assertions, or use -Os instead of -O2, that's
still possible by overriding the sdkconfig parameters.

Signed-off-by: Damien George <damien@micropython.org>
2021-10-16 00:07:11 +11:00
Damien George
8412568e7b py: Add wrapper macros so hot VM functions can go in fast code location.
For example, on esp32 they can go in iRAM to improve performance.

Signed-off-by: Damien George <damien@micropython.org>
2021-10-15 23:31:19 +11:00
iabdalkader
eea6cd85b3 stm32/sdram: Enforce gcc opt, and use volatile and DSB in sdram_test.
Ensures consistent behaviour and resolves the D-Cache bug (the "exhaustive"
argument being lost due to cache being turned off) when O0 is used.

The changes in this commit are:

- Change -O0 to -Os because "gcc is considered broken at -O0" according to
  https://github.com/ARM-software/CMSIS_5/issues/620#issuecomment-550235656

- Use volatile for mem_base so the compiler doesn't optimise away reads or
  writes to the SDRAM, which is being tested.

- Use DSB to prevent any other compiler optimisations that would change the
  testing logic.

- Use alternating pattern/antipattern in exhaustive test to catch more
  hardware/configuration errors.

Implementation adapted by @andrewleech, taken directly from investigation
by @iabdalkader and @dpgeorge.

See #7841 and #7869 for further discussion.
2021-10-15 17:59:31 +11:00
NitiKaur
4c9e17e0a1 docs/esp32/tutorial: Add an example of peripheral control via regs. 2021-10-14 23:31:45 +11:00
NitiKaur
763042a287 docs/library/stm.rst: Document the stm module. 2021-10-14 23:19:08 +11:00
NitiKaur
e87b2e8bfa docs/reference/manifest.rst: Add docs for manifest.py files. 2021-10-14 14:03:03 +11:00
NitiKaur
135339ce3a docs/reference/mpremote.rst: Add docs for mpremote. 2021-10-14 13:09:51 +11:00
NitiKaur
c42c1c8718 docs/library/random.rst: Document the random module. 2021-10-13 16:56:37 +11:00
NitiKaur
baa5a76fc0 docs/rp2: Add reference for PIO assembly instructions, and PIO tutorial. 2021-10-13 15:54:49 +11:00
stijn
d42cba0d22 extmod/moduplatform: Improve implementation for PC ports.
Fix identification of 32/64 bit and of the Windows platform and add a
platform string mimicking CPython for the latter.
2021-09-24 13:51:39 +10:00
stijn
ea880d5674 py/builtinimport: Forward all debug printing to MICROPY_DEBUG_PRINTER. 2021-09-24 13:17:19 +10:00
Damien George
ea186de4c5 esp32: Split out WLAN code from modnetwork.c to network_wlan.c.
To match network_lan.c and network_ppp.c, and make it clear what code is
specifically for WLAN support.

Also provide a configuration option MICROPY_PY_NETWORK_WLAN which can be
used to fully disable network.WLAN (it's enabled by default).

Signed-off-by: Damien George <damien@micropython.org>
2021-09-24 12:41:35 +10:00
Damien George
f046b50ca5 esp32/main: Add option for a board to hook code into startup sequence.
To do this the board must define MICROPY_BOARD_STARTUP, set
MICROPY_SOURCE_BOARD then define the new start-up code.

For example, in mpconfigboard.h:

    #define MICROPY_BOARD_STARTUP board_startup
    void board_startup(void);

in mpconfigboard.cmake:

    set(MICROPY_SOURCE_BOARD
        ${MICROPY_BOARD_DIR}/board.c
    )

and in a new board.c file in the board directory:

    #include "py/mpconfig.h"

    void board_startup(void) {
        boardctrl_startup();
        // extra custom startup
    }

This follows stm32's boardctrl facilities.

Signed-off-by: Damien George <damien@micropython.org>
2021-09-24 12:23:14 +10:00
leo chung
4fdf795efa esp32/mpthreadport: Fix TCB cleanup function so thread_mutex is ready.
Because vPortCleanUpTCB is called by the FreeRTOS idle task, and it checks
thread, but didn't check the thread_mutex.

And if thread is not NULL, but thread_mutex not ready then it will crash
with an error when calling mp_thread_mutex_lock(&thread_mutex, 1).

As suggested by @dpgeorge, move the thread = &thread_entry0 line to the end
of mp_thread_init().

Signed-off-by: leo chung <gewalalb@gmail.com>
2021-09-24 12:12:03 +10:00
Seon Rozenblum
a39a596b79 esp32/machine_pin: Block out IO16 and IO17 when using SPIRAM on ESP32.
Fixes issue #7819.
2021-09-24 10:56:09 +10:00
Seon Rozenblum
35fb90bd57 esp32/usb: Add USB host connection detection for CDC serial output.
This callback allows detecting if there is a USB host connected to the CDC
or not, in which case the stdout_tx should skip CDC TX writing and
flushing or the system will block.

Fixes issue #7820.
2021-09-22 00:42:20 +10:00
Seon Rozenblum
7bf466a281 esp32/README: Updated readme with req IDF vers for ESP32-S2, C3 and S3. 2021-09-22 00:41:02 +10:00
IhorNehrutsa
71111cffba docs/esp32: Explain ESP32 PWM modes, timers, and channels. 2021-09-21 23:28:16 +10:00
IhorNehrutsa
52636fa692 esp32/machine_pwm: Add support for all PWM timers and channels.
This commit allows using all the available PWM timers (up to 8) and
channels (up to 16), without affecting the PWM API.

If a new frequency is set, first it checks if another timer is using the
same frequency.  If yes, then it uses this timer, otherwise, it creates a
new one.  If all timers are used, the user should set an already used
frequency, or de-init a channel.

This work is based on #6276 and #3608.
2021-09-21 23:18:09 +10:00
Stewart Bonnick
0d9429f44c esp32/boards: Add LOLIN_S2_MINI ESP32-S2 board.
To support Lolin S2 Mini ESP32-S2 Variant board.  More information about
this board can be found at https://www.wemos.cc/en/latest/s2/s2_mini.html
2021-09-21 22:49:51 +10:00
iabdalkader
67d1dca9c2 stm32/machine_i2c: Use hardware I2C for STM32H7. 2021-09-21 18:13:28 +10:00
roland van straten
9eff4029ab stm32/boards: Add PF11-BOOT0 to stm32f091_af.csv.
PF11 is added so it can be used as GPIO.
2021-09-21 18:11:42 +10:00
Ned Konz
99bb52047c stm32/boards/NUCLEO_H743ZI: Enable VfsLfs2 on NUCLEO_H743ZI(2) boards. 2021-09-21 18:02:19 +10:00
Ned Konz
8c214ed200 stm32: Extended flash filesystem space to 512K on H743 boards.
The H743 has equal sized pages of 128k, which means the filesystem doesn't
need to be near the beginning.  This commit moves the filesystem to the
very end of flash, and extends it to 512k (4 pages).

Signed-off-by: Damien George <damien@micropython.org>
2021-09-21 18:02:14 +10:00
iabdalkader
782d5b2e53 stm32: Enable platform module.
The HAL version is based on the stm32lib version.
2021-09-19 23:35:44 +10:00
iabdalkader
2c5e9bbdfa extmod: Add platform module.
It contains the compiler version, and underlying system HAL/SDK version.
2021-09-19 23:35:10 +10:00
iabdalkader
38f8e852e0 rp2: Add framework for networking.
MICROPY_PY_NETWORK and MICROPY_PY_USOCKET need to be enabled by a board to
get networking.  No NICs have yet been defined.
2021-09-19 23:20:13 +10:00
iabdalkader
c973cfd2f3 rp2: Add support for bluetooth module using NimBLE. 2021-09-19 23:09:59 +10:00
iabdalkader
8064c3bf9c extmod/nimble: Add nimble CMake fragment file. 2021-09-19 23:02:16 +10:00
iabdalkader
80f2c794e6 extmod/mpbthci.h: Add mp_bluetooth_hci_uart_any prototype.
This allows drivers to use mpbthciport functions to read/write/poll UART.
2021-09-19 23:00:39 +10:00
Chris Fiege
6e39f2cc1e stm32/boards: Add OLIMEX H407 board definition.
This change adds the OLIMEX H407 support to the STM32 port.  The H407
(https://www.olimex.com/Products/ARM/ST/STM32-H407/) is simliar to the
already existing E407
(https://www.olimex.com/Products/ARM/ST/STM32-E407) but does not support
Ethernet and has a full-size USB-A port instead of a Mini-USB socket.

Both boards use the STM32F407ZGT6 CPU.

This port is basically a copy of the E407 but with changed pinmux:
* Removed Ethernet pin definition
* Removed UART1 (pins are used for other functions)
* Removed UART3 flow control pins (pins are used for other functions)
* Removed SD-Card detect pin (since it is not connected on the H407)

A REPL on UART3 is connected to the U3BOOT-header, a 3-pin header with RX,
TX and GND that is intended for the serial terminal.

Tested:
* Micro-SD Card is detected when inserted on RESET
* REPL on UART3 works
* Serial port on the mini USB socket

Signed-off-by: Chris Fiege <cfi@pengutronix.de>
2021-09-19 16:58:58 +10:00
patrick
4cfd85eb4a esp32/boards: Add board definition for ESP32-S2-WROVER module. 2021-09-19 16:49:35 +10:00
Seon Rozenblum
13e6e0d7f5 esp32/machine_hw_spi: Fix hardware SPI DMA channels for S2/S3. 2021-09-19 10:23:12 +10:00
Damien George
da4593f937 tools/ci.sh: Use IDF v4.4 as part of esp32 CI and build GENERIC_S3.
IDF v4.4 does not have an official release so for now use the latest
master.  Also remove building GENERIC with no options (all the other boards
are no-option builds), to keep CI time reasonable.

Signed-off-by: Damien George <damien@micropython.org>
2021-09-16 22:59:05 +10:00
Damien George
80fe25689f esp32/boards: Add new GENERIC_S3 board definition.
Thanks to Seon Rozenblum aka @UnexpectedMaker for the work.

Signed-off-by: Damien George <damien@micropython.org>
2021-09-16 22:58:47 +10:00
Damien George
54d33b266c esp32: Add support for ESP32-S3 SoCs.
Thanks to Seon Rozenblum aka @UnexpectedMaker for the work.

Signed-off-by: Damien George <damien@micropython.org>
2021-09-16 22:58:47 +10:00
Jim Mussared
b326edf68c all: Remove MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE.
This commit removes all parts of code associated with the existing
MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE optimisation option, including the
-mcache-lookup-bc option to mpy-cross.

This feature originally provided a significant performance boost for Unix,
but wasn't able to be enabled for MCU targets (due to frozen bytecode), and
added significant extra complexity to generating and distributing .mpy
files.

The equivalent performance gain is now provided by the combination of
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE (which has
been enabled on the unix port in the previous commit).

It's hard to provide precise performance numbers, but tests have been run
on a wide variety of architectures (x86-64, ARM Cortex, Aarch64, RISC-V,
xtensa) and they all generally agree on the qualitative improvements seen
by the combination of MICROPY_OPT_LOAD_ATTR_FAST_PATH and
MICROPY_OPT_MAP_LOOKUP_CACHE.

For example, on a "quiet" Linux x64 environment (i3-5010U @ 2.10GHz) the
change from CACHE_MAP_LOOKUP_IN_BYTECODE, to LOAD_ATTR_FAST_PATH combined
with MAP_LOOKUP_CACHE is:

diff of scores (higher is better)
N=2000 M=2000       bccache -> attrmapcache      diff      diff% (error%)
bm_chaos.py        13742.56 ->   13905.67 :   +163.11 =  +1.187% (+/-3.75%)
bm_fannkuch.py        60.13 ->      61.34 :     +1.21 =  +2.012% (+/-2.11%)
bm_fft.py         113083.20 ->  114793.68 :  +1710.48 =  +1.513% (+/-1.57%)
bm_float.py       256552.80 ->  243908.29 : -12644.51 =  -4.929% (+/-1.90%)
bm_hexiom.py         521.93 ->     625.41 :   +103.48 = +19.826% (+/-0.40%)
bm_nqueens.py     197544.25 ->  217713.12 : +20168.87 = +10.210% (+/-3.01%)
bm_pidigits.py      8072.98 ->    8198.75 :   +125.77 =  +1.558% (+/-3.22%)
misc_aes.py        17283.45 ->   16480.52 :   -802.93 =  -4.646% (+/-0.82%)
misc_mandel.py     99083.99 ->  128939.84 : +29855.85 = +30.132% (+/-5.88%)
misc_pystone.py    83860.10 ->   82592.56 :  -1267.54 =  -1.511% (+/-2.27%)
misc_raytrace.py   21490.40 ->   22227.23 :   +736.83 =  +3.429% (+/-1.88%)

This shows that the new optimisations are at least as good as the existing
inline-bytecode-caching, and are sometimes much better (because the new
ones apply caching to a wider variety of map lookups).

The new optimisations can also benefit code generated by the native
emitter, because they apply to the runtime rather than the generated code.
The improvement for the native emitter when LOAD_ATTR_FAST_PATH and
MAP_LOOKUP_CACHE are enabled is (same Linux environment as above):

diff of scores (higher is better)
N=2000 M=2000        native -> nat-attrmapcache  diff      diff% (error%)
bm_chaos.py        14130.62 ->   15464.68 :  +1334.06 =  +9.441% (+/-7.11%)
bm_fannkuch.py        74.96 ->      76.16 :     +1.20 =  +1.601% (+/-1.80%)
bm_fft.py         166682.99 ->  168221.86 :  +1538.87 =  +0.923% (+/-4.20%)
bm_float.py       233415.23 ->  265524.90 : +32109.67 = +13.756% (+/-2.57%)
bm_hexiom.py         628.59 ->     734.17 :   +105.58 = +16.796% (+/-1.39%)
bm_nqueens.py     225418.44 ->  232926.45 :  +7508.01 =  +3.331% (+/-3.10%)
bm_pidigits.py      6322.00 ->    6379.52 :    +57.52 =  +0.910% (+/-5.62%)
misc_aes.py        20670.10 ->   27223.18 :  +6553.08 = +31.703% (+/-1.56%)
misc_mandel.py    138221.11 ->  152014.01 : +13792.90 =  +9.979% (+/-2.46%)
misc_pystone.py    85032.14 ->  105681.44 : +20649.30 = +24.284% (+/-2.25%)
misc_raytrace.py   19800.01 ->   23350.73 :  +3550.72 = +17.933% (+/-2.79%)

In summary, compared to MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, the new
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE options:
- are simpler;
- take less code size;
- are faster (generally);
- work with code generated by the native emitter;
- can be used on embedded targets with a small and constant RAM overhead;
- allow the same .mpy bytecode to run on all targets.

See #7680 for further discussion.  And see also #7653 for a discussion
about simplifying mpy-cross options.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-09-16 16:04:03 +10:00
Jim Mussared
60c6d5594f unix: Enable LOAD_ATTR fast path, and map lookup caching.
Enabled for all variants except minimal.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-09-16 16:02:19 +10:00
Jim Mussared
68219a295c stm32: Enable LOAD_ATTR fast path, and map lookup caching on >M0.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-09-16 16:02:19 +10:00
Jim Mussared
11ef8f22fe py/map: Add an optional cache of (map+index) to speed up map lookups.
The existing inline bytecode caching optimisation, selected by
MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, reserves an extra byte in the
bytecode after certain opcodes, which at runtime stores a map index of the
likely location of this field when looking up the qstr.  This scheme is
incompatible with bytecode-in-ROM, and doesn't work with native generated
code.  It also stores bytecode in .mpy files which is of a different format
to when the feature is disabled, making generation of .mpy files more
complex.

This commit provides an alternative optimisation via an approach that adds
a global cache for map offsets, then all mp_map_lookup operations use it.
It's less precise than bytecode caching, but allows the cache to be
independent and external to the bytecode that is executing.  It also works
for the native emitter and adds a similar performance boost on top of the
gain already provided by the native emitter.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-09-16 16:02:19 +10:00
Jim Mussared
7b89ad8dbf py/vm: Add a fast path for LOAD_ATTR on instance types.
When the LOAD_ATTR opcode is executed there are quite a few different cases
that have to be handled, but the common case is accessing a member on an
instance type.  Typically, built-in types provide methods which is why this
is common.

Fortunately, for this specific case, if the member is found in the member
map then there's no further processing.

This optimisation does a relatively cheap check (type is instance) and then
forwards directly to the member map lookup, falling back to the regular
path if necessary.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-09-16 16:02:15 +10:00
Jim Mussared
910e060f93 minimal/mpconfigport.h: Use MICROPY_CONFIG_ROM_LEVEL_MINIMUM.
Update minimal port to use the new "minimal" rom level config (this is a
no-op change, the binary is the same size and contains the exact same
symbols).

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-09-16 13:24:33 +10:00
Jim Mussared
01374d941f py/mpconfig.h: Define initial templates for "feature levels".
This is the beginning of a set of changes to simplify enabling/disabling
features.  The goals are:
- Remove redundancy from mpconfigport.h (never set a value to the default
  -- make it clear exactly what's being enabled).
- Improve consistency between ports.  All "similar" ports (i.e. approx same
  flash size) should get the same features.
- Simplify mpconfigport.h -- just get default/sensible options for the size
  of the port.
- Make it easy for defining constrained boards (e.g. STM32F0/L0), they can
  just set a lower level.

This commit makes a step towards this and defines the "core" level as the
current default feature set, and a "minimal" level to turn off everything.
And a few placeholder levels are added for where the other ports will
roughly land.

This is a no-op change for all ports.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-09-16 13:19:11 +10:00
Damien George
0c0807e084 stm32/dma: Add functions for external users of DMA to enable clock.
Any external user of DMA (eg a board with a custom DMA driver) must call
dma_external_acquire() for their DMA controller/stream to ensure that the
DMA clock is not automatically turned off while it's still being used
externally.

Signed-off-by: Damien George <damien@micropython.org>
2021-09-16 13:01:43 +10:00
Damien George
c51cc46bf8 stm32/boards/make-pins.py: Allow empty lines and comments in pins.csv.
Signed-off-by: Damien George <damien@micropython.org>
2021-09-16 12:53:16 +10:00
Damien George
a6907c779a stm32/boards/make-pins.py: Allow a CPU pin to be hidden.
This change allows a CPU pin to be hidden from the user by prefixing it
with a "-" in the pins.csv file for a board.  It will still be available in
C code, just not exposed to Python.

Signed-off-by: Damien George <damien@micropython.org>
2021-09-16 12:53:16 +10:00
Jim Mussared
e3eebc329f stm32: Suggest putting code in main.py not boot.py.
Don't want users to accidentally use boot.py (because recovering requires
knowing how to activate safe mode).

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-09-16 12:40:05 +10:00
Jan Staal
9e2423e730 stm32: Add support for H7A3(Q)/H7B3(Q), and STM32H73B3I_DK board defn.
This commit is based upon prior work of @dpgeorge and @koendv.

MCU support for the STM32H7A3 and B3 families MCUs:
- STM32H7A3xx
- STM32H7A3xxQ (SMPS)
- STM32H7B3xx
- STM32H7B3xxQ (SMPS)

Support has been added for the STM32H7B3I_DK board.

Signed-off-by: Jan Staal <info@janstaal.com>
2021-09-16 12:29:28 +10:00
iabdalkader
d9749f90ad extmod/modnetwork: Remove modnetwork socket u_state member.
To simplify the socket state.

The CC3K driver (see drivers/cc3000/inc/socket.h and src/socket.c) has
socket() returning an INT16 so there is now enough room to store it
directly in the fileno member.
2021-09-15 11:29:02 +10:00
iabdalkader
f9d573a4ac extmod/modnetwork: Remove STM32 references. 2021-09-15 11:27:38 +10:00