esp32: Enable optimisations and move code to iRAM to boost performance.

This commit enables some significant optimisations for esp32:
- move the VM to iRAM
- move hot parts of the runtime to iRAM (map lookup, load global/name,
  mp_obj_get_type)
- enable MICROPY_OPT_LOAD_ATTR_FAST_PATH
- enable MICROPY_OPT_MAP_LOOKUP_CACHE
- disable assertions
- change from -Os to -O2 for compilation

It's hard to measure performance on esp32 due to external flash and
hardware caching.  But this set of changes improves performance compared to
master by (on a TinyPICO with the GENERIC build, using IDF 4.2.2, running
at 160MHz):

diff of scores (higher is better)
N=100 M=100    esp32-master -> esp32-perf       diff      diff% (error%)
bm_chaos.py           71.28 ->     268.08 :  +196.80 = +276.094% (+/-0.04%)
bm_fannkuch.py        44.10 ->      69.31 :   +25.21 = +57.166% (+/-0.01%)
bm_fft.py           1385.27 ->    2538.23 : +1152.96 = +83.230% (+/-0.01%)
bm_float.py         1060.94 ->    3900.62 : +2839.68 = +267.657% (+/-0.03%)
bm_hexiom.py          10.90 ->      32.79 :   +21.89 = +200.826% (+/-0.02%)
bm_nqueens.py       1000.83 ->    2372.87 : +1372.04 = +137.090% (+/-0.01%)
bm_pidigits.py       288.13 ->     664.40 :  +376.27 = +130.590% (+/-0.46%)
misc_aes.py          102.45 ->     345.69 :  +243.24 = +237.423% (+/-0.01%)
misc_mandel.py      1016.58 ->    2121.92 : +1105.34 = +108.731% (+/-0.01%)
misc_pystone.py      632.91 ->    1801.87 : +1168.96 = +184.696% (+/-0.08%)
misc_raytrace.py      76.66 ->     281.78 :  +205.12 = +267.571% (+/-0.05%)
viper_call0.py       210.63 ->     273.17 :   +62.54 = +29.692% (+/-0.01%)
viper_call1a.py      208.45 ->     269.51 :   +61.06 = +29.292% (+/-0.00%)
viper_call1b.py      185.44 ->     228.25 :   +42.81 = +23.086% (+/-0.01%)
viper_call1c.py      185.86 ->     228.90 :   +43.04 = +23.157% (+/-0.01%)
viper_call2a.py      207.10 ->     267.25 :   +60.15 = +29.044% (+/-0.00%)
viper_call2b.py      173.76 ->     209.42 :   +35.66 = +20.523% (+/-0.00%)

Five tests have more than 3x speed up (200%+).

The performance of the tests bm_fft, bm_pidigits and misc_aes now scale
with CPU frequency (eg changing frequency to 240MHz boosts the performance
of these by 50%), which means they are no longer influenced by timing of
external flash access.  (The viper_call* tests did previously scale with
CPU frequency, and they still do.)

Turning off assertions reduces code size by about 80k, and going from -Os
to -O2 costs about 100k, so the net change in code size (for the GENERIC
board) is about +20k.

If a board wants to enable assertions, or use -Os instead of -O2, that's
still possible by overriding the sdkconfig parameters.

Signed-off-by: Damien George <damien@micropython.org>
This commit is contained in:
Damien George 2021-09-24 12:50:59 +10:00
parent 8412568e7b
commit 549448e8bb
2 changed files with 11 additions and 3 deletions

View File

@ -3,11 +3,11 @@
CONFIG_IDF_FIRMWARE_CHIP_ID=0x0000
# Compiler options: use -Os to reduce size, but keep full assertions
# Compiler options: use -O2 and disable assertions to improve performance
# (CONFIG_COMPILER_OPTIMIZATION_LEVEL_RELEASE is for IDF 4.0.2)
CONFIG_COMPILER_OPTIMIZATION_LEVEL_RELEASE=y
CONFIG_COMPILER_OPTIMIZATION_SIZE=y
CONFIG_COMPILER_OPTIMIZATION_ASSERTIONS_ENABLE=y
CONFIG_COMPILER_OPTIMIZATION_PERF=y
CONFIG_COMPILER_OPTIMIZATION_ASSERTIONS_DISABLE=y
# Application manager
CONFIG_APP_EXCLUDE_PROJECT_VER_VAR=y

View File

@ -30,6 +30,8 @@
// optimisations
#define MICROPY_OPT_COMPUTED_GOTO (1)
#define MICROPY_OPT_LOAD_ATTR_FAST_PATH (1)
#define MICROPY_OPT_MAP_LOOKUP_CACHE (1)
#define MICROPY_OPT_MPZ_BITWISE (1)
// Python internal features
@ -289,6 +291,12 @@ void *esp_native_code_commit(void *, size_t, void *);
#endif
// Functions that should go in IRAM
#define MICROPY_WRAP_MP_BINARY_OP(f) IRAM_ATTR f
#define MICROPY_WRAP_MP_EXECUTE_BYTECODE(f) IRAM_ATTR f
#define MICROPY_WRAP_MP_LOAD_GLOBAL(f) IRAM_ATTR f
#define MICROPY_WRAP_MP_LOAD_NAME(f) IRAM_ATTR f
#define MICROPY_WRAP_MP_MAP_LOOKUP(f) IRAM_ATTR f
#define MICROPY_WRAP_MP_OBJ_GET_TYPE(f) IRAM_ATTR f
#define MICROPY_WRAP_MP_SCHED_EXCEPTION(f) IRAM_ATTR f
#define MICROPY_WRAP_MP_SCHED_KEYBOARD_INTERRUPT(f) IRAM_ATTR f