This commit removes all parts of code associated with the existing
MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE optimisation option, including the
-mcache-lookup-bc option to mpy-cross.
This feature originally provided a significant performance boost for Unix,
but wasn't able to be enabled for MCU targets (due to frozen bytecode), and
added significant extra complexity to generating and distributing .mpy
files.
The equivalent performance gain is now provided by the combination of
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE (which has
been enabled on the unix port in the previous commit).
It's hard to provide precise performance numbers, but tests have been run
on a wide variety of architectures (x86-64, ARM Cortex, Aarch64, RISC-V,
xtensa) and they all generally agree on the qualitative improvements seen
by the combination of MICROPY_OPT_LOAD_ATTR_FAST_PATH and
MICROPY_OPT_MAP_LOOKUP_CACHE.
For example, on a "quiet" Linux x64 environment (i3-5010U @ 2.10GHz) the
change from CACHE_MAP_LOOKUP_IN_BYTECODE, to LOAD_ATTR_FAST_PATH combined
with MAP_LOOKUP_CACHE is:
diff of scores (higher is better)
N=2000 M=2000 bccache -> attrmapcache diff diff% (error%)
bm_chaos.py 13742.56 -> 13905.67 : +163.11 = +1.187% (+/-3.75%)
bm_fannkuch.py 60.13 -> 61.34 : +1.21 = +2.012% (+/-2.11%)
bm_fft.py 113083.20 -> 114793.68 : +1710.48 = +1.513% (+/-1.57%)
bm_float.py 256552.80 -> 243908.29 : -12644.51 = -4.929% (+/-1.90%)
bm_hexiom.py 521.93 -> 625.41 : +103.48 = +19.826% (+/-0.40%)
bm_nqueens.py 197544.25 -> 217713.12 : +20168.87 = +10.210% (+/-3.01%)
bm_pidigits.py 8072.98 -> 8198.75 : +125.77 = +1.558% (+/-3.22%)
misc_aes.py 17283.45 -> 16480.52 : -802.93 = -4.646% (+/-0.82%)
misc_mandel.py 99083.99 -> 128939.84 : +29855.85 = +30.132% (+/-5.88%)
misc_pystone.py 83860.10 -> 82592.56 : -1267.54 = -1.511% (+/-2.27%)
misc_raytrace.py 21490.40 -> 22227.23 : +736.83 = +3.429% (+/-1.88%)
This shows that the new optimisations are at least as good as the existing
inline-bytecode-caching, and are sometimes much better (because the new
ones apply caching to a wider variety of map lookups).
The new optimisations can also benefit code generated by the native
emitter, because they apply to the runtime rather than the generated code.
The improvement for the native emitter when LOAD_ATTR_FAST_PATH and
MAP_LOOKUP_CACHE are enabled is (same Linux environment as above):
diff of scores (higher is better)
N=2000 M=2000 native -> nat-attrmapcache diff diff% (error%)
bm_chaos.py 14130.62 -> 15464.68 : +1334.06 = +9.441% (+/-7.11%)
bm_fannkuch.py 74.96 -> 76.16 : +1.20 = +1.601% (+/-1.80%)
bm_fft.py 166682.99 -> 168221.86 : +1538.87 = +0.923% (+/-4.20%)
bm_float.py 233415.23 -> 265524.90 : +32109.67 = +13.756% (+/-2.57%)
bm_hexiom.py 628.59 -> 734.17 : +105.58 = +16.796% (+/-1.39%)
bm_nqueens.py 225418.44 -> 232926.45 : +7508.01 = +3.331% (+/-3.10%)
bm_pidigits.py 6322.00 -> 6379.52 : +57.52 = +0.910% (+/-5.62%)
misc_aes.py 20670.10 -> 27223.18 : +6553.08 = +31.703% (+/-1.56%)
misc_mandel.py 138221.11 -> 152014.01 : +13792.90 = +9.979% (+/-2.46%)
misc_pystone.py 85032.14 -> 105681.44 : +20649.30 = +24.284% (+/-2.25%)
misc_raytrace.py 19800.01 -> 23350.73 : +3550.72 = +17.933% (+/-2.79%)
In summary, compared to MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, the new
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE options:
- are simpler;
- take less code size;
- are faster (generally);
- work with code generated by the native emitter;
- can be used on embedded targets with a small and constant RAM overhead;
- allow the same .mpy bytecode to run on all targets.
See #7680 for further discussion. And see also #7653 for a discussion
about simplifying mpy-cross options.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
The zephyr port doesn't support SoftI2C so it's not enabled, and the legacy
I2C constructor check can be removed.
Signed-off-by: Damien George <damien@micropython.org>
This adds support for making static (ie not on the Python GC heap) soft
timers. This can be useful for a board to define a custom background
handler, or eventually for BLE/network processing to use instead of systick
slots; it will be more efficient using soft timer for this.
The main issue with using the existing code for static soft timers is that
it would combine heap allocated and statically allocated soft_timer_entry_t
instances in the same pairing-heap data structure. This would prevent the
GC from tracing some of the heap allocated entries (because the GC won't
follow pointers outside the heap).
This commit makes it so that soft timer entries are explicitly marked,
instead of relying on implicit marking by having the root of the pairing
heap in the root pointer section. Also, on soft reset only the heap-
allocated soft timers are deleted from the pairing heap, leaving the
statically allocated ones.
Signed-off-by: Damien George <damien@micropython.org>
These ports already have uzlib enabled so this additional ubinascii.crc32
function only costs about 90 bytes of flash.
Signed-off-by: Damien George <damien@micropython.org>
Add LPUART1 as a standard UART. No low power features are supported, yet.
LPUART1 is enabled as the next available UART after the standard U(S)ARTs:
STM32WB: LPUART1 = UART(2)
STM32L0: LPUART1 = UART(6)
STM32L4: LPUART1 = UART(6)
STM32H7: LPUART1 = UART(9)
On all ports: LPUART1 = machine.UART('LP1')
LPUART1 is enabled by defining MICROPY_HW_LPUART1_TX and
MICROPY_HW_LPUART1_RX in mpconfigboard.h.
Signed-off-by: Chris Mason <c.mason@inchipdesign.com.au>
The default for these is to enable them, but they can now be disabled
individually by a board configuration.
Signed-off-by: Damien George <damien@micropython.org>
To simplify config, there's no need to specify MP_PLAT_PRINT_STRN if it's
the same as the default definition in py/mpconfig.h.
Signed-off-by: Damien George <damien@micropython.org>
This changes stm32 from using PENDSV to run NimBLE to use the MicroPython
scheduler instead. This allows Python BLE callbacks to be invoked directly
(and therefore synchronously) rather than via the ringbuffer.
The NimBLE UART HCI and event processing now happens in a scheduled task
every 128ms. When RX IRQ idle events arrive, it will also schedule this
task to improve latency.
There is a similar change for the unix port where the background thread now
queues the scheduled task.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
The SoftSPI constructor is now used soley to create SoftSPI instances, it
can no longer delegate to create a hardware-based SPI instance.
Signed-off-by: Damien George <damien@micropython.org>
The SoftI2C constructor is now used soley to create SoftI2C instances, it
can no longer delegate to create a hardware-based I2C instance.
Signed-off-by: Damien George <damien@micropython.org>
Previously the interaction between the different layers of the Bluetooth
stack was different on each port and each stack. This commit defines
common interfaces between them and implements them for cyw43, btstack,
nimble, stm32, unix.
Previously, if FAT was not enabled but LFS1/2 was then MICROPY_PY_IO_FILEIO
would be disabled and file binary-mode was not supported.
Signed-off-by: Damien George <damien@micropython.org>
Previous behaviour is when this argument is set to "true", in which case
the function will raise any pending exception. Setting it to "false" will
cancel any pending exception.
Instances of the slice class are passed to __getitem__() on objects when
the user indexes them with a slice. In practice the majority of the time
(other than passing it on untouched) is to work out what the slice means in
the context of an array dimension of a particular length. Since Python 2.3
there has been a method on the slice class, indices(), that takes a
dimension length and returns the real start, stop and step, accounting for
missing or negative values in the slice spec. This commit implements such
a indices() method on the slice class.
It is configurable at compile-time via MICROPY_PY_BUILTINS_SLICE_INDICES,
disabled by default, enabled on unix, stm32 and esp32 ports.
This commit also adds new tests for slice indices and for slicing unicode
strings.
For the 3 ports that already make use of this feature (stm32, nrf and
teensy) this doesn't make any difference, it just allows to disable it from
now on.
For other ports that use pyexec, this decreases code size because the debug
printing code is dead (it can't be enabled) but the compiler can't deduce
that, so code is still emitted.
Most stm32 boards can now be built in nan-boxing mode via:
$ make NANBOX=1
Note that if float is enabled then it will be forced to double-precision.
Also, native emitters will be disabled.
The default protection for the BLE ringbuf is to use
MICROPY_BEGIN_ATOMIC_SECTION, which disables all interrupts. On stm32 it
only needs to disable the lowest priority IRQ, pendsv, because that's the
IRQ level at which the BLE stack is driven.
This commit removes the Makefile-level MICROPY_FATFS config and moves the
MICROPY_VFS_FAT config to the Makefile level to replace it. It also moves
the include of the oofatfs source files in the build from each port to a
central place in extmod/extmod.mk.
For a port to enabled VFS FAT support it should now set MICROPY_VFS_FAT=1
at the level of the Makefile. This will include the relevant oofatfs files
in the build and set MICROPY_VFS_FAT=1 at the C (preprocessor) level.
This commit adds an implementation of a "software timer" with a 1ms
resolution, using SysTick. It allows unlimited number of concurrent
timers (limited only by memory needed for each timer entry). They can be
one-shot or periodic, and associated with a Python callback.
There is a very small overhead added to the SysTick IRQ, which could be
further optimised in the future, eg by patching SysTick_Handler code
dynamically.
For consistency with "umachine". Now that weak links are enabled
by default for built-in modules, this should be a no-op, but allows
extension of the bluetooth module by user code.
Also move registration of ubluetooth to objmodule rather than
port-specific.
This commit implements automatic module weak links for all built-in
modules, by searching for "ufoo" in the built-in module list if "foo"
cannot be found. This means that all modules named "ufoo" are always
available as "foo". Also, a port can no longer add any other weak links,
which makes strict the definition of a weak link.
It saves some code size (about 100-200 bytes) on ports that previously had
lots of weak links.
Some changes from the previous behaviour:
- It doesn't intern the non-u module names (eg "foo" is not interned),
which saves code size, but will mean that "import foo" creates a new qstr
(namely "foo") in RAM (unless the importing module is frozen).
- help('modules') no longer lists non-u module names, only the u-variants;
this reduces duplication in the help listing.
Weak links are effectively the same as having a set of symbolic links on
the filesystem that is searched last. So an "import foo" will search
built-in modules first, then all paths in sys.path, then weak links last,
importing "ufoo" if it exists. Thus a file called "foo.py" somewhere in
sys.path will still have precedence over the weak link of "foo" to "ufoo".
See issues: #1740, #4449, #5229, #5241.
On other ports (e.g. ESP32) they provide a complete Nimble implementation
(i.e. we don't need to use the code in extmod/nimble). This change
extracts out the bits that we don't need to use in other ports:
- malloc/free/realloc for Nimble memory.
- pendsv poll handler
- depowering the cywbt
Also cleans up the root pointer management.
As per PEP 485, this function appeared in for Python 3.5. Configured via
MICROPY_PY_MATH_ISCLOSE which is disabled by default, but enabled for the
ports which already have MICROPY_PY_MATH_SPECIAL_FUNCTIONS enabled.