Jim Mussared b326edf68c all: Remove MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE.
This commit removes all parts of code associated with the existing
MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE optimisation option, including the
-mcache-lookup-bc option to mpy-cross.

This feature originally provided a significant performance boost for Unix,
but wasn't able to be enabled for MCU targets (due to frozen bytecode), and
added significant extra complexity to generating and distributing .mpy
files.

The equivalent performance gain is now provided by the combination of
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE (which has
been enabled on the unix port in the previous commit).

It's hard to provide precise performance numbers, but tests have been run
on a wide variety of architectures (x86-64, ARM Cortex, Aarch64, RISC-V,
xtensa) and they all generally agree on the qualitative improvements seen
by the combination of MICROPY_OPT_LOAD_ATTR_FAST_PATH and
MICROPY_OPT_MAP_LOOKUP_CACHE.

For example, on a "quiet" Linux x64 environment (i3-5010U @ 2.10GHz) the
change from CACHE_MAP_LOOKUP_IN_BYTECODE, to LOAD_ATTR_FAST_PATH combined
with MAP_LOOKUP_CACHE is:

diff of scores (higher is better)
N=2000 M=2000       bccache -> attrmapcache      diff      diff% (error%)
bm_chaos.py        13742.56 ->   13905.67 :   +163.11 =  +1.187% (+/-3.75%)
bm_fannkuch.py        60.13 ->      61.34 :     +1.21 =  +2.012% (+/-2.11%)
bm_fft.py         113083.20 ->  114793.68 :  +1710.48 =  +1.513% (+/-1.57%)
bm_float.py       256552.80 ->  243908.29 : -12644.51 =  -4.929% (+/-1.90%)
bm_hexiom.py         521.93 ->     625.41 :   +103.48 = +19.826% (+/-0.40%)
bm_nqueens.py     197544.25 ->  217713.12 : +20168.87 = +10.210% (+/-3.01%)
bm_pidigits.py      8072.98 ->    8198.75 :   +125.77 =  +1.558% (+/-3.22%)
misc_aes.py        17283.45 ->   16480.52 :   -802.93 =  -4.646% (+/-0.82%)
misc_mandel.py     99083.99 ->  128939.84 : +29855.85 = +30.132% (+/-5.88%)
misc_pystone.py    83860.10 ->   82592.56 :  -1267.54 =  -1.511% (+/-2.27%)
misc_raytrace.py   21490.40 ->   22227.23 :   +736.83 =  +3.429% (+/-1.88%)

This shows that the new optimisations are at least as good as the existing
inline-bytecode-caching, and are sometimes much better (because the new
ones apply caching to a wider variety of map lookups).

The new optimisations can also benefit code generated by the native
emitter, because they apply to the runtime rather than the generated code.
The improvement for the native emitter when LOAD_ATTR_FAST_PATH and
MAP_LOOKUP_CACHE are enabled is (same Linux environment as above):

diff of scores (higher is better)
N=2000 M=2000        native -> nat-attrmapcache  diff      diff% (error%)
bm_chaos.py        14130.62 ->   15464.68 :  +1334.06 =  +9.441% (+/-7.11%)
bm_fannkuch.py        74.96 ->      76.16 :     +1.20 =  +1.601% (+/-1.80%)
bm_fft.py         166682.99 ->  168221.86 :  +1538.87 =  +0.923% (+/-4.20%)
bm_float.py       233415.23 ->  265524.90 : +32109.67 = +13.756% (+/-2.57%)
bm_hexiom.py         628.59 ->     734.17 :   +105.58 = +16.796% (+/-1.39%)
bm_nqueens.py     225418.44 ->  232926.45 :  +7508.01 =  +3.331% (+/-3.10%)
bm_pidigits.py      6322.00 ->    6379.52 :    +57.52 =  +0.910% (+/-5.62%)
misc_aes.py        20670.10 ->   27223.18 :  +6553.08 = +31.703% (+/-1.56%)
misc_mandel.py    138221.11 ->  152014.01 : +13792.90 =  +9.979% (+/-2.46%)
misc_pystone.py    85032.14 ->  105681.44 : +20649.30 = +24.284% (+/-2.25%)
misc_raytrace.py   19800.01 ->   23350.73 :  +3550.72 = +17.933% (+/-2.79%)

In summary, compared to MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, the new
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE options:
- are simpler;
- take less code size;
- are faster (generally);
- work with code generated by the native emitter;
- can be used on embedded targets with a small and constant RAM overhead;
- allow the same .mpy bytecode to run on all targets.

See #7680 for further discussion.  And see also #7653 for a discussion
about simplifying mpy-cross options.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-09-16 16:04:03 +10:00
..

This is the experimental, community-supported Windows port of MicroPython. It is based on Unix port, and expected to remain so. The port requires additional testing, debugging, and patches. Please consider to contribute.

All gcc-based builds use the gcc compiler from Mingw-w64, which is the advancement of the original mingw project. The latter is getting obsolete and is not actively supported by MicroPython.

Build instruction assume you're in the ports/windows directory.

Building on Debian/Ubuntu Linux system

sudo apt-get install python3 build-essential gcc-mingw-w64
make -C ../../mpy-cross
make CROSS_COMPILE=i686-w64-mingw32-

Building under Cygwin

Install Cygwin, then install following packages using Cygwin's setup.exe:

  • mingw64-i686-gcc-core
  • mingw64-x86_64-gcc-core
  • make
  • python3

Build using:

make -C ../../mpy-cross CROSS_COMPILE=i686-w64-mingw32-
make CROSS_COMPILE=i686-w64-mingw32-

Or for 64bit:

make -C ../../mpy-cross CROSS_COMPILE=x86_64-w64-mingw32-
make CROSS_COMPILE=x86_64-w64-mingw32-

Building under MSYS2

Install MSYS2 from http://repo.msys2.org/distrib, start the msys2.exe shell and install the build tools:

pacman -Syuu
pacman -S make mingw-w64-x86_64-gcc pkg-config python3

Start the mingw64.exe shell and build:

make -C ../../mpy-cross STRIP=echo SIZE=echo
make

Building using MS Visual Studio 2013 (or higher)

Install Python. There are several ways to do this, for example: download and install the latest Python 3 release from https://www.python.org/downloads/windows or from https://docs.conda.io/en/latest/miniconda.html, or open the Microsoft Store app and search for Python and install it.

Install Visual Studio and the C++ toolset (for recent versions: install the free Visual Studio Community edition and the Desktop development with C++ workload).

In the IDE, open micropython-cross.vcxproj and micropython.vcxproj and build.

To build from the command line:

msbuild ../../mpy-cross/mpy-cross.vcxproj
msbuild micropython.vcxproj

Stack usage

The msvc compiler is quite stack-hungry which might result in a "maximum recursion depth exceeded" RuntimeError for code with lots of nested function calls. There are several ways to deal with this:

  • increase the threshold used for detection by altering the argument to mp_stack_set_limit in ports/unix/main.c
  • disable detection all together by setting MICROPY_STACK_CHECK to "0" in ports/windows/mpconfigport.h
  • disable the /GL compiler flag by setting WholeProgramOptimization to "false"

See issue 2927 for more information.

Running the tests

This is similar for all ports:

cd ../../tests
python ./run-tests.py

Though when running on Cygwin and using Cygwin's Python installation you'll need:

python3 ./run-tests.py

Depending on the combination of platform and Python version used it might be needed to first set the MICROPY_MICROPYTHON environment variable to the full path of micropython.exe.

Running on Linux using Wine

The default build (MICROPY_USE_READLINE=1) uses extended Windows console functions and thus should be ran using the wineconsole tool. Depending on the Wine build configuration, you may also want to select the curses backend which has the look&feel of a standard Unix console:

wineconsole --backend=curses ./micropython.exe

For more info, see https://www.winehq.org/docs/wineusr-guide/cui-programs .

If built without line editing and history capabilities (MICROPY_USE_READLINE=0), the resulting binary can be run using the standard wine tool.