This switches stage2 to C and uses Jinja to change the C code based
on flash settings from https://github.com/adafruit/nvm.toml. It
produces the fastest settings for the given set of external flashes.
Flash size is no longer hard coded so switching flashes with similar
capabilities but different sizes should *just work*.
This PR also places "ITCM" code in RAM to save the XIP cache for
code execution. Further optimization is possible. A blink code.py
still requires a number of flash fetches every blink.
Fixes#4041
Introduces a way to place CircuitPython code and data into
tightly coupled memory (TCM) which is accessible by the CPU in a
single cycle. It also frees up room in the corresponding cache for
intermittent data. Loading from external flash is slow!
The data cache is also now enabled.
Adds support for the iMX RT 1021 chip. Adds three new boards:
* iMX RT 1020 EVK
* iMX RT 1060 EVK
* Teensy 4.0
Related to #2492, #2472 and #2477. Fixes#2475.