This switches stage2 to C and uses Jinja to change the C code based
on flash settings from https://github.com/adafruit/nvm.toml. It
produces the fastest settings for the given set of external flashes.
Flash size is no longer hard coded so switching flashes with similar
capabilities but different sizes should *just work*.
This PR also places "ITCM" code in RAM to save the XIP cache for
code execution. Further optimization is possible. A blink code.py
still requires a number of flash fetches every blink.
Fixes#4041