Massive savings. Thanks so much @ciscorn for providing the initial
code for choosing the dictionary.
This adds a bit of time to the build, both to find the dictionary
but also because (for reasons I don't fully understand), the binary
search in the compress() function no longer worked and had to be
replaced with a linear search.
I think this is because the intended invariant is that for codebook
entries that encode to the same number of bits, the entries are ordered
in ascending value. However, I mis-placed the transition from "words"
to "byte/char values" so the codebook entries for words are in word-order
rather than their code order.
Because this price is only paid at build time, I didn't care to determine
exactly where the correct fix was.
I also commented out a line to produce the "estimated total memory size"
-- at least on the unix build with TRANSLATION=ja, this led to a build
time KeyError trying to compute the codebook size for all the strings.
I think this occurs because some single unicode code point ('ァ') is
no longer present as itself in the compressed strings, due to always
being replaced by a word.
As promised, this seems to save hundreds of bytes in the German translation
on the trinket m0.
Testing performed:
- built trinket_m0 in several languages
- built and ran unix port in several languages (en, de_DE, ja) and ran
simple error-producing codes like ./micropython -c '1/0'
Limor confirmed that the all shipping revisions starting with Rev D had QSPI flash chips installed.
Note that when neither EXTERNAL_FLASH_QSPI_SINGLE nor EXTERNAL_FLASH_QSPI_DUAL is specified quad mode is assumed, so this is addressed by removing the setting altogether.
This fixes SPI with PSRAM allocated buffers. DMA with SPI2 was
attempted but produced junk output. This manual copy is less than
2x slower than DMA when not interrupted.
Fixes#3339
Two problems: The lead byte for 3-byte sequences was wrong, and one
mid-byte was not even filled in due to a missing "++"!
Apparently this was broken ever since the first "Compress as unicode,
not bytes" commit, but I believed I'd "tested" it by running on the
Pinyin translation.
This rendered at least the Korean and Japanese translations completely
illegible, affecting 5.0 and all later releases.