Jeff Epler 40ab5c6b21 compression: Implement ciscorn's dictionary approach
Massive savings.  Thanks so much @ciscorn for providing the initial
code for choosing the dictionary.

This adds a bit of time to the build, both to find the dictionary
but also because (for reasons I don't fully understand), the binary
search in the compress() function no longer worked and had to be
replaced with a linear search.

I think this is because the intended invariant is that for codebook
entries that encode to the same number of bits, the entries are ordered
in ascending value.  However, I mis-placed the transition from "words"
to "byte/char values" so the codebook entries for words are in word-order
rather than their code order.

Because this price is only paid at build time, I didn't care to determine
exactly where the correct fix was.

I also commented out a line to produce the "estimated total memory size"
-- at least on the unix build with TRANSLATION=ja, this led to a build
time KeyError trying to compute the codebook size for all the strings.
I think this occurs because some single unicode code point ('ァ') is
no longer present as itself in the compressed strings, due to always
being replaced by a word.

As promised, this seems to save hundreds of bytes in the German translation
on the trinket m0.

Testing performed:
 - built trinket_m0 in several languages
 - built and ran unix port in several languages (en, de_DE, ja) and ran
   simple error-producing codes like ./micropython -c '1/0'
2020-09-12 10:10:45 -05:00
..
2018-08-07 14:58:57 -07:00
2020-07-06 19:16:25 +01:00
2020-07-06 19:16:25 +01:00
2020-01-28 17:11:25 -05:00