Jeff Epler 40ab5c6b21 compression: Implement ciscorn's dictionary approach
Massive savings.  Thanks so much @ciscorn for providing the initial
code for choosing the dictionary.

This adds a bit of time to the build, both to find the dictionary
and because (for reasons I don't fully understand) the binary
search in the compress() function no longer worked and had to be
replaced with a linear search.

I think this is because the intended invariant is that for codebook
entries that encode to the same number of bits, the entries are ordered
in ascending value.  However, I misplaced the transition from "words"
to "byte/char values" so the codebook entries for words are in word-order
rather than their code order.
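
A minimal sketch of why an out-of-order codebook defeats binary search
while a linear search still works. This is not the actual compress()
code; the lists and the helper names find_binary and find_linear are
hypothetical, chosen only to illustrate the ordering invariant.

```python
from bisect import bisect_left


def find_binary(values, target):
    """Binary search; only correct when `values` is sorted ascending."""
    i = bisect_left(values, target)
    if i < len(values) and values[i] == target:
        return i
    return None


def find_linear(values, target):
    """Linear search; correct for any ordering, but O(n) per lookup."""
    for i, v in enumerate(values):
        if v == target:
            return i
    return None


# Hypothetical entries that all encode to the same number of bits.
code_order = [1, 3, 5, 7, 9]   # ascending: the invariant holds
word_order = [7, 1, 9, 3, 5]   # same entries, invariant broken

found_sorted = find_binary(code_order, 7)      # finds the entry
found_unsorted = find_binary(word_order, 7)    # misses an entry that exists
found_linear = find_linear(word_order, 7)      # finds the entry
```

Restoring the sort order within each bit length would presumably let
the binary search come back, at the cost of finding where the ordering
goes wrong.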

Because this price is only paid at build time, I didn't care to determine
exactly where the correct fix was.

I also commented out the line that produces the "estimated total memory
size" -- at least on the unix build with TRANSLATION=ja, it led to a
build-time KeyError while computing the codebook size for all the strings.
I think this occurs because some single Unicode code point ('ァ') is
no longer present as itself in the compressed strings, due to always
being replaced by a word.
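
A minimal sketch of how such a KeyError could arise, assuming the size
estimate looks up each code point in a table built only from the
symbols that survive word replacement. The word list, the sample
string, the fixed 8-bit lengths and the function names are all
hypothetical, not taken from the build script.

```python
from collections import Counter

words = ["ァイル"]      # hypothetical dictionary word
texts = ["ファイル"]    # hypothetical translated string


def tokenize(text, words):
    """Split text into dictionary words where they match, else code points."""
    out = []
    i = 0
    while i < len(text):
        for w in words:
            if text.startswith(w, i):
                out.append(w)
                i += len(w)
                break
        else:
            out.append(text[i])
            i += 1
    return out


# The table only contains symbols that appear after word replacement.
counts = Counter(tok for t in texts for tok in tokenize(t, words))
bit_lengths = {sym: 8 for sym in counts}  # stand-in for real code lengths


def size_by_code_point(text):
    # Raises KeyError: 'ァ' only ever occurs inside the word "ァイル".
    return sum(bit_lengths[ch] for ch in text)


def size_by_token(text):
    # Looks up the same symbols the table was built from.
    return sum(bit_lengths[tok] for tok in tokenize(text, words))
```

Under these assumptions, estimating the size from the tokenized form
rather than from raw code points avoids the missing-key lookup.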

As promised, this seems to save hundreds of bytes in the German translation
on the trinket m0.

Testing performed:
 - built trinket_m0 in several languages
 - built and ran unix port in several languages (en, de_DE, ja) and ran
   simple error-producing code such as ./micropython -c '1/0'
2020-09-12 10:10:45 -05:00