This also tweaks the repr for unicode strings to escape only a few
Unicode code points. As a result, emoji now display in os.listdir()
output, for example.
Also, enable exfat support on full builds.
Fixes #5146
This leaves much more space on SAMD21 builds that aren't "full builds".
These are new APIs that we don't need to add to old boards.
Also, tweak two Arduino boards to save space on them.
* The new nonstandard '%S' format takes a pointer to compressed_string_t
and prints it
* The new mp_cprintf and mp_vcprintf take a format string that is a
compressed_string_t
By storing "count of words by length", the long `wends` table can be
replaced with a short `wlencount` table. This saves flash storage space.
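The idea can be sketched in Python as follows. This is only an illustration, not the firmware's decoder: the function names, the `MIN_LEN` value, and the layout of the word blob are assumptions made for the example. When the words are stored back to back in order of length, one count per length is enough to locate word number n, so no per-word end offset is needed.

```python
MIN_LEN = 2  # assumed shortest dictionary word, for illustration only


def build_wlencount(words):
    """Return (blob, wlencount) for a list of dictionary words.

    blob is all words concatenated in order of increasing length;
    wlencount[i] is the number of words of length MIN_LEN + i.
    """
    if any(len(w) < MIN_LEN for w in words):
        raise ValueError("word shorter than MIN_LEN")
    ordered = sorted(words, key=len)  # sorts a copy; the input is untouched
    max_len = len(ordered[-1]) if ordered else MIN_LEN - 1
    wlencount = [0] * (max_len - MIN_LEN + 1)
    for w in ordered:
        wlencount[len(w) - MIN_LEN] += 1
    return "".join(ordered), wlencount


def lookup(blob, wlencount, n):
    """Return word number n by walking the per-length counts."""
    pos = 0
    for i, count in enumerate(wlencount):
        length = MIN_LEN + i
        if n < count:
            start = pos + n * length
            return blob[start:start + length]
        n -= count              # skip all words of this length
        pos += count * length   # and the characters they occupy
    raise IndexError("word index out of range")
```

A table of end offsets needs one entry per word, while the count table needs one entry per possible length, which is where the space saving comes from.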
Extend the range of string lengths that can be in the dictionary.
Originally it was 2 to 9; at one point it was changed to 3 to 9.
Putting the lower bound back at 2 has a positive impact on the French
translation (a number of two-character entries, such as "ch", "\r\n",
and "%q", are used).
Increasing the maximum length gets 'mpossible', ' doit être ',
and 'CircuitPyth' at the long end. This adds a bit of processing time
to makeqstrdata. The specific 2/11 values are again empirical, based on
the French translation on the adafruit_proxlight_trinkey_m0.
I was puzzled by why the dictionary words were sorted by length.
It was because TextSplitter sorted its parameter, instead of a copy.
This doesn't affect encoding size, but does affect the encoding NUMBER
of the found words. We'll deliberately restore sorting by length next,
for other reasons, but not by spooky action.
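The difference between sorting a parameter and sorting a copy can be shown with a small Python sketch. The two classes below are hypothetical stand-ins written for this example, not the actual TextSplitter code; the descending length order is also an assumption.

```python
class InPlaceSplitter:
    """Hypothetical stand-in: sorts the list it was handed."""

    def __init__(self, words):
        # list.sort() mutates the caller's list as a side effect
        words.sort(key=len, reverse=True)
        self.words = words


class CopyingSplitter:
    """Hypothetical stand-in: sorts a private copy."""

    def __init__(self, words):
        # sorted() returns a new list and leaves the argument alone
        self.words = sorted(words, key=len, reverse=True)


dictionary = ["ch", "CircuitPyth", "%q"]
CopyingSplitter(dictionary)
order_after_copy = list(dictionary)      # unchanged

InPlaceSplitter(dictionary)
order_after_in_place = list(dictionary)  # reordered by the constructor
```

With the in-place version, any code that later numbers the words by their position in `dictionary` sees the reordered list, even though it never asked for a sort.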
Try to accurately measure the costs of including a word in the dictionary
vs the gains from using it in messages.
This saves about 160 bytes on trinket_m0 ja, the fullest translation
for that board. Other translations on the same board all have savings,
ranging from 24 to 228 bytes.
```
Translation     Before  After  Savings
ja                1164   1324      160
de_DE             1260   1396      136
fr                1424   1652      228
zh_Latn_pinyin    1448   1520       72
pt_BR             1584   1736      152
pl                1592   1640       48
es                1724   1816       92
ko                1724   1816       92
fil               1764   1800       36
it_IT             1896   2040      144
nl                1956   2136      180
ID                2072   2180      108
cs                2124   2148       24
sv                2340   2448      108
en_x_pirate       2644   2740       96
en_GB             2652   2752      100
el                2656   2768      112
en_US             2656   2768      112
hi                2656   2768      112
```
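The trade-off between the cost of a dictionary word and the gain from using it can be illustrated with a deliberately simplified Python sketch. The cost model here is an assumption made for the example and is not the measurement used in makeqstrdata: it charges a flat number of bits per character and a flat number of bits per word code, and the function and parameter names are hypothetical.

```python
def net_savings_bits(word, occurrences, bits_per_char, bits_per_word_code):
    """Estimate the net bits saved by adding `word` to the dictionary.

    Assumed model: every character costs `bits_per_char` bits when
    spelled out, and every use of a dictionary word costs
    `bits_per_word_code` bits.
    """
    spelled_out = len(word) * bits_per_char
    # Each occurrence in a message is replaced by one word code.
    gain = occurrences * (spelled_out - bits_per_word_code)
    # The word itself must be stored once in the dictionary.
    cost = spelled_out
    return gain - cost


# A long word used several times pays for itself under this model...
long_word = net_savings_bits("CircuitPyth", 3, 5, 8)
# ...while a short word used only once does not.
short_word = net_savings_bits("ch", 1, 5, 8)
```

Under this model a word is worth including only when the result is positive, which is why both its length and how often it appears in messages matter.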