By treating each unicode code-point as a single entity for huffman
compression, the overall compression rate can be somewhat improved
without changing the algorithm. On the decompression side, when
compressed values above 127 are encountered, they need to be
converted from a 16-bit Unicode code point into a UTF-8 byte
sequence.
Doing this returns approximately 1.5kB of flash storage with the
zh_Latn_pinyin translation. (292 -> 1768 bytes remaining in my build
of trinket_m0)
Other "more ASCII" translations benefit less, and in fact
zh_Latn_pinyin is no longer the most constrained translation!
(de_DE 1156 -> 1384 bytes free in flash, I didn't check others
before pushing for CI)
English is slightly pessimized, 2840 -> 2788 bytes, probably mostly
because the "values" array was changed from uint8_t to uint16_t,
which is strictly not required for an all-ASCII translation. This
could probably be avoided in this case, but as English is not the
most constrained translation it doesn't really matter.
Testing performed: built for feather nRF52840 express and trinket m0
in English and zh_Latn_pinyin; ran and verified the localized
messages such as
Àn xià rènhé jiàn jìnrù REPL. Shǐyòng CTRL-D chóngxīn jiāzài.
and
Press any key to enter the REPL. Use CTRL-D to reload.
were properly displayed.