translations don't always use unicode code points now

2023-08-10 09:22:10 -05:00
commit 17015b48ad
parent 337b800ceb
1 changed file with 7 additions and 1 deletion
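
The header comment added in this commit says that 8-bit stored values 0..127 represent themselves, while 160..255 represent another range of Unicode selected by translation_offset and translation_offstart. A minimal sketch of that mapping, under stated assumptions, might look like the following. The names EXAMPLE_OFFSTART, EXAMPLE_OFFSET and example_decode are hypothetical, the constants are made up for illustration (here they relocate 160..255 to start at U+0410), and the rule "add the offset to any stored value at or above the start" is an assumption about how the two parameters combine, not a copy of the real decoder.

```c
#include <stdint.h>

/* Hypothetical stand-ins for translation_offstart and translation_offset.
 * The values are illustrative only: stored value 160 maps to U+0410. */
#define EXAMPLE_OFFSTART 160u
#define EXAMPLE_OFFSET   (0x0410u - 160u)

/* Map one 8-bit stored value back to a BMP code point.
 * Assumption: values at or above EXAMPLE_OFFSTART are shifted by
 * EXAMPLE_OFFSET; everything below it is returned unchanged.
 * Values 128..159 are left alone here because the header reserves
 * the range starting at 128 for dictionary entries, which this
 * sketch does not model. */
uint16_t example_decode(uint8_t stored) {
    if (stored >= EXAMPLE_OFFSTART) {
        return (uint16_t)(stored + EXAMPLE_OFFSET);
    }
    return stored;
}
```

With these illustrative constants, an ASCII value such as 'A' decodes to itself, and the 96 stored values 160..255 cover a contiguous block of 96 code points. A translation whose non-ASCII characters do not fit in such a block would, per the comment in the diff, fall back to uint16_t values.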
--- a/supervisor/shared/translate/compressed_string.h
+++ b/supervisor/shared/translate/compressed_string.h
@@ -38,12 +38,18 @@
 //   9 in some translations sometime in the future.  This length excludes
 //   the trailing NUL, though notably decompress_length includes it.
 //
-// - followed by the huffman encoding of the individual UTF-16 code
+// - followed by the huffman encoding of the individual code
 //   points that make up the string.  The trailing "\0" is not
 //   represented by a huffman code, but is implied by the length.
 //   (building the huffman encoding on UTF-16 code points gave better
 //   compression than building it on UTF-8 bytes)
 //
+// - If possible, the code points are represented as uint8_t values, with
+//   0..127 representing themselves and 160..255 representing another range
+//   of Unicode, controlled by translation_offset and translation_offstart.
+//   If this is not possible, uint16_t values are used. At present, no translation
+//   requires code points not in the BMP, so this is adequate.
+//
 // - code points starting at 128 (word_start) and potentially extending
 //   to 255 (word_end) (but never interfering with the target
 //   language's used code points) stand for dictionary entries in a