add explanation for newer compression features
parent
4d8b354c13
commit
5c23e28208
@@ -53,6 +53,13 @@
// speaking, words. They're just spans of code points that frequently
// occur together. They are ordered shortest to longest.
//
// - If the translation uses a lot of code points or widely spaced code points,
// then the Huffman table entries are 16-bit UTF-16 code units. But if the translation
// uses only 7-bit ASCII code points plus a SMALL range of higher code points, small
// enough that every code point in use can be numbered within 8 bits, then
// translation_offset and translation_offstart are used to renumber the code points so
// that they fit within 8 bits. (It's very beneficial for mchar_t to be 8 bits, not 16!)
//
// - dictionary entries are non-overlapping, and the _ending_ index of each
// entry is stored in an array. A count of words of each length, from
// minlen to maxlen, is given in the array called wlencount. From
@@ -60,6 +67,14 @@
// calculated by an efficient, small loop. (A bit of time is traded
// to reduce the size of this table indicating lengths)
//
// - Value 1 ('\1') is used to indicate that a QSTR number follows. The
// QSTR number is encoded as a fixed number of bits (translation_qstr_bits),
// e.g., 10 bits if the highest core QSTR is from 512 to 1023 inclusive.
// (maketranslationdata uses a simple heuristic where any QSTR >= 3
// characters long is encoded in this way; this is simple but probably not
// optimal. In fact, a threshold of >= 2 characters is better for SOME
// languages on SOME boards.)
//
// The "data" / "tail" construct is so that the struct's last member is a
// "flexible array". However, the _only_ member is not permitted to be
// a flexible member, so we have to declare the first byte as a separate
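The three mechanisms described in the comment (code point renumbering, locating a dictionary word from per-length counts, and the fixed QSTR bit width) can be sketched in C roughly as follows. This is an illustration only, not the code from this repository: the constants `OFFSTART` and `OFFSET`, the helper names `decode_code_point`, `word_span` and `bits_for`, and the exact way the offset is applied are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for translation_offstart / translation_offset.
 * Assumed scheme: stored values below OFFSTART are plain ASCII; stored
 * values at or above it are shifted up into a higher code point range. */
enum { OFFSTART = 0x80, OFFSET = 0x0400 - 0x80 };

/* Map a stored 8-bit value back to the code point it stands for. */
static uint32_t decode_code_point(uint8_t stored) {
    uint32_t c = stored;
    if (c >= OFFSTART) {
        c += OFFSET;
    }
    return c;
}

/* Find word number n in a dictionary whose words are concatenated in
 * order of increasing length. counts[i] is the number of words of
 * length min_len + i (the role the comment gives to wlencount).
 * The caller must pass n smaller than the total number of words.
 * On return, [*start, *end) is the word's span, end exclusive. */
static void word_span(const uint8_t *counts, size_t num_lengths,
                      unsigned min_len, unsigned n,
                      size_t *start, size_t *end) {
    size_t pos = 0;
    size_t i = 0;
    unsigned len = min_len;
    while (counts[i] <= n) {
        /* Skip every word of the current length. */
        n -= counts[i];
        pos += (size_t)len * counts[i];
        i++;
        len++;
        assert(i < num_lengths); /* n was out of range otherwise */
    }
    pos += (size_t)len * n;
    *start = pos;
    *end = pos + len;
}

/* Number of bits needed for a fixed-width field that must hold every
 * value from 0 to highest. */
static unsigned bits_for(unsigned highest) {
    unsigned bits = 0;
    while (highest != 0) {
        bits++;
        highest >>= 1;
    }
    return bits;
}
```

With counts of {3, 2, 1} and a minimum length of 2, word 4 is the second of the length-3 words, so `word_span` reports the span 9 to 12. `bits_for` returns 10 for any value from 512 to 1023, which matches the example given in the comment. The loop in `word_span` spends a few iterations per lookup instead of storing one index per word, which is the size-for-time trade the comment mentions.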