add explanation for newer compression features
parent 4d8b354c13
commit 5c23e28208
@@ -53,6 +53,13 @@
// speaking, words. They're just spans of code points that frequently
// occur together. They are ordered shortest to longest.
//
+// - If the translation uses a lot of code points or widely spaced code points,
+// then the Huffman table entries are UTF-16 code points. But if the translation
+// uses only 7-bit ASCII code points plus a SMALL range of higher code points that
+// still fit in 8 bits, translation_offset and translation_offstart are used to
+// renumber the code points so that they still fit within 8 bits. (It's very beneficial
+// for mchar_t to be 8 bits instead of 16!)
+//
// - dictionary entries are non-overlapping, and the _ending_ index of each
// entry is stored in an array. A count of words of each length, from
// minlen to maxlen, is given in the array called wlencount. From
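
The code point renumbering described in the first hunk can be illustrated with a minimal C sketch. It assumes that a stored 8-bit value at or above a start threshold has an offset added back when decoding. The constants `EXAMPLE_OFFSTART` and `EXAMPLE_OFFSET` and both function names are hypothetical stand-ins for the generated `translation_offstart` and `translation_offset`; the values are chosen only for illustration and are not taken from any real translation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example values; the real ones are generated per translation. */
enum {
    EXAMPLE_OFFSTART = 0x80,  /* first stored value that is renumbered */
    EXAMPLE_OFFSET = 0x0380   /* amount added back when decoding */
};

/* Map an 8-bit stored value back to its code point (assumed scheme). */
uint32_t example_decode_codepoint(uint8_t stored) {
    if (stored >= EXAMPLE_OFFSTART) {
        return (uint32_t)stored + EXAMPLE_OFFSET;
    }
    return stored;
}

/* Map a code point to its 8-bit stored value.
 * Returns -1 if the code point cannot be represented in 8 bits. */
int example_encode_codepoint(uint32_t cp) {
    if (cp < EXAMPLE_OFFSTART) {
        return (int)cp; /* 7-bit ASCII is stored unchanged */
    }
    if (cp >= EXAMPLE_OFFSTART + EXAMPLE_OFFSET && cp - EXAMPLE_OFFSET <= 0xFF) {
        return (int)(cp - EXAMPLE_OFFSET); /* high range shifted down */
    }
    return -1; /* outside both ranges: 16-bit entries would be needed */
}
```

With these example values, stored bytes 0x80 through 0xFF map to code points U+0400 through U+047F, so every symbol still fits in one byte.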
@@ -60,6 +67,14 @@
// calculated by an efficient, small loop. (A bit of time is traded
// to reduce the size of this table indicating lengths)
//
+// - Value 1 ('\1') is used to indicate that a QSTR number follows. The
+// QSTR is encoded as a fixed number of bits (translation_qstr_bits), e.g.,
+// 10 bits if the highest core qstr is from 512 to 1023 inclusive.
+// (maketranslationdata uses a simple heuristic where any qstr >= 3
+// characters long is encoded in this way; this is simple but probably not
+// optimal. In fact, the rule of >= 2 characters is better for SOME languages
+// on SOME boards.)
+//
// The "data" / "tail" construct is so that the struct's last member is a
// "flexible array". However, the _only_ member is not permitted to be
// a flexible member, so we have to declare the first byte as a separate
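
The fixed-width QSTR number that follows the `'\1'` marker, as described in the second hunk, can be sketched roughly as below. This is an illustration under stated assumptions, not the project's decoder: the bit reader type, the MSB-first bit order, and all `example_` names are hypothetical. Only the rule that the width is the number of bits needed for the highest qstr number (10 bits for 512 to 1023) comes from the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest bit width that can represent every value from 0 to highest. */
unsigned example_bits_needed(unsigned highest) {
    unsigned bits = 0;
    while (highest != 0) {
        bits++;
        highest >>= 1;
    }
    return bits;
}

/* Hypothetical bit reader over a byte buffer (MSB-first order is assumed). */
typedef struct {
    const uint8_t *data;
    size_t bit_pos; /* index of the next bit to read */
} example_bitreader_t;

/* Read nbits bits as an unsigned number, most significant bit first.
 * The caller must ensure the buffer holds enough bits. */
unsigned example_read_bits(example_bitreader_t *r, unsigned nbits) {
    unsigned value = 0;
    for (unsigned i = 0; i < nbits; i++) {
        uint8_t byte = r->data[r->bit_pos / 8];
        unsigned bit = (byte >> (7 - (r->bit_pos % 8))) & 1u;
        value = (value << 1) | bit;
        r->bit_pos++;
    }
    return value;
}
```

A decoder built this way would call `example_read_bits(&reader, qstr_bits)` once after seeing the marker value, where `qstr_bits` plays the role of `translation_qstr_bits`.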