translations: document the compressed format

This commit is contained in:
Jeff Epler 2020-05-28 11:29:28 -05:00
parent fe3e8d1589
commit d0f9b5901e
2 changed files with 30 additions and 0 deletions

View File

@ -2,6 +2,9 @@
Process raw qstr file and output qstr data with length, hash and data bytes.
This script works with Python 2.7, 3.3 and 3.4.
For documentation about the format of compressed translated strings, see
supervisor/shared/translate.h
"""
from __future__ import print_function

View File

@ -29,11 +29,38 @@
#include <stdint.h>
// The format of the compressed data is:
// - the size of the uncompressed string in UTF-8 bytes, encoded as a
// (compress_max_length_bits)-bit number. compress_max_length_bits is
// computed during dictionary generation time, and happens to be 8
// for all current platforms. However, it'll probably end up being
// 9 in some translations sometime in the future. This length excludes
// the trailing NUL, though notably decompress_length includes it.
//
// - followed by the huffman encoding of the individual UTF-16 code
// points that make up the string. The trailing "\0" is not
// represented by a huffman code, but is implied by the length.
// (building the huffman encoding on UTF-16 code points gave better
// compression than building it on UTF-8 bytes)
//
// The "data" / "tail" construct is so that the struct's last member is a
// "flexible array". However, the _only_ member is not permitted to be
// a flexible member, so we have to declare the first byte as a separte
// member of the structure.
//
// For translations where length needs 8 bits, this saves about 1.5
// bytes per string on average compared to a structure of {uint16_t,
// flexible array}, but is also future-proofed against strings with
// UTF-8 length above 256, with a savings of about 1.375 bytes per
// string.
typedef struct {
uint8_t data;
const uint8_t tail[];
} compressed_string_t;
// Return the compressed, translated version of a source string
// Usually, due to LTO, this is optimized into a load of a constant
// pointer.
const compressed_string_t* translate(const char* c);
void serial_write_compressed(const compressed_string_t* compressed);
char* decompress(const compressed_string_t* compressed, char* decompressed);