Fix decompression of unicode values above 2047

Two problems: The lead byte for 3-byte sequences was wrong, and one
mid-byte was not even filled in due to a missing "++"!

Apparently this was broken ever since the first "Compress as unicode,
not bytes" commit, but I believed I'd "tested" it by running on the
Pinyin translation.

This rendered at least the Korean and Japanese translations completely
illegible, affecting 5.0 and all later releases.
This commit is contained in:
Jeff Epler 2020-09-08 20:54:47 -05:00
parent 365d69e831
commit e82940697c

View File

@ -51,8 +51,8 @@ STATIC int put_utf8(char *buf, int u) {
*buf = 0b10000000 | (u & 0b00111111);
return 2;
} else { // u <= 0xffff)
*buf++ = 0b11000000 | (u >> 12);
*buf = 0b10000000 | ((u >> 6) & 0b00111111);
*buf++ = 0b11100000 | (u >> 12);
*buf++ = 0b10000000 | ((u >> 6) & 0b00111111);
*buf = 0b10000000 | (u & 0b00111111);
return 3;
}