Fix decompression of unicode values above 2047

Two problems: the lead byte for 3-byte sequences was wrong (it used the
2-byte prefix), and the middle byte was never stored due to a missing "++"!

Apparently this was broken ever since the first "Compress as unicode,
not bytes" commit, but I believed I'd "tested" it by running on the
Pinyin translation.

This rendered at least the Korean and Japanese translations completely
illegible, affecting 5.0 and all later releases.
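
To make the two problems concrete, here is a minimal standalone sketch rather than the project's actual source. The names put_utf8_fixed, put_utf8_old_3byte and check_hiragana_a are hypothetical. The 1-byte branch and the first line of the 2-byte branch do not appear in the diff and are assumed. Hex constants stand in for the binary literals for portability.

```c
#include <assert.h>
#include <string.h>

/* Corrected encoder. Only the 3-byte branch and the tail of the 2-byte
   branch appear in the diff; the remaining lines are assumed. */
int put_utf8_fixed(char *buf, int u) {
    if (u <= 0x7f) {
        *buf = (char)u;
        return 1;
    } else if (u <= 0x07ff) {
        *buf++ = (char)(0xC0 | (u >> 6));            /* 110xxxxx */
        *buf = (char)(0x80 | (u & 0x3F));            /* 10xxxxxx */
        return 2;
    } else { /* u <= 0xffff */
        *buf++ = (char)(0xE0 | (u >> 12));           /* 1110xxxx */
        *buf++ = (char)(0x80 | ((u >> 6) & 0x3F));   /* 10xxxxxx */
        *buf = (char)(0x80 | (u & 0x3F));            /* 10xxxxxx */
        return 3;
    }
}

/* The 3-byte branch as it stood before the fix, kept for comparison. */
int put_utf8_old_3byte(char *buf, int u) {
    *buf++ = (char)(0xC0 | (u >> 12));           /* wrong lead: 110xxxxx */
    *buf = (char)(0x80 | ((u >> 6) & 0x3F));     /* no ++ here ... */
    *buf = (char)(0x80 | (u & 0x3F));            /* ... so this overwrites it */
    return 3;
}

/* U+3042 HIRAGANA LETTER A must encode as E3 81 82. */
void check_hiragana_a(void) {
    char good[3], bad[3];
    memset(good, 0, sizeof good);
    memset(bad, 0, sizeof bad);

    assert(put_utf8_fixed(good, 0x3042) == 3);
    assert((unsigned char)good[0] == 0xE3);
    assert((unsigned char)good[1] == 0x81);
    assert((unsigned char)good[2] == 0x82);

    /* Old code: 2-byte lead, middle byte lost, third byte never written. */
    put_utf8_old_3byte(bad, 0x3042);
    assert((unsigned char)bad[0] == 0xC3);
    assert((unsigned char)bad[1] == 0x82);
    assert((unsigned char)bad[2] == 0x00);
}
```

With the old branch the function still returned 3, so the caller advanced past a byte that was never written, and a decoder saw a 2-byte lead followed by the wrong continuation byte.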
Jeff Epler 2020-09-08 20:54:47 -05:00
parent bdb07adfcc
commit 0eee93729a

@@ -59,8 +59,8 @@ STATIC int put_utf8(char *buf, int u) {
         *buf = 0b10000000 | (u & 0b00111111);
         return 2;
     } else { // u <= 0xffff
-        *buf++ = 0b11000000 | (u >> 12);
-        *buf = 0b10000000 | ((u >> 6) & 0b00111111);
+        *buf++ = 0b11100000 | (u >> 12);
+        *buf++ = 0b10000000 | ((u >> 6) & 0b00111111);
         *buf = 0b10000000 | (u & 0b00111111);
         return 3;
     }