makeqstrdata: permit longer "compressed" outputs

It is possible for this routine to expand some inputs, and in fact
it does for certain strings in the proposed Korean translation of
CircuitPython (#1858).  I did not determine what the maximum
expansion is (it's probably modest, something like len()/7+2
bytes), so I just made enc[] an adequate over-allocation (twice
the input length) and then checked that all the strings in the
proposed ko.po now work.  The worst actual expansion seems to be a
string that goes from 65 UTF-8-encoded bytes to 68 compressed bytes
(+4.6%).  Only a few strings in all are reported as having grown
under compression.
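A quick sanity check of the over-allocation: assuming each input byte is emitted as exactly one prefix code of at most `max_code_bits` bits (an assumption about the encoder, not something this commit states), an n-byte input compresses to at most ceil(n * max_code_bits / 8) bytes. Under that assumption the new `len(decompressed) * 2` buffer is large enough whenever no code is longer than 16 bits. The helper names below are hypothetical, and this is a conservative bound, not a measurement of the real table.

```python
def worst_case_compressed_len(n_bytes: int, max_code_bits: int) -> int:
    # Hypothetical helper: assumes every input byte becomes one code of at
    # most max_code_bits bits; the final partial byte is rounded up.
    return (n_bytes * max_code_bits + 7) // 8


def buffer_is_sufficient(n_bytes: int, max_code_bits: int, factor: int = 2) -> bool:
    # Mirrors the commit's choice of bytearray(len(decompressed) * 2).
    return worst_case_compressed_len(n_bytes, max_code_bits) <= factor * n_bytes


# Worst case reported for ko.po: 65 input bytes became 68 compressed bytes.
print(f"{(68 - 65) / 65:.1%}")       # 4.6%
print(buffer_is_sufficient(65, 16))  # True: 130 bytes needed, 130 available
print(buffer_is_sufficient(65, 17))  # False: 139 bytes needed
```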
Jeff Epler 2019-08-06 07:38:49 -05:00
parent 95d2694bc3
commit c4f3a02b3b
1 changed file with 3 additions and 1 deletion


@@ -180,7 +180,7 @@ def compress(encoding_table, decompressed):
     if not isinstance(decompressed, bytes):
         raise TypeError()
     values, lengths = encoding_table
-    enc = bytearray(len(decompressed))
+    enc = bytearray(len(decompressed) * 2)
     #print(decompressed)
     #print(lengths)
     current_bit = 7
@@ -227,6 +227,8 @@ def compress(encoding_table, decompressed):
                 current_bit -= 1
     if current_bit != 7:
         current_byte += 1
+    if current_byte > len(decompressed):
+        print("Note: compression increased length", repr(decompressed.decode('utf-8')), len(decompressed), current_byte, file=sys.stderr)
     return enc[:current_byte]
 
 def qstr_escape(qst):
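To show why a prefix-code compressor like `compress()` can expand its input, here is a minimal, self-contained sketch of a canonical-Huffman-style bit packer. It is not the code from makeqstrdata; the names `canonical_codes`, `compress_sketch` and `TABLE` are hypothetical, and the table layout is an assumption modeled on the `values, lengths = encoding_table` line in the diff: `values` lists symbols ordered by code length, and `lengths[k]` counts the symbols whose code is k + 1 bits long. Like the patched function, it over-allocates the output to twice the input size and notes on stderr when the result grows.

```python
import sys


def canonical_codes(values, lengths):
    # Hypothetical helper: assign canonical prefix codes, shortest first.
    # Returns {symbol: (code, bit_length)}.
    codes = {}
    code = 0
    i = 0
    for bits, count in enumerate(lengths, start=1):
        for _ in range(count):
            codes[values[i]] = (code, bits)
            code += 1
            i += 1
        code <<= 1  # move to the next code length
    return codes


def compress_sketch(table, data: bytes) -> bytes:
    values, lengths = table
    codes = canonical_codes(values, lengths)
    # Codes may be longer than 8 bits, so the output can exceed the input;
    # over-allocate as the commit does (enough while codes are <= 16 bits).
    enc = bytearray(len(data) * 2)
    current_byte = 0
    current_bit = 7  # fill each output byte MSB first
    for symbol in data:
        code, bits = codes[symbol]
        for i in range(bits - 1, -1, -1):
            if code & (1 << i):
                enc[current_byte] |= 1 << current_bit
            if current_bit == 0:
                current_bit = 7
                current_byte += 1
            else:
                current_bit -= 1
    if current_bit != 7:
        current_byte += 1  # count the partially filled final byte
    if current_byte > len(data):
        print("Note: compression increased length", repr(data),
              len(data), current_byte, file=sys.stderr)
    return bytes(enc[:current_byte])


# Hypothetical, deliberately skewed table: 'a' gets 1 bit, 'j' gets 10 bits.
TABLE = (list(b"abcdefghij"), [1] * 10)
print(compress_sketch(TABLE, b"aaaaaaaa"))  # b'\x00'
print(compress_sketch(TABLE, b"jjjj"))      # 5 bytes from 4; note on stderr
```

Frequent symbols get short codes and shrink; rare symbols get codes longer than 8 bits, so a string made mostly of them comes out larger than it went in, which is the situation the commit's larger buffer and stderr note handle.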