TextSplitter: don't mutate 'words'

I was puzzled by why the dictionary words were sorted by length.
It was because TextSplitter sorted its parameter, instead of a copy.

This doesn't affect encoding size, but does affect the encoding NUMBER
of the found words.  We'll deliberately restore sorting by length next,
for other reasons, but not by spooky action.
Jeff Epler 2021-02-03 17:18:47 -06:00
parent 99abd03b7a
commit 8836198ff1

@@ -280,7 +280,7 @@ def translate(translation_file, i18ns):
 class TextSplitter:
     def __init__(self, words):
-        words.sort(key=lambda x: len(x), reverse=True)
+        words = sorted(words, key=lambda x: len(x), reverse=True)
         self.words = set(words)
         if words:
             pat = "|".join(re.escape(w) for w in words) + "|."
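
The difference between the two lines in the hunk can be illustrated in isolation. This is only a minimal sketch, not code from the repository: `split_mutating` and `split_copying` are hypothetical stand-ins for the old and new `__init__` bodies, and the sample word list is made up.

```python
def split_mutating(words):
    # Old behaviour: list.sort() reorders the caller's list in place.
    words.sort(key=len, reverse=True)
    return set(words)


def split_copying(words):
    # New behaviour: sorted() builds a new list and rebinds the local name,
    # so the caller's list keeps its original order.
    words = sorted(words, key=len, reverse=True)
    return set(words)


original = ["a", "ccc", "bb"]

split_copying(original)
after_copying = list(original)

split_mutating(original)
after_mutating = list(original)

print(after_copying)   # ['a', 'ccc', 'bb']
print(after_mutating)  # ['ccc', 'bb', 'a']
```

Both functions return the same set, which fits the commit message's point that the encoding size is unchanged; only the order the caller sees afterwards differs.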