TextSplitter: don't mutate 'words'
I was puzzled as to why the dictionary words were sorted by length. It turned out that TextSplitter sorted its parameter in place instead of sorting a copy. This doesn't affect encoding size, but it does affect the encoding NUMBER of the found words. We'll deliberately restore sorting by length next, for other reasons, but not through spooky action at a distance.
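To illustrate the side effect described above, here is a minimal sketch. The helper names `split_in_place` and `split_on_copy` are hypothetical stand-ins for the old and new constructor behaviour. The idea that a word's encoding number follows from its position in the caller's list is an assumption made for illustration only.

```python
def split_in_place(words):
    # Old behaviour: list.sort() reorders the list object the caller passed in.
    words.sort(key=len, reverse=True)
    return set(words)


def split_on_copy(words):
    # New behaviour: sorted() builds a new list and rebinds only the local name.
    words = sorted(words, key=len, reverse=True)
    return set(words)


caller_words = ["to", "the", "and"]

split_on_copy(caller_words)
after_copy = list(caller_words)        # caller's order is unchanged
index_after_copy = caller_words.index("to")

split_in_place(caller_words)
after_in_place = list(caller_words)    # caller's list is now sorted by length
index_after_in_place = caller_words.index("to")
```

If a word's number is taken from its index in the caller's list (assumed here), the in-place sort silently changes that number for "to" from 0 to 2, while the set of words, and so the size of the encoding, stays the same.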
parent 99abd03b7a
commit 8836198ff1
@@ -280,7 +280,7 @@ def translate(translation_file, i18ns):
 
 class TextSplitter:
     def __init__(self, words):
-        words.sort(key=lambda x: len(x), reverse=True)
+        words = sorted(words, key=lambda x: len(x), reverse=True)
         self.words = set(words)
         if words:
             pat = "|".join(re.escape(w) for w in words) + "|."
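The local sort still matters to the pattern built in the last line of the hunk. Python's `re` module tries alternatives from left to right and takes the first one that matches, so putting longer words first makes them win over their own prefixes. The sketch below is a hedged illustration of that, not the project's code: `build_pattern` is a hypothetical helper, and the rest of `TextSplitter` is not shown in this diff.

```python
import re


def build_pattern(words, longest_first=True):
    # Same shape as the pattern in the diff: escaped words joined by "|",
    # with "." as a fallback that consumes any other single character.
    if longest_first:
        words = sorted(words, key=len, reverse=True)  # local copy only
    return re.compile("|".join(re.escape(w) for w in words) + "|.")


words = ["in", "input", "put"]

# Longest first: "input" is tried before its prefix "in".
sorted_tokens = build_pattern(words).findall("input")

# Caller's order: "in" matches first, then "put" covers the rest.
unsorted_tokens = build_pattern(words, longest_first=False).findall("input")
```

With this word list, the sorted pattern yields the single token "input", while the unsorted one splits the same text into "in" and "put".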