Only check the savings if a word occurs at least twice

Profiling shows that `est_net_savings` is one of the most expensive parts
of the whole process. Roughly speaking, a word can only save storage if
it appears more than once, so skipping words that occur only once greatly
reduces the number of `est_net_savings` calls. Locally, this cuts the
time of this build step by 50% on the ports/unix coverage build, without
affecting the size of the generated binary.
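The "only if a word appears more than once" reasoning can be illustrated with a deliberately simplified cost model. This is a minimal sketch under assumptions that are not in the commit: `toy_net_savings` is a hypothetical stand-in, not the real `est_net_savings` formula. It assumes every occurrence of a word is replaced by one code point and that storing the word once in the words[] array costs `len(word)` code points. Under this model a word that occurs once always loses one code point, which is why the real (approximate) estimate is not worth computing for it.

```python
def toy_net_savings(word: str, occ: int) -> int:
    """Hypothetical, simplified stand-in for est_net_savings.

    Assumes each occurrence of `word` is replaced by a single code point,
    saving len(word) - 1 code points per occurrence, and that storing the
    word once in the words[] array costs len(word) code points.
    """
    saved = occ * (len(word) - 1)
    cost = len(word)
    return saved - cost


# With one occurrence the net result is always -1 under this model,
# so the word can never pay for its own dictionary entry.
for w in ["ab", "the", "error"]:
    print(w, toy_net_savings(w, 1), toy_net_savings(w, 2))
# ab -1 0
# the -1 1
# error -1 3
```

The real estimate models encoded sizes more carefully, so the commit only claims the rule holds approximately.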
Jeff Epler 2022-06-02 21:24:56 +02:00
parent 3ff7ed75c6
commit 4f27337207
GPG Key ID: D5BF15AB975AB4DE


@@ -400,7 +400,8 @@ def compute_huffman_coding(translations, compression_filename):
         # words[] array.
         scores = sorted(
-            ((s, -est_net_savings(s, occ)) for (s, occ) in counter.items()), key=lambda x: x[1]
+            ((s, -est_net_savings(s, occ)) for (s, occ) in counter.items() if occ > 1),
+            key=lambda x: x[1],
         )
         # Pick the one with the highest score. The score must be negative.
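The effect of the `if occ > 1` filter on the number of estimate calls can be checked with a small, self-contained sketch. Everything here is illustrative: `toy_est_net_savings` is a hypothetical stand-in for the real `est_net_savings`, `score_candidates` is a hypothetical helper that mirrors the shape of the `sorted(...)` call in the diff, and the word counts are made up. The helper wraps the estimator to count how often it is called, with and without the filter.

```python
from collections import Counter
from typing import Callable, List, Tuple


def toy_est_net_savings(s: str, occ: int) -> int:
    # Hypothetical stand-in for est_net_savings; the real formula differs.
    return occ * (len(s) - 1) - len(s)


def score_candidates(
    counter: Counter, estimate: Callable[[str, int], int], only_repeated: bool
) -> Tuple[List[Tuple[str, int]], int]:
    """Return (scores, number of estimate() calls).

    Scores are negated savings sorted ascending, so the best candidate
    (most negative score) comes first, as in the diff.
    """
    calls = 0

    def counted(s: str, occ: int) -> int:
        nonlocal calls
        calls += 1
        return estimate(s, occ)

    scores = sorted(
        (
            (s, -counted(s, occ))
            for (s, occ) in counter.items()
            # The filter runs before counted(), so skipped words cost nothing.
            if occ > 1 or not only_repeated
        ),
        key=lambda x: x[1],
    )
    return scores, calls


# Made-up candidate counts for illustration only.
counter = Counter({"the": 5, "error": 2, "xyz": 1, "qq": 1, "ing": 2})
before, n_before = score_candidates(counter, toy_est_net_savings, only_repeated=False)
after, n_after = score_candidates(counter, toy_est_net_savings, only_repeated=True)
print(n_before, n_after)  # 5 3
print(after == before[:3])  # True: the ranking of useful candidates is unchanged
```

Because the filter sits inside the generator expression, words that occur once never reach the estimator at all. In this toy model every word with a single occurrence has a positive (unprofitable) score, so dropping them does not change which candidate is picked; the commit reports the same outcome, no change in binary size, for the real estimate.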