* The new nonstandard '%S' format takes a pointer to compressed_string_t
and prints it
* The new mp_cprintf and mp_vcprintf take a format string that is a
compressed_string_t
By storing "count of words by length", the long `wends` table can be
replaced with a short `wlencount` table. This saves flash storage space.
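
A minimal sketch of why the two tables carry the same information, assuming the dictionary words are stored sorted by length. The helper names (`word_ends`, `length_counts`, `ends_from_counts`) and the sample words are illustrative and are not taken from makeqstrdata:

```python
from itertools import accumulate


def word_ends(words):
    """End offset of every word in the concatenated dictionary (one entry per word)."""
    return list(accumulate(len(w) for w in words))


def length_counts(words, minlen, maxlen):
    """Number of words of each length, minlen..maxlen inclusive (one entry per length)."""
    counts = [0] * (maxlen - minlen + 1)
    for w in words:
        counts[len(w) - minlen] += 1
    return counts


def ends_from_counts(counts, minlen):
    """Rebuild the end offsets from the counts; only valid for length-sorted words."""
    ends, pos = [], 0
    for i, n in enumerate(counts):
        for _ in range(n):
            pos += minlen + i
            ends.append(pos)
    return ends


# Hypothetical dictionary, sorted by length (shortest first).
words = sorted(["ch", "%q", "not", " is ", "mpossible"], key=len)
wends = word_ends(words)                # grows with the number of words
wlencount = length_counts(words, 2, 9)  # fixed size: one slot per allowed length
```

The per-word table grows with the dictionary, while the per-length table has a fixed number of entries, which is where the saving would come from once the dictionary holds many words.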
Extend the range of string lengths that can be in the dictionary.
Originally it was 2 to 9; at one point it was changed to 3 to 9.
Putting the lower bound back at 2 has a positive impact on the French
translation (several 2-character entries, such as "ch", "\r\n", "%q",
are used). Increasing the maximum length gets 'mpossible',
' doit être ', and 'CircuitPyth' at the long end. This adds a bit of
processing time to makeqstrdata. The specific 2/11 values are again
chosen empirically, based on the French translation on the
adafruit_proxlight_trinkey_m0.
I was puzzled by why the dictionary words were sorted by length.
It was because TextSplitter sorted its parameter, instead of a copy.
This doesn't affect encoding size, but does affect the encoding NUMBER
of the found words. We'll deliberately restore sorting by length next,
for other reasons, but not by spooky action.
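
The side effect can be reproduced in a few lines. This is a sketch only: the function names are hypothetical, and the longest-first sort order is an assumption about what the splitter wanted, not a copy of the TextSplitter code:

```python
def splitter_words_in_place(words):
    """Sorts the caller's list itself, so the caller sees the new order."""
    words.sort(key=len, reverse=True)
    return words


def splitter_words_copy(words):
    """Sorts a copy; the caller's list keeps its original order."""
    return sorted(words, key=len, reverse=True)


dictionary = ["ch", "CircuitPyth", "%q"]

splitter_words_copy(dictionary)
index_after_copy = dictionary.index("ch")      # unchanged position

splitter_words_in_place(dictionary)
index_after_in_place = dictionary.index("ch")  # position moved by the side effect
```

Because a word's encoding number is its position in the dictionary, the in-place variant silently changes the numbers without changing which words are present.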
Try to accurately measure the costs of including a word in the dictionary
vs the gains from using it in messages.
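
As a rough illustration of the trade-off, under a deliberately simplified model: every character costs a fixed number of bits and every dictionary reference costs a fixed number of bits. The real encoder's costs vary per symbol, and the function name and the numbers below are assumptions made for this example:

```python
def net_saving_bits(word, occurrences, bits_per_char, bits_per_code):
    """Bits saved by adding `word` to the dictionary, under the simplified model."""
    spelled_out = len(word) * bits_per_char
    cost = spelled_out                                 # the word is stored once in the dictionary
    gain = occurrences * (spelled_out - bits_per_code) # each use is replaced by one code
    return gain - cost


# Hypothetical figures: 5 bits per character, 8 bits per dictionary code.
used_once = net_saving_bits("CircuitPyth", 1, 5, 8)
used_three_times = net_saving_bits("CircuitPyth", 3, 5, 8)
```

Under these assumptions a word used only once is a net loss, because storing it costs more than the single replacement saves.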
This saves about 160 bytes on trinket_m0 ja, the fullest translation
for that board. Other translations on the same board all have savings,
ranging from 24 to 228 bytes.
```
Translation     Before  After  Savings
ja                1164   1324      160
de_DE             1260   1396      136
fr                1424   1652      228
zh_Latn_pinyin    1448   1520       72
pt_BR             1584   1736      152
pl                1592   1640       48
es                1724   1816       92
ko                1724   1816       92
fil               1764   1800       36
it_IT             1896   2040      144
nl                1956   2136      180
ID                2072   2180      108
cs                2124   2148       24
sv                2340   2448      108
en_x_pirate       2644   2740       96
en_GB             2652   2752      100
el                2656   2768      112
en_US             2656   2768      112
hi                2656   2768      112
```
By comparing the address of the initial 'name' field instead of the
addresses of the objects themselves, a small amount of type safety is
added back, vs just casting to void.
In the event that some other kind of object is passed in as 't',
which happens to have a 'name' field of the right type, the construct
would be (undesirably) accepted but it would almost certainly evaluate
to false at runtime.
Prior to this commit, cache flushing for ARM native code was done only in
the assembler code asm_thumb_end_pass()/asm_arm_end_pass(), at the last
pass of the assembler. But this misses flushing the cache when loading
native code from an .mpy file, i.e. in persistentcode.c.
The change here makes sure the cache is always flushed/cleaned/invalidated
when assigning native code on ARM architectures.
This problem was found running tests/micropython/import_mpy_native_gc.py on
the mimxrt port.
Signed-off-by: Damien George <damien@micropython.org>
asan considers that memcmp(p, q, N) is permitted to access N bytes at each
of p and q, even when p and q differ at an earlier byte. Implementations
frequently do access the additional bytes in practice, reading 4 or more
bytes from each input at a time for efficiency. So when completing
"non_exist<TAB>" in the repl, this causes a diagnostic:
==16938==ERROR: AddressSanitizer: global-buffer-overflow on
address 0x555555cd8dc8 at pc 0x7ffff726457b bp 0x7fffffffda20 sp 0x7fff
READ of size 9 at 0x555555cd8dc8 thread T0
#0 0x7ffff726457a (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xb857a)
#1 0x555555b0e82a in mp_repl_autocomplete ../../py/repl.c:301
#2 0x555555c89585 in readline_process_char ../../lib/mp-readline/re
#3 0x555555c8ac6e in readline ../../lib/mp-readline/readline.c:513
#4 0x555555b8dcbd in do_repl /home/jepler/src/micropython/ports/uni
#5 0x555555b90859 in main_ /home/jepler/src/micropython/ports/unix/
#6 0x555555b90a3a in main /home/jepler/src/micropython/ports/unix/m
#7 0x7ffff619a09a in __libc_start_main ../csu/libc-start.c:308
#8 0x55555595fd69 in _start (/home/jepler/src/micropython/ports/uni
0x555555cd8dc8 is located 0 bytes to the right of global variable
'import_str' defined in '../../py/repl.c:285:23' (0x555555cd8dc0) of
size 8
'import_str' is ascii string 'import '
Signed-off-by: Jeff Epler <jepler@gmail.com>
The proper way to do this is to test for __APPLE__ and __MACH__, where
__APPLE__ tests for an Apple OS and __MACH__ tests that it is based on CMU
Mach. Using both tests ensures that just Darwin (Apple's open source base
for MacOS, iOS, etc.) is recognized. __APPLE__ by itself will test for any
Apple OS, which can include older OS 7-9 and any future Apple OS. __MACH__
tests for any OS based on CMU Mach, including Darwin and GNU Hurd.
Fixes #7232.
Array equality is defined as each element being equal, but to keep
code size down MicroPython implements a binary comparison. This
can only be used correctly for elements with the same binary layout,
though, so raise a NotImplementedError when comparing types
for which the binary comparison yielded incorrect results: types
with different sizes, and floating point numbers because nan != nan.
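
Both failure modes can be seen by comparing raw bytes against element-wise equality. The sketch below uses CPython's `array` module purely to illustrate the two cases; it is not MicroPython's implementation:

```python
from array import array

# Case 1: floats. Identical bit patterns, yet no element equals its counterpart.
nan = float("nan")
a = array("f", [nan])
b = array("f", [nan])
nan_bytes_equal = a.tobytes() == b.tobytes()
nan_elements_equal = all(x == y for x, y in zip(a, b))

# Case 2: different element sizes. Equal elements, yet different bytes.
small = array("b", [1])  # 1 byte per element
wide = array("h", [1])   # at least 2 bytes per element
size_bytes_equal = small.tobytes() == wide.tobytes()
size_elements_equal = len(small) == len(wide) and all(
    x == y for x, y in zip(small, wide)
)
```

In the first case a binary comparison reports equality where element-wise comparison does not; in the second it reports inequality where the elements are equal.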