The proper way to do this is to test for __APPLE__ and __MACH__, where
__APPLE__ tests for an Apple OS and __MACH__ tests that it is based on CMU
Mach. Using both tests ensures that just Darwin (Apple's open source base
for MacOS, iOS, etc.) is recognized. __APPLE__ by itself will test for any
Apple OS, which can include older OS 7-9 and any future Apple OS. __MACH__
tests for any OS based on CMU Mach, including Darwin and GNU Hurd.
Fixes#7232.
Array equality is defined as each element being equal but to keep
code size down MicroPython implements a binary comparison. This
can only be used correctly for elements with the same binary layout
though so turn it into an NotImplementedError when comparing types
for which the binary comparison yielded incorrect results: types
with different sizes, and floating point numbers because nan != nan.
This commit makes gc_lock_depth have one counter per thread, instead of one
global counter. This makes threads properly independent with respect to
the GC, in particular threads can now independently lock the GC for
themselves without locking it for other threads. It also means a given
thread can run a hard IRQ without temporarily locking the GC for all other
threads and potentially making them have MemoryError exceptions at random
locations (this really only occurs on MCUs with multiple cores and no GIL,
eg on the rp2 port).
The commit also removes protection of the GC lock/unlock functions, which
is no longer needed when the counter is per thread (and this also fixes the
cas where a hard IRQ calling gc_lock() may stall waiting for the mutex).
It also puts the check for `gc_lock_depth > 0` outside the GC mutex in
gc_alloc, gc_realloc and gc_free, to potentially prevent a hard IRQ from
waiting on a mutex if it does attempt to allocate heap memory (and putting
the check outside the GC mutex is now safe now that there is a
gc_lock_depth per thread).
Signed-off-by: Damien George <damien@micropython.org>
Doing "import <tab>" will now complete/list built-in modules.
Originally at adafruit#4548 and adafruit#4608
Signed-off-by: Artyom Skrobov <tyomitch@gmail.com>
Anything beginning with "_" will now only be tab-completed if there is
already a partial match for such an entry. In other words, entering
foo.<tab> will no longer complete/list anything beginning with "_".
Originally at adafruit#1850
Signed-off-by: Kathryn Lingel <kathryn@lingel.net>
These commented-out lines of code have been unused for a long time, so
remove them to avoid confusion as to why they are there.
mp_obj_dict_free() never existed, this line was converted from
mp_map_deinit() and commented out as soon as it was added. The call to
mp_map_deinit(mp_loaded_modules_map) was commented in
1a1d11fa32.
Fixes issue #3507.
Signed-off-by: Damien George <damien@micropython.org>
This helper is added to properly set a pending exception, to mirror
mp_sched_schedule(), which schedules a function.
Signed-off-by: Damien George <damien@micropython.org>
If MICROPY_ENABLE_SCHEDULER is enabled then MP_STATE_VM(sched_state) must
be updated after handling the pending exception, which is done by the
mp_handle_pending() function.
Signed-off-by: Damien George <damien@micropython.org>
This introduces a new option, MICROPY_ERROR_REPORTING_NONE, which
completely disables all error messages. To be used in cases where
MicroPython needs to fit in very limited systems.
Signed-off-by: Damien George <damien@micropython.org>
This commit adds the errno attribute to exceptions, so code can retrieve
errno codes from an OSError using exc.errno.
The implementation here simply lets `errno` (and the existing `value`)
attributes work on any exception instance (they both alias args[0]). This
is for efficiency and to keep code size down. The pros and cons of this
are:
Pros:
- more compatible with CPython, less difference to document and learn
- OSError().errno will correctly return None, whereas the current way of
doing it via OSError().args[0] will raise an IndexError
- it reduces code size on most bare-metal ports (because they already have
the errno qstr)
- for Python code that uses exc.errno the generated bytecode is 2 bytes
smaller and more efficient to execute (compared with exc.args[0]); so
bytecode loaded to RAM saves 2 bytes RAM for each use of this attribute,
and bytecode that is frozen saves 2 bytes flash/ROM for each use
- it's easier/shorter to type, and saves 2 bytes of space in .py files that
use it (for each use)
Cons:
- increases code size by 4-8 bytes on minimal ports that don't already have
the `errno` qstr
- all exceptions now have .errno and .value attributes (a cpydiff test is
added to address this)
See also #2407.
Signed-off-by: Damien George <damien@micropython.org>
This allows configuring the pre-allocated size of sys.modules dict, in
order to prevent unwanted reallocations at run-time (3 sys-modules is
really not quite enough for a larger project).
When building with STATIC undefined (e.g., -DSTATIC=), there are two
instances of mp_type_code that collide at link time: in profile.c and in
builtinevex.c. This patch resolves the collision by renaming one of them.
The parts that are generic are added to py/ so they can be used by other
ports that use CMake.
py/usermod.cmake:
* Creates a usermod target to hang user C/CXX modules from.
* Gathers sources from user C/CXX modules and libs for QSTR scan.
ports/rp2/CMakeLists.txt:
* Includes py/usermod.cmake.
* Links the resulting usermod library to the MicroPython target.
py/mkrules.cmake:
Add cxxflags to qstr.i.last custom command for CXX modules:
* MICROPY_CPP_FLAGS so CXX modules will find includes.
* -DNO_QSTR to fix fatal error missing "genhdr/qstrdefs.generated.h".
Usage:
The rp2 port can be linked against user C modules by running:
make USER_C_MODULES=/path/to/module/micropython.cmake
CMake will print a list of included modules.
Co-authored-by: Graham Sanderson <graham.sanderson@raspberrypi.org>
Co-authored-by: Michael O'Cleirigh <michael.ocleirigh@rivulet.ca>
Signed-off-by: Phil Howard <phil@pimoroni.com>
mp_printf should be used to print the prefix because it's also used in
mp_bytecode_print2 (otherwise, depending on the system, different output
streams may be used).
Also print the current thread state when threading is enabled to easily see
which thread executes what opcode.
Signed-off-by: Damien George <damien@micropython.org>
This allows a port to specify a custom qstrdefsport.h file, the same as the
QSTR_DEFS variable in a Makefile.
Signed-off-by: Damien George <damien@micropython.org>
The core cmake rules use custom commands to invoke qstr processing
scripts. For the zephyr port, it's possible that list arguments to these
commands may contain generator expressions, therefore we need to expand
them properly.
Signed-off-by: Maureen Helm <maureen.helm@nxp.com>
For certain operands to mpn_div, the existing code path for
`DIG_SIZE == MPZ_DBL_DIG_SIZE / 2` had a bug in it where borrow could still
overflow in the `(x >= *n || *n - x <= borrow)` branch, ie
`borrow + x - (mpz_dbl_dig_t)*n` overflows the borrow variable. In such
cases the subsequent right-shift of borrow would not bring in the overflow
bit, leading to an error in the result. An example division that had
overflow when MPZ_DIG_SIZE = 16 is `(2 ** 48 - 1) ** 2 // (2 ** 48 - 1)`.
This is fixed in this commit by simplifying the code and handling the low
digits of borrow first, and then the upper bits (to shift down) separately.
There is no longer a distinction between `DIG_SIZE < MPZ_DBL_DIG_SIZE / 2`
and `DIG_SIZE == MPZ_DBL_DIG_SIZE / 2`.
This commit also simplifies the second part of the calculation so that
borrow does not need to be negated (instead the code just works knowing
that borrow is negative and using + instead of - in calculations involving
borrow).
Fixes#6777.
Signed-off-by: Damien George <damien@micropython.org>
The "word" referred to by BYTES_PER_WORD is actually the size of mp_obj_t
which is not always the same as the size of a pointer on the target
architecture. So rename this config value to better reflect what it
measures, and also prefix it with MP_.
For uses of BYTES_PER_WORD in setting the stack limit this has been
changed to sizeof(void *), because the stack usually grows with
machine-word sized values (eg an nlr_buf_t has many machine words in it).
Signed-off-by: Damien George <damien@micropython.org>
It's only used in one location, to test if << or >> will overflow when
shifting mp_uint_t. For such a test it's clearer to use sizeof(lhs_val),
which will be valid even if the type of lhs_val changes.
Signed-off-by: Damien George <damien@micropython.org>
This environment variable, if defined during the build process,
indicates a fixed time that should be used in place of "now" when
such a time is explicitely referenced.
This allows for reproducible builds of micropython.
See https://reproducible-builds.org/specs/source-date-epoch/
Signed-off-by: iTitou <moiandme@gmail.com>
This should be enabled when the mp_raw_code_save_file function is needed.
It is enabled for mpy-cross, and a check for defined(__APPLE__) is added to
cover Mac M1 systems.
It practically does the same as qstr_from_str and was only used in one
place, which should actually use the compile-time MP_QSTR_XXX form for
consistency; qstr_from_str is for runtime strings only.
Adds a new compile-time option MICROPY_EMIT_THUMB_ARMV7M which is enabled
by default (to get existing behaviour) and which should be disabled (set to
0) when building native emitter support (@micropython.native) on ARMv6M
targets.
This returns a reference to the globals dict associated with the function,
ie the global scope that the function was defined in. This attribute is
read-only but the dict itself is modifiable, per CPython behaviour.
Signed-off-by: Damien George <damien@micropython.org>
As a general pattern, required positional arguments that are not named do
not need to be parsed using mp_arg_parse_all().
Signed-off-by: Damien George <damien@micropython.org>
Two issues are tackled:
1. The calculation of the correct length to print is fixed to treat the
precision as a maximum length instead as the exact length.
This is done for both qstr (%q) and for regular str (%s).
2. Fix the incorrect use of mp_printf("%.*s") to mp_print_strn().
Because of the fix of above issue, some testcases that would print
an embedded null-byte (^@ in test-output) would now fail.
The bug here is that "%s" was used to print null-bytes. Instead,
mp_print_strn is used to make sure all bytes are outputted and the
exact length is respected.
Test-cases are added for both %s and %q with a combination of precision
and padding specifiers.
Also known as L2CAP "connection oriented channels". This provides a
socket-like data transfer mechanism for BLE.
Currently only implemented for NimBLE on STM32 / Unix.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This gives a substantial speedup of the preprocessing step, i.e. the
generation of qstr.i.last. For example on a clean build, making
qstr.i.last:
21s -> 4s on STM32 (WB55)
8.9 -> 1.8s on Unix (dev).
Done in collaboration with @stinos.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Support C++ code in .cpp files by providing CXX counterparts of the
_USERMOD_ flags we have for C already. This merely enables the Makefile of
user C modules to use variables specific to C++ compilation, it is still up
to each port's main Makefile to also include these in the build.
When SCR_QSTR contains C++ files they should be preprocessed with the same
compiler flags (CXXFLAGS) as they will be compiled with, to make sure code
scanned for QSTR occurrences is effectively the code used in the rest of
the build. The 'split SCR_QSTR in .c and .cpp files and process each with
different flags' logic isn't trivial to express in a Makefile and the
existing principle for deciding which files to preprocess was already
rather complicated, so the actual preprocessing is moved into
makeqstrdefs.py completely.
When process_file() is passed a preprocessed C++ file for instance it won't
find any lines containing .c files and the last_fname variable remains
None, so handle that gracefully.
Newer GCC versions are able to warn about switch cases that fall
through. This is usually a sign of a forgotten break statement, but in
the few cases where a fall through is intended we annotate it with this
macro to avoid the warning.
Like Clang, GCC warns about this file, but only with -Woverride-init
which is enabled by -Wextra. Disable the warnings for this file just
like we do for Clang to make -Wextra happy.
When compiling with -Wextra which includes -Wmissing-field-initializers
GCC will warn that the defval field of mp_arg_val_t is not initialized.
This is just a warning as it is defined to be zero initialized, but since
it is a union it makes sense to be explicit about which member we're
going to use, so add the explicit initializers and get rid of the
warning.
On x86 chars are signed, but we're comparing a char to '0' + unsigned int,
which is promoted to an unsigned int. Let's promote the char to unsigned
before doing the comparison to avoid weird corner cases.
The function scope_find_or_add_id used to take a scope_kind_t enum and
save it in an uint8_t. Saving an enum in a uint8_t is fine, but
everywhere this function is called it is not actually given a
scope_kind_t but an anonymous enum instead. Let's give this enum a name
and use that as the argument type.
This doesn't change the generated code, but is a C type mismatch that
unfortunately doesn't show up unless you enable -Wenum-conversion.
Some downstream projects may use tags in their repositories for more than
just designating MicroPython releases. In those cases, the
makeversionhdr.py script would end up using a different tag than intended.
So tell `git describe` to only match tags that look like a MicroPython
version tag, such as `v1.12` or `v2.0`.
Calling the bytes constructor on a bytes object returns the original bytes
object. This saves allocating a new instance, and matches CPython.
Signed-off-by: Iyassou Shimels <s.iyassou@gmail.com>
For time-based functions that work with absolute time there is the need for
an Epoch, to set the zero-point at which the absolute time starts counting.
Such functions include time.time() and filesystem stat return values. And
different ports may use a different Epoch.
To make it clearer what functions use the Epoch (whatever it may be), and
make the ports more consistent with their use of the Epoch, this commit
renames all Epoch related functions to include the word "epoch" in their
name (and remove references to "2000").
Along with this rename, the following things have changed:
- mp_hal_time_ns() is now specified to return the number of nanoseconds
since the Epoch, rather than since 1970 (but since this is an internal
function it doesn't change anything for the user).
- littlefs timestamps on the esp8266 have been fixed (they were previously
off by 30 years in nanoseconds).
Otherwise, there is no functional change made by this commit.
Signed-off-by: Damien George <damien@micropython.org>
Prior to this commit, pow(-2, float('nan')) would return (nan+nanj), or
raise an exception on targets that don't support complex numbers. This is
fixed to return simply nan, as CPython does.
Signed-off-by: Damien George <damien@micropython.org>
This is consistent with the other 'micro' modules and allows implementing
additional features in Python via e.g. micropython-lib's sys.
Note this is a breaking change (not backwards compatible) for ports which
do not enable weak links, as "import sys" must now be replaced with
"import usys".
Updating to Black v20.8b1 there are two changes that affect the code in
this repository:
- If there is a trailing comma in a list (eg [], () or function call) then
that list is now written out with one line per element. So remove such
trailing commas where the list should stay on one line.
- Spaces at the start of """ doc strings are removed.
Signed-off-by: Damien George <damien@micropython.org>
As per CPython behaviour, compile(stmt, "file", "single") should create
code which prints to stdout (via __repl_print__) the results of any
expressions in stmt.
Signed-off-by: Damien George <damien@micropython.org>
On ports where normal heap memory can contain executable code (eg ARM-based
ports such as stm32), native code loaded from an .mpy file may be reclaimed
by the GC because there's no reference to the very start of the native
machine code block that is reachable from root pointers (only pointers to
internal parts of the machine code block are reachable, but that doesn't
help the GC find the memory).
This commit fixes this issue by maintaining an explicit list of root
pointers pointing to native code that is loaded from an .mpy file. This
is not needed for all ports so is selectable by the new configuration
option MICROPY_PERSISTENT_CODE_TRACK_RELOC_CODE. It's enabled by default
if a port does not specify any special functions to allocate or commit
executable memory.
A test is included to test that native code loaded from an .mpy file does
not get reclaimed by the GC.
Fixes#6045.
Signed-off-by: Damien George <damien@micropython.org>
MicroPython's original implementation of __aiter__ was correct for an
earlier (provisional) version of PEP492 (CPython 3.5), where __aiter__ was
an async-def function. But that changed in the final version of PEP492 (in
CPython 3.5.2) where the function was changed to a normal one. See
https://www.python.org/dev/peps/pep-0492/#why-aiter-does-not-return-an-awaitable
See also the note at the end of this subsection in the docs:
https://docs.python.org/3.5/reference/datamodel.html#asynchronous-iterators
And for completeness the BPO: https://bugs.python.org/issue27243
To be consistent with the Python spec as it stands today (and now that
PEP492 is final) this commit changes MicroPython's behaviour to match
CPython: __aiter__ should return an async-iterable object, but is not
itself awaitable.
The relevant tests are updated to match.
See #6267.
Because the argument arrays may overlap, as show by the new tests in this
commit.
Also remove the debugging comments for these macros, add a new comment
about overlapping regions, and separate the macros by blank lines to make
them easier to read.
Fixes issue #6244.
Signed-off-by: Damien George <damien@micropython.org>
This commit fixes lookups of class members to make it so that built-in
functions that are used as methods/functions of a class work correctly.
The mp_convert_member_lookup() function is pretty much completely changed
by this commit, but for the most part it's just reorganised and the
indenting changed. The functional changes are:
- staticmethod and classmethod checks moved to later in the if-logic,
because they are less common and so should be checked after the more
common cases.
- The explicit mp_obj_is_type(member, &mp_type_type) check is removed
because it's now subsumed by other, more general tests in this function.
- MP_TYPE_FLAG_BINDS_SELF and MP_TYPE_FLAG_BUILTIN_FUN type flags added to
make the checks in this function much simpler (now they just test this
bit in type->flags).
- An extra check is made for mp_obj_is_instance_type(type) to fix lookup of
built-in functions.
Fixes#1326 and #6198.
Signed-off-by: Damien George <damien@micropython.org>
This allows complex binary operations to fail gracefully with unsupported
operation rather than raising an exception, so that special methods work
correctly.
Signed-off-by: Damien George <damien@micropython.org>
uint types in viper mode can now be used for all binary operators except
floor-divide and modulo.
Fixes issue #1847 and issue #6177.
Signed-off-by: Damien George <damien@micropython.org>
An OrderedDict can now be used for the locals when creating a type
explicitly via type(name, bases, locals).
Signed-off-by: Damien George <damien@micropython.org>
This addition to the grammar was introduced in Python 3.6. It allows
annotating the type of a varilable, like:
x: int = 123
s: str
The implementation in this commit is quite simple and just ignores the
annotation (the int and str bits above). The reason to implement this is
to allow Python 3.6+ code that uses this feature to compile under
MicroPython without change, and for users to use type checkers.
In the future viper could use this syntax as a way to give types to
variables, which is currently done in a bit of an ad-hoc way, eg
x = int(123). And this syntax could potentially be used in the inline
assembler to define labels in an way that's easier to read.
The syntax matches CPython and the semantics are equivalent except that,
unlike CPython, MicroPython allows using := to assign to comprehension
iteration variables, because disallowing this would take a lot of code to
check for it.
The new compile-time option MICROPY_PY_ASSIGN_EXPR selects this feature and
is enabled by default, following MICROPY_PY_ASYNC_AWAIT.
Formatting for `* sizeof` was fixed in uncrustify v0.71, so we no longer
need the fixups for it. Also, there was one file where the updated
uncrustify caught a problem that the regex didn't pick up, which is updated
in this commit.
Signed-off-by: David Lechner <david@pybricks.com>
Long ago, prior to 0ef01d0a75, fixed and
ordered maps were the same setting with the "table_is_fixed_array" member
of mp_map_t. But these settings are actually independent, and it is
possible to have is_fixed=1, is_ordered=0 (although this can currently
only be done by tools/cc1). So update the comments to reflect this.
The resulting dict is now marked as read-only (is_fixed=1) to enforce the
fact that changes to this dict will not be reflected in the class instance.
This commit reduces code size by about 20 bytes, and should be more
efficient because it creates a direct copy of the dict rather than
reinserting all elements.
The behavior mirrors the instance object dict attribute where a copy of the
local attributes are provided (unless the dict is read-only, then that dict
itself is returned, as an optimisation). MicroPython does not support
modifying this dict because the changes will not be reflected in the class.
The feature is only enabled if MICROPY_CPYTHON_COMPAT is set, the same as
the instance version.
There doesn't appear to be any use for only triggering on specific events,
so it's just easier to number them sequentially. This makes them smaller
values so they take up only 1 byte in the ringbuf, only 1 byte for the
opcode in the bytecode, and makes room for more events.
Also add a couple of new event types that need to be implemented (to avoid
re-numbering later).
And rename _COMPLETE and _STATUS to _DONE for consistency.
In the future the "trigger" keyword argument can be reinstated by requiring
the user to compute the bitmask, eg:
ble.irq(handler, 1 << _IRQ_SCAN_RESULT | 1 << _IRQ_SCAN_DONE)
Older implementations deal with infinity/negative zero incorrectly. This
commit adds generic fixes that can be enabled by any port that needs them,
along with new tests cases.
Otherwise functions like memset might get optimised to call themselves (eg
with gcc 10). And provide CFLAGS_BUILTIN so these options can be changed
by a port if needed.
Fixes issue #6053.
Constant expression like "2 ** 3" will now be folded, and the special form
"X = const(2 ** 3)" will now compile because the argument to the const is
now a constant.
Fixes issue #5865.
This allows user code that inherits from uio.IOBase to return an errno
error code from the user readinto/write function, by returning a negative
value. Eg returning -123 means an errno of 123. This is already how the
custom ioctl works.
This change is made for two reasons:
1. A 3rd-party library (eg berkeley-db-1.xx, axtls) may use the system
provided errno for certain errors, and yet MicroPython stream objects
that it calls will be using the internal mp_stream_errno. So if the
library returns an error it is not known whether the corresponding errno
code is stored in the system errno or mp_stream_errno. Using the system
errno in all cases (eg in the mp_stream_posix_XXX wrappers) fixes this
ambiguity.
2. For systems that have threading the system-provided errno should always
be used because the errno value is thread-local.
For systems that do not have an errno, the new lib/embed/__errno.c file is
provided.
Note: the uncrustify configuration is explicitly set to 'add' instead of
'force' in order not to alter the comments which use extra spaces after //
as a means of indenting text for clarity.
Error string compression is not deterministic in certain cases: it depends
on the Python version (whether dicts are ordered by default or not) and
probably also the order files are passed to this script, leading to a
difference in which words are included in the top 128 most common.
The changes in this commit use OrderedDict to keep parsed lines in a known
order, and, when computing how many bytes are saved by a given word, it
uses the word itself to break ties (which would otherwise be "random").
For combinations of certain versions of glibc and gcc the definition of
fpclassify always takes float as argument instead of adapting itself to
float/double/long double as required by the C99 standard. At the time of
writing this happens for instance for glibc 2.27 with gcc 7.5.0 when
compiled with -Os and glibc 3.0.7 with gcc 9.3.0. When calling fpclassify
with double as argument, as in objint.c, this results in an implicit
narrowing conversion which is not really correct plus results in a warning
when compiled with -Wfloat-conversion. So fix this by spelling out the
logic manually.
Initially some of these were found building the unix coverage variant on
MacOS because that build uses clang and has -Wdouble-promotion enabled, and
clang performs more vigorous promotion checks than gcc. Additionally the
codebase has been compiled with clang and msvc (the latter with warning
level 3), and with MICROPY_FLOAT_IMPL_FLOAT to find the rest of the
conversions.
Fixes are implemented either as explicit casts, or by using the correct
type, or by using one of the utility functions to handle floating point
casting; these have been moved from nativeglue.c to the public API.
This commit provides a typedef for mp_rom_error_text_t, and a macro define
for MP_COMPRESSED_ROM_TEXT, when MICROPY_ROM_TEXT_COMPRESSION is disabled.
This simplifies the configuration (it no longer has a special case for
MICROPY_ENABLE_DYNRUNTIME) and makes it work for other cases that don't use
compression (eg examples/embedding). This commit also ensures
MICROPY_ROM_TEXT_COMPRESSION is defined during qstr processing.
Now that error string compression is supported it's more important to have
consistent error string formatting (eg all lowercase English words,
consistent contractions). This commit cleans up some of the strings to
make them more consistent.
Because the atomic section starts after checking whether the scheduler
state is pending, it's possible it can become a different state by the time
the atomic section starts.
This is especially likely on ports where MICROPY_BEGIN_ATOMIC_SECTION is
implemented with a mutex (i.e. it might block), but the race exists
regardless, i.e. if a context switch occurs between those two lines.
TimeoutError was added back in 077812b2ab for
the cc3200 port. In f522849a4d the cc3200
port enabled use of it in the socket module aliased to socket.timeout. So
it was never added to the builtins. Then it was replaced by
OSError(ETIMEDOUT) in 047af9b10b.
The esp32 port enables this exception, since the very beginning of that
port, but it could never be accessed because it's not in builtins.
It's being removed: 1) to not encourage its use; 2) because there are a lot
of other OSError subclasses which are not defined at all, and having
TimeoutError is a bit inconsistent.
Note that ports can add anything to the builtins via MICROPY_PORT_BUILTINS.
And they can also define their own exceptions using the
MP_DEFINE_EXCEPTION() macro.
In this part of the code there is no way to get the ** operator, so no need
to check for it.
This commit also adds tests for this, and other related, invalid const
operations.
The decompression of error-strings is only done if the string is accessed
via printing or via er.args. Tests are added for this feature to ensure
the decompression works.
The idea here is that there's a moderate amount of ROM used up by exception
text. Obviously we try to keep the messages short, and the code can enable
terse errors, but it still adds up. Listed below is the total string data
size for various ports:
bare-arm 2860
minimal 2876
stm32 8926 (PYBV11)
cc3200 3751
esp32 5721
This commit implements compression of these strings. It takes advantage of
the fact that these strings are all 7-bit ascii and extracts the top 128
frequently used words from the messages and stores them packed (dropping
their null-terminator), then uses (0x80 | index) inside strings to refer to
these common words. Spaces are automatically added around words, saving
more bytes. This happens transparently in the build process, mirroring the
steps that are used to generate the QSTR data. The MP_COMPRESSED_ROM_TEXT
macro wraps any literal string that should compressed, and it's
automatically decompressed in mp_decompress_rom_string.
There are many schemes that could be used for the compression, and some are
included in py/makecompresseddata.py for reference (space, Huffman, ngram,
common word). Results showed that the common-word compression gets better
results. This is before counting the increased cost of the Huffman
decoder. This might be slightly counter-intuitive, but this data is
extremely repetitive at a word-level, and the byte-level entropy coder
can't quite exploit that as efficiently. Ideally one would combine both
approaches, but for now the common-word approach is the one that is used.
For additional comparison, the size of the raw data compressed with gzip
and zlib is calculated, as a sort of proxy for a lower entropy bound. With
this scheme we come within 15% on stm32, and 30% on bare-arm (i.e. we use
x% more bytes than the data compressed with gzip -- not counting the code
overhead of a decoder, and how this would be hypothetically implemented).
The feature is disabled by default and can be enabled by setting
MICROPY_ROM_TEXT_COMPRESSION at the Makefile-level.