This allows the compiler to merge strings: e.g. "update",
"difference_update" and "symmetric_difference_update"
will all point to the same memory.
Shaves ~1KB off the image size, and potentially allows
bigger savings if qstr attrs are initialized in qstr_init(),
and not stored in the image.
The parts that are generic are added to py/ so they can be used by other
ports that use CMake.
py/usermod.cmake:
* Creates a usermod target to hang user C/CXX modules from.
* Gathers sources from user C/CXX modules and libs for QSTR scan.
ports/rp2/CMakeLists.txt:
* Includes py/usermod.cmake.
* Links the resulting usermod library to the MicroPython target.
py/mkrules.cmake:
Add cxxflags to qstr.i.last custom command for CXX modules:
* MICROPY_CPP_FLAGS so CXX modules will find includes.
* -DNO_QSTR to fix fatal error missing "genhdr/qstrdefs.generated.h".
Usage:
The rp2 port can be linked against user C modules by running:
make USER_C_MODULES=/path/to/module/micropython.cmake
CMake will print a list of included modules.
Co-authored-by: Graham Sanderson <graham.sanderson@raspberrypi.org>
Co-authored-by: Michael O'Cleirigh <michael.ocleirigh@rivulet.ca>
Signed-off-by: Phil Howard <phil@pimoroni.com>
mp_printf should be used to print the prefix because it's also used in
mp_bytecode_print2 (otherwise, depending on the system, different output
streams may be used).
Also print the current thread state when threading is enabled to easily see
which thread executes what opcode.
Signed-off-by: Damien George <damien@micropython.org>
This adds some additional code in mkfs which doesn't seem necessary, and
Disabling it saves 172 bytes flash.
Testing performed: Using a Feather M0 Adalogger, checked that
* an sdcard could still be mounted (using adafruit_sdcard)
* os.listdir() of "/" and "/sd" worked
* CIRCUITPY still mounted
This also adds a bit of code everywhere we DISPATCH(), but the net is
+232 bytes free on Feather M0 Adalogger.
Key assumption: All of the offsets in mp_execute_bytecode fit in 16 bits;
it is not clear whether the compiler will verify this assumption (e.g.,
by warning that a constant will be truncated)
* Always clear the peripheral interrupt so we don't hang when full
* Store the ringbuf in the object so it gets collected when we're alive
* Make UART objects have a finaliser so they are deinit when their
memory is freed
* Copy bytes into the ringbuf from the FIFO after we read to ensure
the interrupt is enabled ASAP
* Copy bytes into the ringbuf from the FIFO before measuring our
rx available because the interrupt is based on a threshold (not
> 0). For example, a single byte won't trigger an interrupt.
This allows a port to specify a custom qstrdefsport.h file, the same as the
QSTR_DEFS variable in a Makefile.
Signed-off-by: Damien George <damien@micropython.org>
The core cmake rules use custom commands to invoke qstr processing
scripts. For the zephyr port, it's possible that list arguments to these
commands may contain generator expressions, therefore we need to expand
them properly.
Signed-off-by: Maureen Helm <maureen.helm@nxp.com>
For certain operands to mpn_div, the existing code path for
`DIG_SIZE == MPZ_DBL_DIG_SIZE / 2` had a bug in it where borrow could still
overflow in the `(x >= *n || *n - x <= borrow)` branch, ie
`borrow + x - (mpz_dbl_dig_t)*n` overflows the borrow variable. In such
cases the subsequent right-shift of borrow would not bring in the overflow
bit, leading to an error in the result. An example division that had
overflow when MPZ_DIG_SIZE = 16 is `(2 ** 48 - 1) ** 2 // (2 ** 48 - 1)`.
This is fixed in this commit by simplifying the code and handling the low
digits of borrow first, and then the upper bits (to shift down) separately.
There is no longer a distinction between `DIG_SIZE < MPZ_DBL_DIG_SIZE / 2`
and `DIG_SIZE == MPZ_DBL_DIG_SIZE / 2`.
This commit also simplifies the second part of the calculation so that
borrow does not need to be negated (instead the code just works knowing
that borrow is negative and using + instead of - in calculations involving
borrow).
Fixes#6777.
Signed-off-by: Damien George <damien@micropython.org>
The "word" referred to by BYTES_PER_WORD is actually the size of mp_obj_t
which is not always the same as the size of a pointer on the target
architecture. So rename this config value to better reflect what it
measures, and also prefix it with MP_.
For uses of BYTES_PER_WORD in setting the stack limit this has been
changed to sizeof(void *), because the stack usually grows with
machine-word sized values (eg an nlr_buf_t has many machine words in it).
Signed-off-by: Damien George <damien@micropython.org>
It's only used in one location, to test if << or >> will overflow when
shifting mp_uint_t. For such a test it's clearer to use sizeof(lhs_val),
which will be valid even if the type of lhs_val changes.
Signed-off-by: Damien George <damien@micropython.org>
This environment variable, if defined during the build process,
indicates a fixed time that should be used in place of "now" when
such a time is explicitely referenced.
This allows for reproducible builds of micropython.
See https://reproducible-builds.org/specs/source-date-epoch/
Signed-off-by: iTitou <moiandme@gmail.com>
This should be enabled when the mp_raw_code_save_file function is needed.
It is enabled for mpy-cross, and a check for defined(__APPLE__) is added to
cover Mac M1 systems.
It practically does the same as qstr_from_str and was only used in one
place, which should actually use the compile-time MP_QSTR_XXX form for
consistency; qstr_from_str is for runtime strings only.
Adds a new compile-time option MICROPY_EMIT_THUMB_ARMV7M which is enabled
by default (to get existing behaviour) and which should be disabled (set to
0) when building native emitter support (@micropython.native) on ARMv6M
targets.
This returns a reference to the globals dict associated with the function,
ie the global scope that the function was defined in. This attribute is
read-only but the dict itself is modifiable, per CPython behaviour.
Signed-off-by: Damien George <damien@micropython.org>
The RP2040 is new microcontroller from Raspberry Pi that features
two Cortex M0s and eight PIO state machines that are good for
crunching lots of data. It has 264k RAM and a built in UF2
bootloader too.
Datasheet: https://pico.raspberrypi.org/files/rp2040_datasheet.pdf
Several issues have been found in the implementation. While they're
unresolved, it may be better to disable the built-in module. (This
means that to work on fixing the module, it'll be necessary to
revert this commit)
* Better messaging when code is stopped by an auto-reload.
* Auto-reload works during sleeps on ESP32-S2. Ticks wake up the
main task each time.
* Made internal naming consistent. CamelCase Python names are NOT
separated by an underscore.
As a general pattern, required positional arguments that are not named do
not need to be parsed using mp_arg_parse_all().
Signed-off-by: Damien George <damien@micropython.org>
This changes lots of files to unify `board.h` across ports. It adds
`board_deinit` when CIRCUITPY_ALARM is set. `main.c` uses it to
deinit the board before deep sleeping (even when pretending.)
Deep sleep is now a two step process for the port. First, the
port should prepare to deep sleep based on the given alarms. It
should set alarms for both deep and pretend sleep. In particular,
the pretend versions should be set immediately so that we don't
miss an alarm as we shutdown. These alarms should also wake from
`port_idle_until_interrupt` which is used when pretending to deep
sleep.
Second, when real deep sleeping, `alarm_enter_deep_sleep` is called.
The port should set any alarms it didn't during prepare based on
data it saved internally during prepare.
ESP32-S2 sleep is a bit reorganized to locate more logic with
TimeAlarm. This will help it scale to more alarm types.
Fixes#3786
Two issues are tackled:
1. The calculation of the correct length to print is fixed to treat the
precision as a maximum length instead as the exact length.
This is done for both qstr (%q) and for regular str (%s).
2. Fix the incorrect use of mp_printf("%.*s") to mp_print_strn().
Because of the fix of above issue, some testcases that would print
an embedded null-byte (^@ in test-output) would now fail.
The bug here is that "%s" was used to print null-bytes. Instead,
mp_print_strn is used to make sure all bytes are outputted and the
exact length is respected.
Test-cases are added for both %s and %q with a combination of precision
and padding specifiers.
This allows calls to `allocate_memory()` while the VM is running, it will then allocate from the GC heap (unless there is a suitable hole among the supervisor allocations), and when the VM exits and the GC heap is freed, the allocation will be moved to the bottom of the former GC heap and transformed into a proper supervisor allocation. Existing movable allocations will also be moved to defragment the supervisor heap and ensure that the next VM run gets as much memory as possible for the GC heap.
By itself this breaks terminalio because it violates the assumption that supervisor_display_move_memory() still has access to an undisturbed heap to copy the tilegrid from. It will work in many cases, but if you're unlucky you will get garbled terminal contents after exiting from the vm run that created the display. This will be fixed in the following commit, which is separate to simplify review.
`pow(a, b, c)` can compute `(a ** b) % c` efficiently (in time and memory).
This can be useful for extremely specific applications, like implementing
the RSA cryptosystem. For typical uses of CircuitPython, this is not an
important feature. A survey of the bundle and learn system didn't find
any uses.
Disable it on M0 builds so that we can fit in needed upgrades to the USB
stack.
Disable certain classes of diagnostic when building ulab. We should
submit patches upstream to (A) fix these errors and (B) upgrade their
CI so that the problems are caught before we want to integrate with
CircuitPython, but not right now.
Also known as L2CAP "connection oriented channels". This provides a
socket-like data transfer mechanism for BLE.
Currently only implemented for NimBLE on STM32 / Unix.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
I like to use local makefile overrides, in the file GNUmakefile
(or, on case-sensitive systems, makefile) to set compilation choices.
However, writing
TRANSLATION := de_DE
include Makefile
did not work, because py.mk would override the TRANSLATION := specified
in an earlier part of the makefiles (but not from the commandline).
By using ?= instead of := the local makefile override works, but when
TRANSLATION is not specified it continues to work as before.
This ensures that only the translate("") alternative that will be used
is seen after preprocessing. Improves the quality of the Huffman encoding
and reduces binary size slightly.
Also makes one "enhanced" error message only occur when ERROR_REPORTING_DETAILED:
Instead of the word-for-word python3 error message
"Type object has no attribute '%q'", the message will be
"'type' object has no attribute '%q'". Also reduces binary size.
(that's rolled into this commit as it was right next to a change to
use the preprocessor for MICROPY_ERROR_REPORTING)
Note that the odd semicolon after "value_error:" in parsenum.c is necessary
due to a detail of the C grammar, in which a declaration cannot follow
a label directly.
This reclaims over 1kB of flash space by simplifying certain exception
messages. e.g., it will no longer display the requested/actual length
when a fixed list/tuple of N items is needed:
if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
mp_raise_ValueError(translate("tuple/list has wrong length"));
} else {
mp_raise_ValueError_varg(translate("requested length %d but object has length %d"),
(int)len, (int)seq_len);
Other chip families including samd51 keep their current error reporting
capabilities.
* No weak link for modules. It only impacts _os and _time and is
already disabled for non-full builds.
* Turn off PA00 and PA01 because they are the crystal on the Metro
M0 Express.
* Change ejected default to false to move it to BSS. It is set on
USB connection anyway.
* Set sinc_filter to const. Doesn't help flash but keeps it out of
RAM.
This gives a substantial speedup of the preprocessing step, i.e. the
generation of qstr.i.last. For example on a clean build, making
qstr.i.last:
21s -> 4s on STM32 (WB55)
8.9 -> 1.8s on Unix (dev).
Done in collaboration with @stinos.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Support C++ code in .cpp files by providing CXX counterparts of the
_USERMOD_ flags we have for C already. This merely enables the Makefile of
user C modules to use variables specific to C++ compilation, it is still up
to each port's main Makefile to also include these in the build.
When SCR_QSTR contains C++ files they should be preprocessed with the same
compiler flags (CXXFLAGS) as they will be compiled with, to make sure code
scanned for QSTR occurrences is effectively the code used in the rest of
the build. The 'split SCR_QSTR in .c and .cpp files and process each with
different flags' logic isn't trivial to express in a Makefile and the
existing principle for deciding which files to preprocess was already
rather complicated, so the actual preprocessing is moved into
makeqstrdefs.py completely.
When process_file() is passed a preprocessed C++ file for instance it won't
find any lines containing .c files and the last_fname variable remains
None, so handle that gracefully.
Newer GCC versions are able to warn about switch cases that fall
through. This is usually a sign of a forgotten break statement, but in
the few cases where a fall through is intended we annotate it with this
macro to avoid the warning.
Like Clang, GCC warns about this file, but only with -Woverride-init
which is enabled by -Wextra. Disable the warnings for this file just
like we do for Clang to make -Wextra happy.
When compiling with -Wextra which includes -Wmissing-field-initializers
GCC will warn that the defval field of mp_arg_val_t is not initialized.
This is just a warning as it is defined to be zero initialized, but since
it is a union it makes sense to be explicit about which member we're
going to use, so add the explicit initializers and get rid of the
warning.
On x86 chars are signed, but we're comparing a char to '0' + unsigned int,
which is promoted to an unsigned int. Let's promote the char to unsigned
before doing the comparison to avoid weird corner cases.
The function scope_find_or_add_id used to take a scope_kind_t enum and
save it in an uint8_t. Saving an enum in a uint8_t is fine, but
everywhere this function is called it is not actually given a
scope_kind_t but an anonymous enum instead. Let's give this enum a name
and use that as the argument type.
This doesn't change the generated code, but is a C type mismatch that
unfortunately doesn't show up unless you enable -Wenum-conversion.
This gets a further speedup of about 2s (12s -> 9.5s elapsed build time)
for stm32f405_feather
For what are probably historical reasons, the qstr process involves
preprocessing a large number of source files into a single "qstr.i.last"
file, then reading this and splitting it into one "qstr" file for each
original source ("*.c") file.
By eliminating the step of writing qstr.i.last as well as making the
regular-expression-matching part be parallelized, build speed is further
improved.
Because the step to build QSTR_DEFS_COLLECTED does not access
qstr.i.last, the path is replaced with "-" in the Makefile.
Rather than simply invoking gcc in preprocessor mode with a list of files, use
a Python script with the (python3) ThreadPoolExecutor to invoke the
preprocessor in parallel.
The amount of concurrency is the number of system CPUs, not the makefile "-j"
parallelism setting, because there is no simple and correct way for a Python
program to correctly work together with make's idea of parallelism.
This reduces the build time of stm32f405 feather (a non-LTO build) from 16s to
12s on my 16-thread Ryzen machine.
Some examples of improved compliance with CPython that currently
have divergent behavior in CircuitPython are listed below:
* yield from is not allowed in async methods
```
>>> async def f():
... yield from 'abc'
...
Traceback (most recent call last):
File "<stdin>", line 2, in f
SyntaxError: 'yield from' inside async function
```
* await only works on awaitable expressions
```
>>> async def f():
... await 'not awaitable'
...
>>> f().send(None)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in f
AttributeError: 'str' object has no attribute '__await__'
```
* only __await__()able expressions are awaitable
Okay this one actually does not work in circuitpython at all today.
This is how CPython works though and pretending __await__ does not
exist will only bite users who write both.
```
>>> class c:
... pass
...
>>> def f(self):
... yield
... yield
... return 'f to pay respects'
...
>>> c.__await__ = f # could just as easily have put it on the class but this shows how it's wired
>>> async def g():
... awaitable_thing = c()
... partial = await awaitable_thing
... return 'press ' + partial
...
>>> q = g()
>>> q.send(None)
>>> q.send(None)
>>> q.send(None)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration: press f to pay respects
```
This adds the `async def` and `await` verbs to valid CircuitPython syntax using the Micropython implementation.
Consider:
```
>>> class Awaitable:
... def __iter__(self):
... for i in range(3):
... print('awaiting', i)
... yield
... return 42
...
>>> async def wait_for_it():
... a = Awaitable()
... result = await a
... return result
...
>>> task = wait_for_it()
>>> next(task)
awaiting 0
>>> next(task)
awaiting 1
>>> next(task)
awaiting 2
>>> next(task)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration: 42
>>>
```
and more excitingly:
```
>>> async def it_awaits_a_subtask():
... value = await wait_for_it()
... print('twice as good', value * 2)
...
>>> task = it_awaits_a_subtask()
>>> next(task)
awaiting 0
>>> next(task)
awaiting 1
>>> next(task)
awaiting 2
>>> next(task)
twice as good 84
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration:
```
Note that this is just syntax plumbing, not an all-encompassing implementation of an asynchronous task scheduler or asynchronous hardware apis.
uasyncio might be a good module to bring in, or something else - but the standard Python syntax does not _strictly require_ deeper hardware
support.
Micropython implements the await verb via the __iter__ function rather than __await__. It's okay.
The syntax being present will enable users to write clean and expressive multi-step state machines that are written serially and interleaved
according to the rules provided by those users.
Given that this does not include an all-encompassing C scheduler, this is expected to be an advanced functionality until the community settles
on the future of deep hardware support for async/await in CircuitPython. Users will implement yield-based schedulers and tasks wrapping
synchronous hardware APIs with polling to avoid blocking, while their application business logic gets simple `await` statements.
Some downstream projects may use tags in their repositories for more than
just designating MicroPython releases. In those cases, the
makeversionhdr.py script would end up using a different tag than intended.
So tell `git describe` to only match tags that look like a MicroPython
version tag, such as `v1.12` or `v2.0`.
This already begins obscuring things, because now there are two sets of
shared-module functions for manipulating the same structure, e.g.,
common_hal_canio_remote_transmission_request_get_id and
common_hal_canio_message_get_id
Calling the bytes constructor on a bytes object returns the original bytes
object. This saves allocating a new instance, and matches CPython.
Signed-off-by: Iyassou Shimels <s.iyassou@gmail.com>
New contributor @mdroberts1243 encountered an interesting problem in
which the argument they had named "column_underscore_and_page_addressing"
simply couldn't be used; I discovered that internally this had been
transformed into "column_underscore∧page_addressing", because QSTR
makes _ENTITY_ stand for the same thing as &ENTITY; does in HTML.
This might be nice for some things, but we don't want it here!
I was unable to find a sensible way to "escape" and prevent this entity
coding, so instead I ripped out support for the _and_ and _or_ escapes.
Tested & working:
* Send standard packets
* Receive standard packets (1 FIFO, no filter)
Interoperation between SAM E54 Xplained running this tree and
MicroPython running on STM32F405 Feather with an external
transceiver was also tested.
Many other aspects of a full implementation are not yet present,
such as error detection and recovery.
Discord user Folknology encountered a problem building with Python 3.6.9,
`TypeError: ord() expected a character, but string of length 0 found`.
I was able to reproduce the problem using Python3.5*, and discovered that
the meaning of the regular expression `"|."` had changed in 3.7. Before,
```
>>> [m.group(0) for m in re.finditer("|.", "hello")]
['', '', '', '', '', '']
```
After:
```
>>> [m.group(0) for m in re.finditer("|.", "hello")]
['', 'h', '', 'e', '', 'l', '', 'l', '', 'o', '']
```
Check if `words` is empty and if so use `"."` as the regular expression
instead. This gives the same result on both versions:
```
['h', 'e', 'l', 'l', 'o']
```
and fixes the generation of the huffman dictionary.
Folknology verified that this fix worked for them.
* I could easily install 3.5 but not 3.6. 3.5 reproduced the same problem
This construct (which I added without sufficient testing,
apparently) is only supported in Python 3.7 and newer. Make it
optional so that this script works on other Python versions. This
means that if you have a system with non-UTF-8 encoding you will
need to use Python 3.7.
In particular, this affects a problem building circuitpython in
github's ubuntu-18.04 virtual environment when Python 3.7 is not
explicitly installed. cookie-cuttered libraries call for Python
3.6:
```
- name: Set up Python 3.6
uses: actions/setup-python@v1
with:
python-version: 3.6
```
Since CircuitPython's own build calls for 3.8, this problem was not
detected.
This problem was also encountered by discord user mdroberts1243.
The failure I encountered was here:
https://github.com/jepler/Jepler_CircuitPython_udecimal/runs/1138045020?check_suite_focus=true
.. while my step of "clone and build circuitpython unix port" is
unusual, I think the same problem would have affected "build assets"
if that step had been reached.
For time-based functions that work with absolute time there is the need for
an Epoch, to set the zero-point at which the absolute time starts counting.
Such functions include time.time() and filesystem stat return values. And
different ports may use a different Epoch.
To make it clearer what functions use the Epoch (whatever it may be), and
make the ports more consistent with their use of the Epoch, this commit
renames all Epoch related functions to include the word "epoch" in their
name (and remove references to "2000").
Along with this rename, the following things have changed:
- mp_hal_time_ns() is now specified to return the number of nanoseconds
since the Epoch, rather than since 1970 (but since this is an internal
function it doesn't change anything for the user).
- littlefs timestamps on the esp8266 have been fixed (they were previously
off by 30 years in nanoseconds).
Otherwise, there is no functional change made by this commit.
Signed-off-by: Damien George <damien@micropython.org>
Most users and the CI system are running in configurations where Python
configures stdout and stderr in UTF-8 mode. However, Windows is different,
setting values like CP1252. This led to a build failure on Windows, because
makeqstrdata printed Unicode strings to its stdout, expecting them to be
encoded as UTF-8.
This script is writing (stdout) to a compiler input file and potentially
printing messages (stderr) to a log or console. Explicitly configure stdout to
use utf-8 to get consistent behavior on all platforms, and configure stderr so
that if any log/diagnostic messages are printed that cannot be displayed
correctly, they are still displayed instead of creating an error while trying
to print the diagnostic information.
I considered setting the encodings both to ascii, but this would just be
occasionally inconvenient to developers like me who want to show diagnostic
info on stderr and in comments while working with the compression code.
Closes: #3408
While checking whether we can enable -Wimplicit-fallthrough, I encountered
a diagnostic in mp_binary_set_val_array_from_int which led to discovering
the following bug:
```
>>> struct.pack("xb", 3)
b'\x03\x03'
```
That is, the next value (3) was used as the value of a padding byte, while
standard Python always fills "x" bytes with zeros. I initially thought
this had to do with the unintentional fallthrough, but it doesn't.
Instead, this code would relate to an array.array with a typecode of
padding ('x'), which is ALSO not desktop Python compliant:
```
>>> array.array('x', (1, 2, 3))
array('x', [1, 0, 0])
```
Possibly this is dead code that used to be shared between struct-setting
and array-setting, but it no longer is.
I also discovered that the argument list length for struct.pack
and struct.pack_into were not checked, and that the length of binary data
passed to array.array was not checked to be a multiple of the element
size.
I have corrected all of these to conform more closely to standard Python
and revised some tests where necessary. Some tests for micropython-specific
behavior that does not conform to standard Python and is not present
in CircuitPython was deleted outright.
Massive savings. Thanks so much @ciscorn for providing the initial
code for choosing the dictionary.
This adds a bit of time to the build, both to find the dictionary
but also because (for reasons I don't fully understand), the binary
search in the compress() function no longer worked and had to be
replaced with a linear search.
I think this is because the intended invariant is that for codebook
entries that encode to the same number of bits, the entries are ordered
in ascending value. However, I mis-placed the transition from "words"
to "byte/char values" so the codebook entries for words are in word-order
rather than their code order.
Because this price is only paid at build time, I didn't care to determine
exactly where the correct fix was.
I also commented out a line to produce the "estimated total memory size"
-- at least on the unix build with TRANSLATION=ja, this led to a build
time KeyError trying to compute the codebook size for all the strings.
I think this occurs because some single unicode code point ('ァ') is
no longer present as itself in the compressed strings, due to always
being replaced by a word.
As promised, this seems to save hundreds of bytes in the German translation
on the trinket m0.
Testing performed:
- built trinket_m0 in several languages
- built and ran unix port in several languages (en, de_DE, ja) and ran
simple error-producing codes like ./micropython -c '1/0'
Prior to this commit, pow(-2, float('nan')) would return (nan+nanj), or
raise an exception on targets that don't support complex numbers. This is
fixed to return simply nan, as CPython does.
Signed-off-by: Damien George <damien@micropython.org>
This is consistent with the other 'micro' modules and allows implementing
additional features in Python via e.g. micropython-lib's sys.
Note this is a breaking change (not backwards compatible) for ports which
do not enable weak links, as "import sys" must now be replaced with
"import usys".
Compress common unicode bigrams by making code points in the range
0x80 - 0xbf (inclusive) represent them. Then, they can be greedily
encoded and the substituted code points handled by the existing Huffman
compression. Normally code points in the range 0x80-0xbf are not used
in Unicode, so we stake our own claim. Using the more arguably correct
"Private Use Area" (PUA) would mean that for scripts that only use
code points under 256 we would use more memory for the "values" table.
bigram means "two letters", and is also sometimes called a "digram".
It's nothing to do with "big RAM". For our purposes, a bigram represents
two successive unicode code points, so for instance in our build on
trinket m0 for english the most frequent are:
['t ', 'e ', 'in', 'd ', ...].
The bigrams are selected based on frequency in the corpus, but the
selection is not necessarily optimal, for these reasons I can think of:
* Suppose the corpus was just "tea" repeated 100 times. The
top bigrams would be "te", and "ea". However,
overlap, "te" could never be used. Thus, some bigrams might actually
waste space
* I _assume_ this has to be why e.g., bigram 0x86 "s " is more
frequent than bigram 0x85 " a" in English for Trinket M0, because
sequences like "can't add" would get the "t " digram and then
be unable to use the " a" digram.
* And generally, if a bigram is frequent then so are its constituents.
Say that "i" and "n" both encode to just 5 or 6 bits, then the huffman
code for "in" had better compress to 10 or fewer bits or it's a net
loss!
* I checked though! "i" is 5 bits, "n" is 6 bits (lucky guess)
but the bigram 0x83 also just 6 bits, so this one is a win of
5 bits for every "it" minus overhead. Yay, this round goes to team
compression.
* On the other hand, the least frequent bigram 0x9d " n" is 10 bits
long and its constituent code points are 4+6 bits so there's no
savings, but there is the cost of the table entry.
* and somehow 0x9f 'an' is never used at all!
With or without accounting for overlaps, there is some optimum number
of bigrams. Adding one more bigram uses at least 2 bytes (for the
entry in the bigram table; 4 bytes if code points >255 are in the
source text) and also needs a slot in the Huffman dictionary, so
adding bigrams beyond the optimim number makes compression worse again.
If it's an improvement, the fact that it's not guaranteed optimal
doesn't seem to matter too much. It just leaves a little more fruit
for the next sweep to pick up. Perhaps try adding the most frequent
bigram not yet present, until it doesn't improve compression overall.
Right now, de_DE is again the "fullest" build on trinket_m0. (It's
reclaimed that spot from the ja translation somehow) This change saves
104 bytes there, increasing free space about 6.8%. In the larger
(but not critically full) pyportal build it saves 324 bytes.
The specific number of bigrams used (32) was chosen as it is the max
number that fit within the 0x80..0xbf range. Larger tables would
require the use of 16 bit code points in the de_DE build, losing savings
overall.
(Side note: The most frequent letters in English have been said
to be: ETA OIN SHRDLU; but we have UAC EIL MOPRST in our corpus)
A crash like the following occurs in the unix port:
```
Program received signal SIGSEGV, Segmentation fault.
0x00005555555a2d7a in mp_obj_module_set_globals (self_in=0x55555562c860 <ulab_user_cmodule>, globals=0x55555562c840 <mp_module_ulab_globals>) at ../../py/objmodule.c:145
145 self->globals = globals;
(gdb) up
#1 0x00005555555b2781 in mp_builtin___import__ (n_args=5, args=0x7fffffffdbb0) at ../../py/builtinimport.c:496
496 mp_obj_module_set_globals(outer_module_obj,
(gdb)
#2 0x00005555555940c9 in mp_import_name (name=824, fromlist=0x555555621f10 <mp_const_none_obj>, level=0x1) at ../../py/runtime.c:1392
1392 return mp_builtin___import__(5, args);
```
I don't understand how it doesn't happen on the embedded ports, because
the module object should reside in ROM and the assignment of self->globals
should trigger a Hard Fault.
By checking VERIFY_PTR, we know that the pointed-to data is on the heap
so we can do things like mutate it.
Updating to Black v20.8b1 there are two changes that affect the code in
this repository:
- If there is a trailing comma in a list (eg [], () or function call) then
that list is now written out with one line per element. So remove such
trailing commas where the list should stay on one line.
- Spaces at the start of """ doc strings are removed.
Signed-off-by: Damien George <damien@micropython.org>
In #2689, hitting ctrl-c during the printing of an object with a lot of sub-objects could cause the screen to stop updating (without showing a KeyboardInterrupt). This makes the printing of such objects acutally interruptable, and also correctly handles the KeyboardInterrupt:
```
>>> l = ["a" * 100] * 200
>>> l
['aaaaaaaaaaaaaaaaaaaaaa...aaaaaaaaaaa', Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyboardInterrupt:
>>>
```
As per CPython behaviour, compile(stmt, "file", "single") should create
code which prints to stdout (via __repl_print__) the results of any
expressions in stmt.
Signed-off-by: Damien George <damien@micropython.org>
The font is missing many characters and the build needs the space.
We can optimize font storage when we get a good font.
The serial output will work as usual.
This check as implemented is misleading, because it compares the
compressed size in bytes (including the length indication) with the source
string length in Unicode code points. For English this is approximately
fair, but for Japanese this is quite unfair and produces an excess of
"increased length" messages.
This message might have existed for one of two reasons:
* to alert to an improperly function huffman compression
* to call attention to a need for a "string is stored uncompressed" case
We know by now that the huffman compression is functioning as designed and
effective in general.
Just to be on the safe side, I did some back-of-the-envelope estimates.
I considered these three replacements for "the true source string size, in bytes":
+ decompressed_len_utf8 = len(decompressed.encode('utf-8'))
+ decompressed_len_utf16 = len(decompressed.encode('utf-16be'))
+ decompressed_len_bitsize = ((1+len(decompressed)) * math.ceil(math.log(1+len(values), 2)) + 7) // 8
The third counts how many bits each character requires (fewer than 128
characters in the source character set = 7, fewer than 256 = 8, fewer than 512
= 9, etc, adding a string-terminating value) and is in some way representative
of the best way we would be able to store "uncompressed strings". The Japanese
translation (largest as of writing) has just a few strings which increase by
this metric. However, the amount of loss due to expansion in those cases is
outweighed by the cost of adding 1 bit per string to indicate whether it's
compressed or not. For instance, in the BOARD=trinket_m0 TRANSLATION=ja build
the loss is 47 bytes over 300 strings. Adding 1 bit to each of 300 strings will
cost about 37 bytes, leaving just 5 Thumb instructions to implement the code to
check and decode "uncompressed" strings in order to break even.
This is a slight trade-off with code size, in places where a "_varg"
mp_raise variant is now used. The net savings on trinket_m0 is
just 32 bytes.
It also means that the translation will include the original English
text, and cannot be translated. These are usually names of Python
types such as int, set, or dict or special values such as "inf" or
"Nan".
On ports where normal heap memory can contain executable code (eg ARM-based
ports such as stm32), native code loaded from an .mpy file may be reclaimed
by the GC because there's no reference to the very start of the native
machine code block that is reachable from root pointers (only pointers to
internal parts of the machine code block are reachable, but that doesn't
help the GC find the memory).
This commit fixes this issue by maintaining an explicit list of root
pointers pointing to native code that is loaded from an .mpy file. This
is not needed for all ports so is selectable by the new configuration
option MICROPY_PERSISTENT_CODE_TRACK_RELOC_CODE. It's enabled by default
if a port does not specify any special functions to allocate or commit
executable memory.
A test is included to test that native code loaded from an .mpy file does
not get reclaimed by the GC.
Fixes#6045.
Signed-off-by: Damien George <damien@micropython.org>
This version
* moves source files to reflect module structure
* adds inline documentation suitable for extract_pyi
* incompatibly moves spectrogram to fft
* incompatibly removes "extras"
There are some remaining markup errors in the specific revision of
extmod/ulab but they do not prevent the doc building process from
completing.
MicroPython's original implementation of __aiter__ was correct for an
earlier (provisional) version of PEP492 (CPython 3.5), where __aiter__ was
an async-def function. But that changed in the final version of PEP492 (in
CPython 3.5.2) where the function was changed to a normal one. See
https://www.python.org/dev/peps/pep-0492/#why-aiter-does-not-return-an-awaitable
See also the note at the end of this subsection in the docs:
https://docs.python.org/3.5/reference/datamodel.html#asynchronous-iterators
And for completeness the BPO: https://bugs.python.org/issue27243
To be consistent with the Python spec as it stands today (and now that
PEP492 is final) this commit changes MicroPython's behaviour to match
CPython: __aiter__ should return an async-iterable object, but is not
itself awaitable.
The relevant tests are updated to match.
See #6267.
MicroPython's original implementation of __aiter__ was correct for an
earlier (provisional) version of PEP492 (CPython 3.5), where __aiter__ was
an async-def function. But that changed in the final version of PEP492 (in
CPython 3.5.2) where the function was changed to a normal one. See
https://www.python.org/dev/peps/pep-0492/#why-aiter-does-not-return-an-awaitable
See also the note at the end of this subsection in the docs:
https://docs.python.org/3.5/reference/datamodel.html#asynchronous-iterators
And for completeness the BPO: https://bugs.python.org/issue27243
To be consistent with the Python spec as it stands today (and now that
PEP492 is final) this commit changes MicroPython's behaviour to match
CPython: __aiter__ should return an async-iterable object, but is not
itself awaitable.
The relevant tests are updated to match.
See #6267.
coroutines don't have __next__; they also call themselves coroutines.
This does not change the fact that `async def` methods are generators,
but it does make them behave more like CPython.
Otherwise functions like memset might get optimised to call themselves (eg
with gcc 10). And provide CFLAGS_BUILTIN so these options can be changed
by a port if needed.
Fixes issue #6053.
Because the argument arrays may overlap, as show by the new tests in this
commit.
Also remove the debugging comments for these macros, add a new comment
about overlapping regions, and separate the macros by blank lines to make
them easier to read.
Fixes issue #6244.
Signed-off-by: Damien George <damien@micropython.org>
This commit fixes lookups of class members to make it so that built-in
functions that are used as methods/functions of a class work correctly.
The mp_convert_member_lookup() function is pretty much completely changed
by this commit, but for the most part it's just reorganised and the
indenting changed. The functional changes are:
- staticmethod and classmethod checks moved to later in the if-logic,
because they are less common and so should be checked after the more
common cases.
- The explicit mp_obj_is_type(member, &mp_type_type) check is removed
because it's now subsumed by other, more general tests in this function.
- MP_TYPE_FLAG_BINDS_SELF and MP_TYPE_FLAG_BUILTIN_FUN type flags added to
make the checks in this function much simpler (now they just test this
bit in type->flags).
- An extra check is made for mp_obj_is_instance_type(type) to fix lookup of
built-in functions.
Fixes#1326 and #6198.
Signed-off-by: Damien George <damien@micropython.org>
Testing performed: That a card is successfully mounted on Pygamer with
the built in SD card slot
This module is enabled for most FULL_BUILD boards, but is disabled for
samd21 ("M0"), litex, and pca10100 for various reasons.
This allows complex binary operations to fail gracefully with unsupported
operation rather than raising an exception, so that special methods work
correctly.
Signed-off-by: Damien George <damien@micropython.org>
uint types in viper mode can now be used for all binary operators except
floor-divide and modulo.
Fixes issue #1847 and issue #6177.
Signed-off-by: Damien George <damien@micropython.org>
An OrderedDict can now be used for the locals when creating a type
explicitly via type(name, bases, locals).
Signed-off-by: Damien George <damien@micropython.org>
I noticed that this code was referring to samd-specific functionality,
and isn't enabled except in one samd board (pewpew10). Move it.
There is incomplte support for _pew in mimxrt10xx which then caused build
errors; adding a #if guard to check for _pew being enabled fixes it.
The _pew module is not likely to be important on mimxrt but I'll leave the
choice to remove it to someone else.
This addition to the grammar was introduced in Python 3.6. It allows
annotating the type of a varilable, like:
x: int = 123
s: str
The implementation in this commit is quite simple and just ignores the
annotation (the int and str bits above). The reason to implement this is
to allow Python 3.6+ code that uses this feature to compile under
MicroPython without change, and for users to use type checkers.
In the future viper could use this syntax as a way to give types to
variables, which is currently done in a bit of an ad-hoc way, eg
x = int(123). And this syntax could potentially be used in the inline
assembler to define labels in an way that's easier to read.
The syntax matches CPython and the semantics are equivalent except that,
unlike CPython, MicroPython allows using := to assign to comprehension
iteration variables, because disallowing this would take a lot of code to
check for it.
The new compile-time option MICROPY_PY_ASSIGN_EXPR selects this feature and
is enabled by default, following MICROPY_PY_ASYNC_AWAIT.
Formatting for `* sizeof` was fixed in uncrustify v0.71, so we no longer
need the fixups for it. Also, there was one file where the updated
uncrustify caught a problem that the regex didn't pick up, which is updated
in this commit.
Signed-off-by: David Lechner <david@pybricks.com>
Long ago, prior to 0ef01d0a75, fixed and
ordered maps were the same setting with the "table_is_fixed_array" member
of mp_map_t. But these settings are actually independent, and it is
possible to have is_fixed=1, is_ordered=0 (although this can currently
only be done by tools/cc1). So update the comments to reflect this.
The resulting dict is now marked as read-only (is_fixed=1) to enforce the
fact that changes to this dict will not be reflected in the class instance.
This commit reduces code size by about 20 bytes, and should be more
efficient because it creates a direct copy of the dict rather than
reinserting all elements.
The behavior mirrors the instance object dict attribute where a copy of the
local attributes are provided (unless the dict is read-only, then that dict
itself is returned, as an optimisation). MicroPython does not support
modifying this dict because the changes will not be reflected in the class.
The feature is only enabled if MICROPY_CPYTHON_COMPAT is set, the same as
the instance version.
There doesn't appear to be any use for only triggering on specific events,
so it's just easier to number them sequentially. This makes them smaller
values so they take up only 1 byte in the ringbuf, only 1 byte for the
opcode in the bytecode, and makes room for more events.
Also add a couple of new event types that need to be implemented (to avoid
re-numbering later).
And rename _COMPLETE and _STATUS to _DONE for consistency.
In the future the "trigger" keyword argument can be reinstated by requiring
the user to compute the bitmask, eg:
ble.irq(handler, 1 << _IRQ_SCAN_RESULT | 1 << _IRQ_SCAN_DONE)
* Fix flash writes that don't end on a sector boundary. Fixes#2944
* Fix enum incompatibility with IDF.
* Fix printf output so it goes out debug UART.
* Increase stack size to 8k.
* Fix sleep of less than a tick so it doesn't crash.
Length was stored as a 16-bit number always. Most translations have
a max length far less. For example, US English translation lengths
always fit in just 8 bits. probably all languages fit in 9 bits.
This also has the side effect of reducing the alignment of
compressed_string_t from 2 bytes to 1.
testing performed: ran in german and english on pyruler, printed messages
looked right.
Firmware size, en_US
Before: 3044 bytes free in flash
After: 3408 bytes free in flash
Firmware size, de_DE (with #2967 merged to restore translations)
Before: 1236 bytes free in flash
After: 1600 bytes free in flash