This patch adds a function utf8_check() to check for a valid UTF-8 encoded
string, and calls it when constructing a str from raw bytes. The feature
is selectable at compile time via MICROPY_PY_BUILTINS_STR_UNICODE_CHECK and
is enabled if unicode is enabled. It costs about 110 bytes on Thumb-2, 150
bytes on Xtensa and 170 bytes on x86-64.
The unary-op/binary-op enums are already defined, and there are no
arithmetic tricks used with these types, so it makes sense to use the
correct enum type for arguments that take these values. It also reduces
code size quite a bit for nan-boxing builds.
Otherwise, it will silently get incorrect result on other values types,
including CPython tuple form like "foo.png".endswith(("png", "jpg"))
(which MicroPython doesn't support for unbloatedness).
- Changed: ValueError, TypeError, NotImplementedError
- OSError invocations unchanged, because the corresponding utility
function takes ints, not strings like the long form invocation.
- OverflowError, IndexError and RuntimeError etc. not changed for now
until we decide whether to add new utility functions.
Fixes for stmhal USB mass storage, lwIP bindings and VFS regressions
This release provides an important fix for the USB mass storage device in
the stmhal port by implementing the SCSI SYNCHRONIZE_CACHE command, which
is now require by some Operating Systems. There are also fixes for the
lwIP bindings to improve non-blocking sockets and error codes. The VFS has
some regressions fixed including the ability to statvfs the root.
All changes are listed below.
py core:
- modbuiltins: add core-provided version of input() function
- objstr: catch case of negative "maxsplit" arg to str.rsplit()
- persistentcode: allow to compile with complex numbers disabled
- objstr: allow to compile with obj-repr D, and unicode disabled
- modsys: allow to compile with obj-repr D and PY_ATTRTUPLE disabled
- provide mp_decode_uint_skip() to help reduce stack usage
- makeqstrdefs.py: make script run correctly with Python 2.6
- objstringio: if created from immutable object, follow copy on write policy
extmod:
- modlwip: connect: for non-blocking mode, return EINPROGRESS
- modlwip: fix error codes for duplicate calls to connect()
- modlwip: accept: fix error code for non-blocking mode
- vfs: allow to statvfs the root directory
- vfs: allow "buffering" and "encoding" args to VFS's open()
- modframebuf: fix signed/unsigned comparison pendantic warning
lib:
- libm: use isfinite instead of finitef, for C99 compatibility
- utils/interrupt_char: remove support for KBD_EXCEPTION disabled
tests:
- basics/string_rsplit: add tests for negative "maxsplit" argument
- float: convert "sys.exit()" to "raise SystemExit"
- float/builtin_float_minmax: PEP8 fixes
- basics: convert "sys.exit()" to "raise SystemExit"
- convert remaining "sys.exit()" to "raise SystemExit"
unix port:
- convert to use core-provided version of built-in import()
- Makefile: replace references to make with $(MAKE)
windows port:
- convert to use core-provided version of built-in import()
qemu-arm port:
- Makefile: adjust object-file lists to get correct dependencies
- enable micropython.mem_*() functions to allow more tests
stmhal port:
- boards: enable DAC for NUCLEO_F767ZI board
- add support for NUCLEO_F446RE board
- pass USB handler as parameter to allow more than one USB handler
- usb: use local USB handler variable in Start-of-Frame handler
- usb: make state for USB device private to top-level USB driver
- usbdev: for MSC implement SCSI SYNCHRONIZE_CACHE command
- convert from using stmhal's input() to core provided version
cc3200 port:
- convert from using stmhal's input() to core provided version
teensy port:
- convert from using stmhal's input() to core provided version
esp8266 port:
- Makefile: replace references to make with $(MAKE)
- Makefile: add clean-modules target
- convert from using stmhal's input() to core provided version
zephyr port:
- modusocket: getaddrinfo: Fix mp_obj_len() usage
- define MICROPY_PY_SYS_PLATFORM (to "zephyr")
- machine_pin: use native Zephyr types for Zephyr API calls
docs:
- machine.Pin: remove out_value() method
- machine.Pin: add on() and off() methods
- esp8266: consistently replace Pin.high/low methods with .on/off
- esp8266/quickref: polish Pin.on()/off() examples
- network: move confusingly-named cc3200 Server class to its reference
- uos: deconditionalize, remove minor port-specific details
- uos: move cc3200 port legacy VFS mounting functions to its ref doc
- machine: sort machine classes in logical order, not alphabetically
- network: first step to describe standard network class interface
examples:
- embedding: use core-provided KeyboardInterrupt object
Split this setting from MICROPY_CPYTHON_COMPAT. The idea is to be able to
keep MICROPY_CPYTHON_COMPAT disabled, but still pass more of regression
testsuite. In particular, this fixes last failing test in basics/ for
Zephyr port.
This patch changes mp_uint_t to size_t for the len argument of the
following public facing C functions:
mp_obj_tuple_get
mp_obj_list_get
mp_obj_get_array
These functions take a pointer to the len argument (to be filled in by the
function) and callers of these functions should update their code so the
type of len is changed to size_t. For ports that don't use nan-boxing
there should be no change in generate code because the size of the type
remains the same (word sized), and in a lot of cases there won't even be a
compiler warning if the type remains as mp_uint_t.
The reason for this change is to standardise on the use of size_t for
variables that count memory (or memory related) sizes/lengths. It helps
builds that use nan-boxing.
Instead of always reporting some object cannot be implicitly be converted
to a 'str', even when it is a 'bytes' object, adjust the logic so that
when trying to convert str to bytes it is shown like that.
This will still report bad implicit conversion from e.g. 'int to bytes'
as 'int to str' but it will not result in the confusing
'can't convert 'str' object to str implicitly' anymore for calls like
b'somestring'.count('a').
Allows to iterate over the following without allocating on the heap:
- tuple
- list
- string, bytes
- bytearray, array
- dict (not dict.keys, dict.values, dict.items)
- set, frozenset
Allows to call the following without heap memory:
- all, any, min, max, sum
TODO: still need to allocate stack memory in bytecode for iter_buf.
This patch fixes two main things:
- dicts can be printed directly using '%s' % dict
- %-formatting should not crash when passed a non-dict to, eg, '%(foo)s'
In this, don't allocate copy, just return non-empty string. This helps
with a standard pattern of buffering data in case of short reads:
buf = b""
while ...:
s = f.read(...)
buf += s
...
For a typical case when single read returns all data needed, there won't
be extra allocation. This optimization helps uasyncio.
Checks for number of args removes where guaranteed by function descriptor,
self checking is replaced with mp_check_self(). In few cases, exception
is raised instead of assert.
Introduce mp_raise_msg(), mp_raise_ValueError(), mp_raise_TypeError()
instead of previous pattern nlr_raise(mp_obj_new_exception_msg(...)).
Save few bytes on each call, which are many.
Disabled by default, enabled in unix port. Need for this method easily
pops up when working with text UI/reporting, and coding workalike
manually again and again counter-productive.
The type is an unsigned 8-bit value, since bytes objects are exactly
that. And it's also sensible for unicode strings to return unsigned
values when accessed in a byte-wise manner (CPython does not allow this).
Eg: '{:{}}'.format(123, '>20')
@pohmelie was the original author of this patch, but @dpgeorge made
significant changes to reduce code size and improve efficiency.
The first argument to the type.make_new method is naturally a uPy type,
and all uses of this argument cast it directly to a pointer to a type
structure. So it makes sense to just have it a pointer to a type from
the very beginning (and a const pointer at that). This patch makes
such a change, and removes all unnecessary casting to/from mp_obj_t.
This patch changes the type signature of .make_new and .call object method
slots to use size_t for n_args and n_kw (was mp_uint_t. Makes code more
efficient when mp_uint_t is larger than a machine word. Doesn't affect
ports when size_t and mp_uint_t have the same size.
Only types whose iterator instances still fit in 4 machine words have
been changed to use the polymorphic iterator.
Reduces Thumb2 arch code size by 264 bytes.
This allows the mp_obj_t type to be configured to something other than a
pointer-sized primitive type.
This patch also includes additional changes to allow the code to compile
when sizeof(mp_uint_t) != sizeof(void*), such as using size_t instead of
mp_uint_t, and various casts.
This saves around 1000 bytes (Thumb2 arch) because in repr "C" it is
costly to check and extract a qstr. So making such check/extract a
function instead of a macro saves lots of code space.
I checked the entire codebase, and every place that vstr_init_len
was called, there was a call to mp_obj_new_str_from_vstr after it.
mp_obj_new_str_from_vstr always tries to reallocate a new buffer
1 byte larger than the original to store the terminating null
character.
In many cases, if we allocated the initial buffer to be 1 byte
longer, we can prevent this extra allocation, and just reuse
the originally allocated buffer.
Asking to read 256 bytes and only getting 100 will still cause
the extra allocation, but if you ask to read 256 and get 256
then the extra allocation will be optimized away.
Yes - the reallocation is optimized in the heap to try and reuse
the buffer if it can, but it takes quite a few cycles to figure
this out.
Note by Damien: vstr_init_len should now be considered as a
string-init convenience function and used only when creating
null-terminated objects.
C's printf will pad nan/inf differently to CPython. Our implementation
originally conformed to C, now it conforms to CPython's way.
Tests for this are also added in this patch.
Hashing is now done using mp_unary_op function with MP_UNARY_OP_HASH as
the operator argument. Hashing for int, str and bytes still go via
fast-path in mp_unary_op since they are the most common objects which
need to be hashed.
This lead to quite a bit of code cleanup, and should be more efficient
if anything. It saves 176 bytes code space on Thumb2, and 360 bytes on
x86.
The only loss is that the error message "unhashable type" is now the
more generic "unsupported type for __hash__".
Previous to this patch the printing mechanism was a bit of a tangled
mess. This patch attempts to consolidate printing into one interface.
All (non-debug) printing now uses the mp_print* family of functions,
mainly mp_printf. All these functions take an mp_print_t structure as
their first argument, and this structure defines the printing backend
through the "print_strn" function of said structure.
Printing from the uPy core can reach the platform-defined print code via
two paths: either through mp_sys_stdout_obj (defined pert port) in
conjunction with mp_stream_write; or through the mp_plat_print structure
which uses the MP_PLAT_PRINT_STRN macro to define how string are printed
on the platform. The former is only used when MICROPY_PY_IO is defined.
With this new scheme printing is generally more efficient (less layers
to go through, less arguments to pass), and, given an mp_print_t*
structure, one can call mp_print_str for efficiency instead of
mp_printf("%s", ...). Code size is also reduced by around 200 bytes on
Thumb2 archs.
splitlines() occurs ~179 times in CPython3 standard library, so was
deemed worthy to implement. The method has subtle semantic differences
from just .split("\n"). It is also defined as working for any end-of-line
combination, but this is currently not implemented - it works only with
LF line-endings (which should be OK for text strings on any platforms,
but not OK for bytes).
This cleans up vstr so that it's a pure "variable buffer", and the user
can decide whether they need to add a terminating null byte. In most
places where vstr is used, the vstr did not need to be null terminated
and so this patch saves code size, a tiny bit of RAM, and makes vstr
usage more efficient. When null termination is needed it must be
done explicitly using vstr_null_terminate.
There was really weird warning (promoted to error) when building Windows
port. Exact cause is still unknown, but it uncovered another issue:
8-bit and unicode str_make_new implementations should be mutually exclusive,
and not built at the same time. What we had is that bytes_decode() pulled
8-bit str_make_new() even for unicode build.
With this patch str/bytes construction is streamlined. Always use a
vstr to build a str/bytes object. If the size is known beforehand then
use vstr_init_len to allocate only required memory. Otherwise use
vstr_init and the vstr will grow as needed. Then use
mp_obj_new_str_from_vstr to create a str/bytes object using the vstr
memory.
Saves code ROM: 68 bytes on stmhal, 108 bytes on bare-arm, and 336 bytes
on unix x64.
This patch allows to reuse vstr memory when creating str/bytes object.
This improves memory usage.
Also saves code ROM: 128 bytes on stmhal, 92 bytes on bare-arm, and 88
bytes on unix x64.
Previous patch c38dc3ccc7 allowed any
object to be compared with any other, using pointer comparison for a
fallback. As such, existing code which checked for this case is no
longer needed.
Behaviour of array initialisation is subtly different for bytes,
bytearray and array.array when argument has buffer protocol. This patch
gets us CPython conformant (except we allow initialisation of
array.array by buffer with length not a multiple of typecode).
Going from MICROPY_ERROR_REPORTING_NORMAL to
MICROPY_ERROR_REPORTING_TERSE now saves 2020 bytes ROM for ARM Thumb2,
and 2200 bytes ROM for 32-bit x86.
This is about a 2.5% code size reduction for bare-arm.
This turns failing assertions to type exceptions for things like
b"123".find(...). We still don't support operations like this on bytes
objects (unlike CPython), but at least it no longer crashes.
Eg b"123" + bytearray(2) now works. This patch actually decreases code
size while adding functionality: 32-bit unix down by 128 bytes, stmhal
down by 84 bytes.
It should be fair to say that almost in all cases where some API call
expects string, it should be also possible to pass byte string. For example,
it should be open/delete/rename file with name as bytestring. Note that
similar change was done quite a long ago to mp_obj_str_get_data().
Multiplication of a tuple, list, str or bytes now yields an empty
sequence (instead of crashing). Addresses issue #799
Also added ability to mult bytes on LHS by integer.
This will allow roughly the same behavior as Python3 for non-ASCII strings,
for example, print("<phrase in non-Latin script>".split()) will print list
of words, not weird hex dump (like Python2 behaves). (Of course, that it
will print list of words, if there're "words" in that phrase at all, separated
by ASCII-compatible whitespace; that surely won't apply to every human
language in existence).
Some small fixed:
- Combine 'x' and 'X' cases in str format code.
- Remove trailing spaces from some lines.
- Make exception messages consistently begin with lower case (then
needed to change those in objarray and objtuple so the same
constant string data could be used).
- Fix bug with exception message having %c instead of %%c.