Hashing is now done using mp_unary_op function with MP_UNARY_OP_HASH as
the operator argument. Hashing for int, str and bytes still go via
fast-path in mp_unary_op since they are the most common objects which
need to be hashed.
This lead to quite a bit of code cleanup, and should be more efficient
if anything. It saves 176 bytes code space on Thumb2, and 360 bytes on
x86.
The only loss is that the error message "unhashable type" is now the
more generic "unsupported type for __hash__".
Previous to this patch the printing mechanism was a bit of a tangled
mess. This patch attempts to consolidate printing into one interface.
All (non-debug) printing now uses the mp_print* family of functions,
mainly mp_printf. All these functions take an mp_print_t structure as
their first argument, and this structure defines the printing backend
through the "print_strn" function of said structure.
Printing from the uPy core can reach the platform-defined print code via
two paths: either through mp_sys_stdout_obj (defined pert port) in
conjunction with mp_stream_write; or through the mp_plat_print structure
which uses the MP_PLAT_PRINT_STRN macro to define how string are printed
on the platform. The former is only used when MICROPY_PY_IO is defined.
With this new scheme printing is generally more efficient (less layers
to go through, less arguments to pass), and, given an mp_print_t*
structure, one can call mp_print_str for efficiency instead of
mp_printf("%s", ...). Code size is also reduced by around 200 bytes on
Thumb2 archs.
splitlines() occurs ~179 times in CPython3 standard library, so was
deemed worthy to implement. The method has subtle semantic differences
from just .split("\n"). It is also defined as working for any end-of-line
combination, but this is currently not implemented - it works only with
LF line-endings (which should be OK for text strings on any platforms,
but not OK for bytes).
This cleans up vstr so that it's a pure "variable buffer", and the user
can decide whether they need to add a terminating null byte. In most
places where vstr is used, the vstr did not need to be null terminated
and so this patch saves code size, a tiny bit of RAM, and makes vstr
usage more efficient. When null termination is needed it must be
done explicitly using vstr_null_terminate.
There was really weird warning (promoted to error) when building Windows
port. Exact cause is still unknown, but it uncovered another issue:
8-bit and unicode str_make_new implementations should be mutually exclusive,
and not built at the same time. What we had is that bytes_decode() pulled
8-bit str_make_new() even for unicode build.
With this patch str/bytes construction is streamlined. Always use a
vstr to build a str/bytes object. If the size is known beforehand then
use vstr_init_len to allocate only required memory. Otherwise use
vstr_init and the vstr will grow as needed. Then use
mp_obj_new_str_from_vstr to create a str/bytes object using the vstr
memory.
Saves code ROM: 68 bytes on stmhal, 108 bytes on bare-arm, and 336 bytes
on unix x64.
This patch allows to reuse vstr memory when creating str/bytes object.
This improves memory usage.
Also saves code ROM: 128 bytes on stmhal, 92 bytes on bare-arm, and 88
bytes on unix x64.
Previous patch c38dc3ccc7 allowed any
object to be compared with any other, using pointer comparison for a
fallback. As such, existing code which checked for this case is no
longer needed.
Behaviour of array initialisation is subtly different for bytes,
bytearray and array.array when argument has buffer protocol. This patch
gets us CPython conformant (except we allow initialisation of
array.array by buffer with length not a multiple of typecode).
Going from MICROPY_ERROR_REPORTING_NORMAL to
MICROPY_ERROR_REPORTING_TERSE now saves 2020 bytes ROM for ARM Thumb2,
and 2200 bytes ROM for 32-bit x86.
This is about a 2.5% code size reduction for bare-arm.
This turns failing assertions to type exceptions for things like
b"123".find(...). We still don't support operations like this on bytes
objects (unlike CPython), but at least it no longer crashes.
Eg b"123" + bytearray(2) now works. This patch actually decreases code
size while adding functionality: 32-bit unix down by 128 bytes, stmhal
down by 84 bytes.
It should be fair to say that almost in all cases where some API call
expects string, it should be also possible to pass byte string. For example,
it should be open/delete/rename file with name as bytestring. Note that
similar change was done quite a long ago to mp_obj_str_get_data().
Multiplication of a tuple, list, str or bytes now yields an empty
sequence (instead of crashing). Addresses issue #799
Also added ability to mult bytes on LHS by integer.
This will allow roughly the same behavior as Python3 for non-ASCII strings,
for example, print("<phrase in non-Latin script>".split()) will print list
of words, not weird hex dump (like Python2 behaves). (Of course, that it
will print list of words, if there're "words" in that phrase at all, separated
by ASCII-compatible whitespace; that surely won't apply to every human
language in existence).
Some small fixed:
- Combine 'x' and 'X' cases in str format code.
- Remove trailing spaces from some lines.
- Make exception messages consistently begin with lower case (then
needed to change those in objarray and objtuple so the same
constant string data could be used).
- Fix bug with exception message having %c instead of %%c.
This is not fully correct re: error handling, because we should check that
that types are used consistently (only str's or only bytes), but magically
makes lot of functions support bytes.
Blanket wide to all .c and .h files. Some files originating from ST are
difficult to deal with (license wise) so it was left out of those.
Also merged modpyb.h, modos.h, modstm.h and modtime.h in stmhal/.
When querying an object that supports the buffer protocol, that object
must now return a typecode (as per binary.[ch]). This does not have to
be honoured by the caller, but can be useful for determining element
size.
It has (again) a fast path for ints, and a simplified "slow" path for
everything else.
Also simplify the way str indexing is done (now matches tuple and list).
stmhal relies on pfenv_* to implement its printf. Thus, it needs a
pfenv_print_int which prints a proper 32-bit integer. With latest
change to pfenv, this function became one that took mp_obj_t, and
extracted the integer value from that object.
To fix temporarily, pfenv_print_int has been renamed to
pfenv_print_mp_int (to indicate it takes a mp_obj_t for the int), and
pfenv_print_int has been added (which takes a normal C int). Currently,
pfenv_print_int proxies to pfenv_print_mp_int, but this means it looses
the MSB. Need to find a way to fix this, but the only way I can think
of will duplicate lots of code.
This adds support for almost everything (the comma isn't currently
supported).
The "unspecified" type with floats also doesn't behave exactly like
python.
Tested under unix with float and double
Spot tested on stmhal
Pretty much everyone needs to include map.h, since it's such an integral
part of the Micro Python object implementation. Thus, the definitions
are now in obj.h instead. map.h is removed.
Mostly just a global search and replace. Except rt_is_true which
becomes mp_obj_is_true.
Still would like to tidy up some of the names, but this will do for now.
Originally, .methods was used for methods in a ROM class, and
locals_dict for methods in a user-created class. That distinction is
unnecessary, and we can use locals_dict for ROM classes now that we have
ROMable maps.
This removes an entry in the bloated mp_obj_type_t struct, saving a word
for each ROM object and each RAM object. ROM objects that have a
methods table (now a locals_dict) need an extra word in total (removed
the methods pointer (1 word), no longer need the sentinel (2 words), but
now need an mp_obj_dict_t wrapper (4 words)). But RAM objects save a
word because they never used the methods entry.
Overall the ROM usage is down by a few hundred bytes, and RAM usage is
down 1 word per user-defined type/class.
There is less code (no need to check 2 tables), and now consistent with
the way ROM modules have their tables initialised.
Efficiency is very close to equivaluent.
This is pre-requisite for having efficient implementation of str<->bytes
conversion, and having that efficient is required with unfortunare
str vs bytes dichotomy in Python3.
Each built-in exception is now a type, with base type BaseException.
C exceptions are created by passing a pointer to the exception type to
make an instance of. When raising an exception from the VM, an
instance is created automatically if an exception type is raised (as
opposed to an exception instance).
Exception matching (RT_BINARY_OP_EXCEPTION_MATCH) is now proper.
Handling of parse error changed to match new exceptions.
mp_const_type renamed to mp_type_type for consistency.
Ultimately all static strings should be qstr. This entry in the type
structure is only used for printing error messages (to tell the type of
the bad argument), and printing objects that don't supply a .print method.
Some tools do not support local/static symbols (one example is GNU ld map file).
Exposing all functions will allow to do detailed size comparisons, etc.
Also, added bunch of statics where they were missing, and replaced few identity
functions with global mp_identity().