This commit implements a more complete replication of CPython's behaviour
for equality and inequality testing of objects. This addresses the issues
discussed in #5382 and a few other inconsistencies. Improvements over the
old code include:
- Support for returning non-boolean results from comparisons (as used by
numpy and others).
- Support for non-reflexive equality tests.
- Preferential use of __ne__ methods and MP_BINARY_OP_NOT_EQUAL binary
operators for inequality tests, when available.
- Fallback to op2 == op1 or op2 != op1 when op1 does not implement the
(in)equality operators.
The scheme here makes use of a new flag, MP_TYPE_FLAG_NEEDS_FULL_EQ_TEST,
in the flags word of mp_obj_type_t to indicate if various shortcuts can or
cannot be used when performing equality and inequality tests. Currently
four built-in classes have the flag set: float and complex are
non-reflexive (since nan != nan) while bytearray and frozenszet instances
can equal other builtin class instances (bytes and set respectively). The
flag is also set for any new class defined by the user.
This commit also includes a more comprehensive set of tests for the
behaviour of (in)equality operators implemented in special methods.
Prior to this patch the amount of free space in an array (including
bytearray) was not being maintained correctly for the case of slice
assignment which changed the size of the array. Under certain cases (as
encoded in the new test) it was possible that the array could grow beyond
its allocated memory block and corrupt the heap.
Fixes issue #4127.
Reuse the implementation for bytes since it works the same way regardless
of the underlying type. This method gets added for CPython compatibility
of bytearray, but to keep the code simple and small array.array now also
has a working decode method, which is non-standard but doesn't hurt.
This allows figuring out the number of bytes in the memoryview object as
len(memview) * memview.itemsize.
The feature is enabled via MICROPY_PY_BUILTINS_MEMORYVIEW_ITEMSIZE and is
disabled by default.
Prior to this commit, building the unix port with `DEBUG=1` and
`-finstrument-functions` the compilation would fail with an error like
"control reaches end of non-void function". This change fixes this by
removing the problematic "if (0)" branches. Not all branches affect
compilation, but they are all removed for consistency.
These macros could in principle be (inline) functions so it makes sense to
have them lower case, to match the other C API functions.
The remaining macros that are upper case are:
- MP_OBJ_TO_PTR, MP_OBJ_FROM_PTR
- MP_OBJ_NEW_SMALL_INT, MP_OBJ_SMALL_INT_VALUE
- MP_OBJ_NEW_QSTR, MP_OBJ_QSTR_VALUE
- MP_OBJ_FUN_MAKE_SIG
- MP_DECLARE_CONST_xxx
- MP_DEFINE_CONST_xxx
These must remain macros because they are used when defining const data (at
least, MP_OBJ_NEW_SMALL_INT is so it makes sense to have
MP_OBJ_SMALL_INT_VALUE also a macro).
For those macros that have been made lower case, compatibility macros are
provided for the old names so that users do not need to change their code
immediately.
Both mp_type_array and mp_type_memoryview use the same object structure,
mp_obj_array_t, but for the case of memoryview, some fields, e.g. "free",
have different meaning. As the "free" field is also a bitfield, assume
that (anonymous) union can't be used here (for the concerns of possible
compatibility issues with wide array of toolchains), and just add a field
alias using a #define. As it's a define, it should be a selective
identifier, so use verbose "memview_offset" to avoid any clashes.
If bytearray is constructed from str, a second argument of encoding is
required (in CPython), and third arg of Unicode error handling is allowed,
e.g.:
bytearray("str", "utf-8", "strict")
This is similar to bytes:
bytes("str", "utf-8", "strict")
This patch just allows to pass 2nd/3rd arguments to bytearray, but
doesn't try to validate them to not impact code size. (This is also
similar to how bytes constructor is handled, though it does a bit
more validation, e.g. check that in case of str arg, encoding argument
is passed.)
Before this patch MP_BINARY_OP_IN had two meanings: coming from bytecode it
meant that the args needed to be swapped, but coming from within the
runtime meant that the args were already in the correct order. This lead
to some confusion in the code and comments stating how args were reversed.
It also lead to 2 bugs: 1) containment for a subclass of a native type
didn't work; 2) the expression "{True} in True" would illegally succeed and
return True. In both of these cases it was because the args to
MP_BINARY_OP_IN ended up being reversed twice.
To fix these things this patch introduces MP_BINARY_OP_CONTAINS which
corresponds exactly to the __contains__ special method, and this is the
operator that built-in types should implement. MP_BINARY_OP_IN is now only
emitted by the compiler and is converted to MP_BINARY_OP_CONTAINS by
swapping the arguments.
Header files that are considered internal to the py core and should not
normally be included directly are:
py/nlr.h - internal nlr configuration and declarations
py/bc0.h - contains bytecode macro definitions
py/runtime0.h - contains basic runtime enums
Instead, the top-level header files to include are one of:
py/obj.h - includes runtime0.h and defines everything to use the
mp_obj_t type
py/runtime.h - includes mpstate.h and hence nlr.h, obj.h, runtime0.h,
and defines everything to use the general runtime support functions
Additional, specific headers (eg py/objlist.h) can be included if needed.
The unary-op/binary-op enums are already defined, and there are no
arithmetic tricks used with these types, so it makes sense to use the
correct enum type for arguments that take these values. It also reduces
code size quite a bit for nan-boxing builds.
- Changed: ValueError, TypeError, NotImplementedError
- OSError invocations unchanged, because the corresponding utility
function takes ints, not strings like the long form invocation.
- OverflowError, IndexError and RuntimeError etc. not changed for now
until we decide whether to add new utility functions.
Allows to iterate over the following without allocating on the heap:
- tuple
- list
- string, bytes
- bytearray, array
- dict (not dict.keys, dict.values, dict.items)
- set, frozenset
Allows to call the following without heap memory:
- all, any, min, max, sum
TODO: still need to allocate stack memory in bytecode for iter_buf.
This follows source code/header file organization similar to few other
objects, and intended to be used only is special cases, where efficiency/
simplicity matters.
The first argument to the type.make_new method is naturally a uPy type,
and all uses of this argument cast it directly to a pointer to a type
structure. So it makes sense to just have it a pointer to a type from
the very beginning (and a const pointer at that). This patch makes
such a change, and removes all unnecessary casting to/from mp_obj_t.
This patch changes the type signature of .make_new and .call object method
slots to use size_t for n_args and n_kw (was mp_uint_t. Makes code more
efficient when mp_uint_t is larger than a machine word. Doesn't affect
ports when size_t and mp_uint_t have the same size.
This allows the mp_obj_t type to be configured to something other than a
pointer-sized primitive type.
This patch also includes additional changes to allow the code to compile
when sizeof(mp_uint_t) != sizeof(void*), such as using size_t instead of
mp_uint_t, and various casts.
Previous to this patch the printing mechanism was a bit of a tangled
mess. This patch attempts to consolidate printing into one interface.
All (non-debug) printing now uses the mp_print* family of functions,
mainly mp_printf. All these functions take an mp_print_t structure as
their first argument, and this structure defines the printing backend
through the "print_strn" function of said structure.
Printing from the uPy core can reach the platform-defined print code via
two paths: either through mp_sys_stdout_obj (defined pert port) in
conjunction with mp_stream_write; or through the mp_plat_print structure
which uses the MP_PLAT_PRINT_STRN macro to define how string are printed
on the platform. The former is only used when MICROPY_PY_IO is defined.
With this new scheme printing is generally more efficient (less layers
to go through, less arguments to pass), and, given an mp_print_t*
structure, one can call mp_print_str for efficiency instead of
mp_printf("%s", ...). Code size is also reduced by around 200 bytes on
Thumb2 archs.
This is rarely used feature which takes enough code to implement, so is
controlled by MICROPY_PY_ARRAY_SLICE_ASSIGN config setting, default off.
But otherwise it may be useful, as allows to update arbitrary-sized data
buffers in-place.
Slice is yet to implement, and actually, slice assignment implemented in
such a way that RHS of assignment should be array of the exact same item
typecode as LHS. CPython has it more relaxed, where RHS can be any sequence
of compatible types (e.g. it's possible to assign list of int's to a
bytearray slice).
Overall, when all "slice write" features are implemented, it may cost ~1KB
of code.