When just the bytecode emitter is needed there is no need to have a
dynamic method table for the emitter back-end, and we can instead
directly call the mp_emit_bc_XXX functions. This gives a significant
reduction in code size and a very slight performance boost for the
compiler.
This patch saves 1160 bytes code on Thumb2 and 972 bytes on x86, when
native emitters are disabled.
Overall savings in code over the last 3 commits are:
bare-arm: 1664 bytes.
minimal: 2136 bytes.
stmhal: 584 bytes (it has native emitter enabled).
cc3200: 1736 bytes.
First pass for the compiler is computing the scope (eg if an identifier
is local or not) and originally had an entire table of methods dedicated
to this, most of which did nothing. With changes from previous commit,
this set of methods can be removed and the methods from the bytecode
emitter used instead, with very little modification -- this is what is
done in this commit.
This factoring has little to no impact on the speed of the compiler
(tested by compiling 3763 Python scripts and timing it).
This factoring reduces code size by about 270-300 bytes on Thumb2 archs,
and 400 bytes on x86.
mp_obj_t internal representation doesn't have to be a pointer to object,
it can be anything.
There's also a support for back-conversion in the form of MP_OBJ_UNCAST.
This is kind of optimization/status quo preserver to minimize patching the
existing code and avoid doing potentially expensive MP_OBJ_CAST over and
over. But then one may imagine implementations where MP_OBJ_UNCAST is very
expensive. But such implementations are unlikely interesting in practice.
Despite initial guess, this code factoring does not hamper performance.
In fact it seems to improve speed by a little: running pystone(1.2) on
pyboard (which gives a very stable result) this patch takes pystones
from 1729.51 up to 1742.16. Also, pystones on x64 increase by around
the same proportion (but it's much noisier).
Taking a look at the generated machine code, stack usage with this patch
is unchanged, and call is tail-optimised with all arguments in
registers. Code size decreases by about 50 bytes on Thumb2 archs.
"Base" should rather refer to "base type"."Base object for attribute
lookup" should rather be just "object".
Also, a case of common subexpression elimination.
Given that there's already support for "fixed table" maps, which are
essentially ordered maps, the implementation of OrderedDict just extends
"fixed table" maps by adding an "is ordered" flag and add/remove
operations, and reuses 95% of objdict code, just making methods tolerant
to both dict and OrderedDict.
Some things are missing so far, like CPython-compatible repr and comparison.
OrderedDict is Disabled by default; enabled on unix and stmhal ports.
These allow to fine-tune the compiler to select whether it optimises
tuple assignments of the form a, b = c, d and a, b, c = d, e, f.
Sensible defaults are provided.
This is rarely used feature which takes enough code to implement, so is
controlled by MICROPY_PY_ARRAY_SLICE_ASSIGN config setting, default off.
But otherwise it may be useful, as allows to update arbitrary-sized data
buffers in-place.
Slice is yet to implement, and actually, slice assignment implemented in
such a way that RHS of assignment should be array of the exact same item
typecode as LHS. CPython has it more relaxed, where RHS can be any sequence
of compatible types (e.g. it's possible to assign list of int's to a
bytearray slice).
Overall, when all "slice write" features are implemented, it may cost ~1KB
of code.
This makes exception traceback info self contained (ie doesn't rely on
list object, which was a bit of a hack), reduces code size, and reduces
RAM footprint of exception by eliminating the list object.
Addresses part of issue #1126.
The implementation of these functions is very large (order 4k) and they
are rarely used, so we don't enable them by default.
They are however enabled in stmhal and unix, since we have the room.
Most of printing infrastructure now uses streams, but mp_obj_print() used
libc's printf(), which led to weird buffering issues in output. So, switch
mp_obj_print() to streams too, even though it may make sense to move it to
a separate file, as it is purely a debugging function now.
Relative imports are based of a package, so we're currently at a module
within a package, we should get to package first.
Also, factor out path travsering operation, but this broke testing for
boundary errors with relative imports. TODO: reintroduce them, together
with proper tests.
Traceback allocation for exception will now never lead to recursive
MemoryError exception - if there's no memory for traceback, it simply
won't be created.
Pushing same NLR record twice would lead to "infinite loop" in nlr_jump
(but more realistically, it will crash as soon as NLR record on stack is
overwritten).
Previous to this patch, a big-int, float or imag constant was interned
(made into a qstr) and then parsed at runtime to create an object each
time it was needed. This is wasteful in RAM and not efficient. Now,
these constants are parsed straight away in the parser and turned into
objects. This allows constants with large numbers of digits (so
addresses issue #1103) and takes us a step closer to #722.
To enable parsing constants more efficiently, mp_parse should be allowed
to raise an exception, and mp_compile can already raise a MemoryError.
So these functions need to be protected by an nlr push/pop block.
This patch adds that feature in all places. This allows to simplify how
mp_parse and mp_compile are called: they now raise an exception if they
have an error and so explicit checking is not needed anymore.
This cleans up vstr so that it's a pure "variable buffer", and the user
can decide whether they need to add a terminating null byte. In most
places where vstr is used, the vstr did not need to be null terminated
and so this patch saves code size, a tiny bit of RAM, and makes vstr
usage more efficient. When null termination is needed it must be
done explicitly using vstr_null_terminate.
Eg, "() + 1" now tells you that __add__ is not supported for tuple and
int types (before it just said the generic "binary operator"). We reuse
the table of names for slot lookup because it would be a waste of code
space to store the pretty name for each operator.
- namedtuple was wrongly using MP_OBJ_QSTR_VALUE instead of mp_obj_str_get_qstr,
so when passed a non-interned string it would segfault; fix this by using mp_obj_str_get_qstr
- store the namedtuple field names as qstrs so it is not needed to use mp_obj_str_get_qstr
everytime the field name has to be accessed. This also slighty increases performance when
fetching attributes
There was really weird warning (promoted to error) when building Windows
port. Exact cause is still unknown, but it uncovered another issue:
8-bit and unicode str_make_new implementations should be mutually exclusive,
and not built at the same time. What we had is that bytes_decode() pulled
8-bit str_make_new() even for unicode build.
With this patch str/bytes construction is streamlined. Always use a
vstr to build a str/bytes object. If the size is known beforehand then
use vstr_init_len to allocate only required memory. Otherwise use
vstr_init and the vstr will grow as needed. Then use
mp_obj_new_str_from_vstr to create a str/bytes object using the vstr
memory.
Saves code ROM: 68 bytes on stmhal, 108 bytes on bare-arm, and 336 bytes
on unix x64.
This patch allows to reuse vstr memory when creating str/bytes object.
This improves memory usage.
Also saves code ROM: 128 bytes on stmhal, 92 bytes on bare-arm, and 88
bytes on unix x64.
pyexec_friendly_repl_process_char() and friends, useful for ports which
integrate into existing cooperative multitasking system.
Unlike readline() refactor before, this was implemented in less formal,
trial&error process, minor functionality regressions are still known
(like soft&hard reset support). So, original loop-based pyexec_friendly_repl()
is left intact, specific implementation selectable by config setting.
Bytecode also needs a pass to compute the stack size. This is because
the state size of the bytecode function is encoded as a variable uint,
so we must know the value of this uint before we encode it (otherwise
the size of the generated code changes from one pass to the next).
Having an entire pass for this seems wasteful (in time). Alternative is
to allocate fixed space for the state size (would need 3-4 bytes to be
general, when 1 byte is usually sufficient) which uses a bit of extra
RAM per bytecode function, and makes the code less elegant in places
where this uint is encoded/decoded.
So, for now, opt for an extra pass.
Native code has GC-heap pointers in it so it must be scanned. But on
unix port memory for native functions is mmap'd, and so it must have
explicit code to scan it for root pointers.
Previously to this patch all constant string/bytes objects were
interned by the compiler, and this lead to crashes when the qstr was too
long (noticeable now that qstr length storage defaults to 1 byte).
With this patch, long string/bytes objects are never interned, and are
referenced directly as constant objects within generated code using
load_const_obj.
This new config option sets how many fixed-number-of-bytes to use to
store the length of each qstr. Previously this was hard coded to 2,
but, as per issue #1056, this is considered overkill since no-one
needs identifiers longer than 255 bytes.
With this patch the number of bytes for the length is configurable, and
defaults to 1 byte. The configuration option filters through to the
makeqstrdata.py script.
Code size savings going from 2 to 1 byte:
- unix x64 down by 592 bytes
- stmhal down by 1148 bytes
- bare-arm down by 284 bytes
Also has RAM savings, and will be slightly more efficient in execution.
Previous patch c38dc3ccc7 allowed any
object to be compared with any other, using pointer comparison for a
fallback. As such, existing code which checked for this case is no
longer needed.
Compiler optimises lookup of module.CONST when enabled (an existing
feature). Disabled by default; enabled for unix, windows, stmhal.
Costs about 100 bytes ROM on stmhal.
This allows to enable mem-info functions in micropython module, even if
MICROPY_MEM_STATS is not enabled. In this case, you get mem_info and
qstr_info but not mem_{total,current,peak}.
GC for unix/windows builds doesn't make use of the bss section anymore,
so we do not need the (sometimes complicated) build features and code related to it
This is a simple optimisation inspired by JITing technology: we cache in
the bytecode (using 1 byte) the offset of the last successful lookup in
a map. This allows us next time round to check in that location in the
hash table (mp_map_t) for the desired entry, and if it's there use that
entry straight away. Otherwise fallback to a normal map lookup.
Works for LOAD_NAME, LOAD_GLOBAL, LOAD_ATTR and STORE_ATTR opcodes.
On a few tests it gives >90% cache hit and greatly improves speed of
code.
Disabled by default. Enabled for unix and stmhal ports.
This patch consolidates all global variables in py/ core into one place,
in a global structure. Root pointers are all located together to make
GC tracing easier and more efficient.
This is consistent with how BC_JUMP was handled before. We never show jumps
destinations relative to jump instrucion itself, only relative to beginning
of function. Another useful way to show them as absolute (real memory
address), and this change makes result expected and consistent with how
BC_JUMP is shown.
The compiler treats `if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE)` as
a normal statement and generates assembly for it in degug mode as if MICROPY_ERROR_REPORTING
is an actual symbol instead of a preprocessor definition.
As such linking fails because mp_arg_error_terse_mismatch is not defined when
MICROPY_ERROR_REPORTING_TERSE is detailed or normal.