This patch moves some common code from the individual inline assemblers to
the compiler, the code that calls the emit-glue to assign the machine code
to the functions scope.
This patch adds the MICROPY_EMIT_INLINE_XTENSA option, which, when
enabled, allows the @micropython.asm_xtensa decorator to be used.
The following opcodes are currently supported (ax is a register, a0-a15):
ret_n()
callx0(ax)
j(label)
jx(ax)
beqz(ax, label)
bnez(ax, label)
mov(ax, ay)
movi(ax, imm) # imm can be full 32-bit, uses l32r if needed
and_(ax, ay, az)
or_(ax, ay, az)
xor(ax, ay, az)
add(ax, ay, az)
sub(ax, ay, az)
mull(ax, ay, az)
l8ui(ax, ay, imm)
l16ui(ax, ay, imm)
l32i(ax, ay, imm)
s8i(ax, ay, imm)
s16i(ax, ay, imm)
s32i(ax, ay, imm)
l16si(ax, ay, imm)
addi(ax, ay, imm)
ball(ax, ay, label)
bany(ax, ay, label)
bbc(ax, ay, label)
bbs(ax, ay, label)
beq(ax, ay, label)
bge(ax, ay, label)
bgeu(ax, ay, label)
blt(ax, ay, label)
bnall(ax, ay, label)
bne(ax, ay, label)
bnone(ax, ay, label)
Upon entry to the assembly function the registers a0, a12, a13, a14 are
pushed to the stack and the stack pointer (a1) decreased by 16. Upon
exit, these registers and the stack pointer are restored, and ret.n is
executed to return to the caller (caller address is in a0).
Note that the ABI for the Xtensa emitters is non-windowing.
If a port defines MP_PLAT_COMMIT_EXEC then this function is used to turn
RAM data into executable code. For example a port may want to write the
data to flash for execution. The function must return a pointer to the
executable data.
The constants MP_IOCTL_POLL_xxx, which were stmhal-specific, are moved
from stmhal/pybioctl.h (now deleted) to py/stream.h. And they are renamed
to MP_STREAM_POLL_xxx to be consistent with other such constants.
All uses of these constants have been updated.
If a port defines MICROPY_READER_POSIX or MICROPY_READER_FATFS then
lexer.c now provides an implementation of mp_lexer_new_from_file using
the mp_reader_new_file function.
Implementations of persistent-code reader are provided for POSIX systems
and systems using FatFS. Macros to use these are MICROPY_READER_POSIX and
MICROPY_READER_FATFS respectively. If an alternative implementation is
needed then a port can define the function mp_reader_new_file.
It is split into 2 functions, one to make small ints and the other to make
a non-small-int leaf node. This reduces code size by 32 bytes on
bare-arm, 64 bytes on unix (x64-64) and 144 bytes on stmhal.
This includes StopIteration and thus are important to make Python-coded
iterables work with yield from/await.
Exceptions in Python send() are still not handled and left for future
consideration and optimization.
We allow 'exc.__traceback__ = None' assignment as a low-level optimization
of pre-allocating exception instance and raising it repeatedly - this
avoids memory allocation during raise. However, uPy will keep adding
traceback entries to such exception instance, so before throwing it,
traceback should be cleared like above.
'exc.__traceback__ = None' syntax is CPython compatible. However, unlike
it, reading that attribute or setting it to any other value is not
supported (and not intended to be supported, again, the only reason for
adding this feature is to allow zero-memalloc exception raising).
Its addition was due to an early exploration on how to add CPython-like
stream interface. It's clear that it's not needed and just takes up
bytes in all ports.
With this patch one can now do "make FROZEN_MPY_DIR=../../frozen" to
specify a directory containing scripts to be frozen (as well as absolute
paths).
The compiled .mpy files are now stored in $(BUILD)/frozen_mpy/.
Now, to use frozen bytecode all a port needs to do is define
FROZEN_MPY_DIR to the directory containing the .py files to freeze, and
define MICROPY_MODULE_FROZEN_MPY and MICROPY_QSTR_EXTRA_POOL.
In both parse.c and qstr.c, an internal chunking allocator tidies up
by calling m_renew to shrink an allocated chunk to the size used, and
assumes that the chunk will not move. However, when MICROPY_ENABLE_GC
is false, m_renew calls the system realloc, which does not guarantee
this behaviour. Environments where realloc may return a different
pointer include:
(1) mbed-os with MBED_HEAP_STATS_ENABLED (which adds a wrapper around
malloc & friends; this is where I was hit by the bug);
(2) valgrind on linux (how I diagnosed it).
The fix is to call m_renew_maybe with allow_move=false.
Builtin functions with a fixed number of arguments (0, 1, 2 or 3) are
quite common. Before this patch the wrapper for such a function cost
3 machine words. After this patch it only takes 2, which can reduce the
code size by quite a bit (and pays off even more, the more functions are
added). It also makes function dispatch slightly more efficient in CPU
usage, and furthermore reduces stack usage for these cases. On x86 and
Thumb archs the dispatch functions are now tail-call optimised by the
compiler.
The bare-arm port has its code size increase by 76 bytes, but stmhal drops
by 904 bytes. Stack usage by these builtin functions is decreased by 48
bytes on Thumb2 archs.
In order to have more fine-grained control over how builtin functions are
constructed, the MP_DECLARE_CONST_FUN_OBJ macros are made more specific,
with suffix of _0, _1, _2, _3, _VAR, _VAR_BETEEN or _KW. These names now
match the MP_DEFINE_CONST_FUN_OBJ macros.
As long as a port implement mp_hal_sleep_ms(), mp_hal_ticks_ms(), etc.
functions, it can just use standard implementations of utime.sleel_ms(),
utime.ticks_ms(), etc. Python-level functions.
Now there is just one function to allocate a new vstr, namely vstr_new
(in addition to vstr_init etc). The caller of this function should know
what initial size to allocate for the buffer, or at least have some policy
or config option, instead of leaving it to a default (as it was before).
This refactors ujson.loads(s) to behave as ujson.load(StringIO(s)).
Increase in code size is: 366 bytes for unix x86-64, 180 bytes for
stmhal, 84 bytes for esp8266.
Setting emit_dent=0 is unnecessary because arriving in that part of the
if-logic will guarantee that emit_dent is already zero.
The block to check indent_top(lex)>0 is unreachable because a newline is
always inserted an the end of the input stream, and hence dedents are
always processed before EOF.
Similar to how binary op already works. Common unary operations already
have fast paths for bool so there's no need to have explicit handling of
ops in bool_unary_op, especially since they have the same behaviour as
integers.
On 32-bit archs this makes the scope_t struct 48 bytes in size, which fits
in 3 GC blocks (previously it used 4 GC blocks). This will lead to some
savings when compiling scripts because there are usually quite a few scopes,
one for each function and class.
Note that qstrs will fit in 16 bits, this assumption is made in a few other
places.
Following how other objects work, set/frozenset methods should use the
mp_check_self() macro to check the type of the self argument, because in
most cases this check can be a null operation.
Saves about 100-180 bytes of code for builds with set and frozenset
enabled.
Having a micropython.const identity function, and writing "from micropython
import const" at the start of scripts that use the const feature, allows to
write scripts which are compatible with CPython, and with uPy builds that
don't include const optimisation.
This patch adds such a function and updates the tests to do the import.
When an exception is raised and is to be handled by the VM, it is stored
on the Python value stack so the bytecode can access it. CPython stores
3 objects on the stack for each exception: exc type, exc instance and
traceback. uPy followed this approach, but it turns out not to be
necessary. Instead, it is enough to store just the exception instance on
the Python value stack. The only place where the 3 values are needed
explicitly is for the __exit__ handler of a with-statement context, but
for these cases the 3 values can be extracted from the single exception
instance.
This patch removes the need to store 3 values on the stack, and instead
just stores the exception instance.
Code size is reduced by about 50-100 bytes, the compiler and VM are
slightly simpler, generate bytecode is smaller (by 2 bytes for each try
block), and the Python value stack is reduced in size for functions that
handle exceptions.
This fixes constant substitution so that only standalone identifiers are
replaced with their constant value (if they have one). I.e. don't
replace NAME in expressions like obj.NAME or NAME = expr.
qstrs ids are restricted to fit within 2 bytes already (eg in persistent
bytecode) so it's safe to use a uint16_t to store them in mp_arg_t. And
the flags member only needs a maximum of 2 bytes so can also use uint16_t.
Savings in code size can be significant when many mp_arg_t structs are
used for argument parsing. Eg, this patch reduces stmhal by 480 bytes.
The system printf is no longer used by the core uPy code. Instead, the
platform print stream or DEBUG_printf is used. Using DEBUG_printf in the
showbc functions would mean that the code can't be tested by the test
suite, so use the normal output instead.
This patch also fixes parsing of bytecode-line-number mappings.
The vstr.had_error flag was a relic from the very early days which assumed
that the malloc functions (eg m_new, m_renew) returned NULL if they failed
to allocate. But that's no longer the case: these functions will raise an
exception if they fail.
Since it was impossible for had_error to be set, this patch introduces no
change in behaviour.
An alternative option would be to change the malloc calls to the _maybe
variants, which return NULL instead of raising, but then a lot of code
will need to explicitly check if the vstr had an error and raise if it
did.
The code-size savings for this patch are, in bytes: bare-arm:188,
minimal:456, unix(NDEBUG,x86-64):368, stmhal:228, esp8266:360.