It's much more efficient in RAM and code size to do implicit literal string
concatenation in the lexer, as opposed to the compiler.
RAM usage is reduced because the concatenation can be done right away in the
tokeniser by just accumulating the string/bytes literals into the lexer's
vstr. Prior to this patch adjacent strings/bytes would create a parse tree
(one node per string/bytes) and then in the compiler a whole new chunk of
memory was allocated to store the concatenated string, which used more than
double the memory compared to just accumulating in the lexer.
This patch also significantly reduces code size:
bare-arm: -204
minimal: -204
unix x64: -328
stmhal: -208
esp8266: -284
cc3200: -224
Grammar rules have 2 variants: ones that are attached to a specific
compile function which is called to compile that grammar node, and ones
that don't have a compile function and are instead just inspected to see
what form they take.
In the compiler there is a table of all grammar rules, with each entry
having a pointer to the associated compile function. Those rules with no
compile function have a null pointer. There are 120 such rules, so that's
120 words of essentially wasted code space.
By grouping together the compile vs no-compile rules we can put all the
no-compile rules at the end of the list of rules, and then we don't need
to store the null pointers. We just have a truncated table and it's
guaranteed that when indexing this table we only index the first half,
the half with populated pointers.
This patch implements such a grouping by having a specific macro for the
compile vs no-compile grammar rules (DEF_RULE vs DEF_RULE_NC). It saves
around 460 bytes of code on 32-bit archs.
This patch moves some common code from the individual inline assemblers to
the compiler, the code that calls the emit-glue to assign the machine code
to the functions scope.
This patch adds the MICROPY_EMIT_INLINE_XTENSA option, which, when
enabled, allows the @micropython.asm_xtensa decorator to be used.
The following opcodes are currently supported (ax is a register, a0-a15):
ret_n()
callx0(ax)
j(label)
jx(ax)
beqz(ax, label)
bnez(ax, label)
mov(ax, ay)
movi(ax, imm) # imm can be full 32-bit, uses l32r if needed
and_(ax, ay, az)
or_(ax, ay, az)
xor(ax, ay, az)
add(ax, ay, az)
sub(ax, ay, az)
mull(ax, ay, az)
l8ui(ax, ay, imm)
l16ui(ax, ay, imm)
l32i(ax, ay, imm)
s8i(ax, ay, imm)
s16i(ax, ay, imm)
s32i(ax, ay, imm)
l16si(ax, ay, imm)
addi(ax, ay, imm)
ball(ax, ay, label)
bany(ax, ay, label)
bbc(ax, ay, label)
bbs(ax, ay, label)
beq(ax, ay, label)
bge(ax, ay, label)
bgeu(ax, ay, label)
blt(ax, ay, label)
bnall(ax, ay, label)
bne(ax, ay, label)
bnone(ax, ay, label)
Upon entry to the assembly function the registers a0, a12, a13, a14 are
pushed to the stack and the stack pointer (a1) decreased by 16. Upon
exit, these registers and the stack pointer are restored, and ret.n is
executed to return to the caller (caller address is in a0).
Note that the ABI for the Xtensa emitters is non-windowing.
It is split into 2 functions, one to make small ints and the other to make
a non-small-int leaf node. This reduces code size by 32 bytes on
bare-arm, 64 bytes on unix (x64-64) and 144 bytes on stmhal.
When an exception is raised and is to be handled by the VM, it is stored
on the Python value stack so the bytecode can access it. CPython stores
3 objects on the stack for each exception: exc type, exc instance and
traceback. uPy followed this approach, but it turns out not to be
necessary. Instead, it is enough to store just the exception instance on
the Python value stack. The only place where the 3 values are needed
explicitly is for the __exit__ handler of a with-statement context, but
for these cases the 3 values can be extracted from the single exception
instance.
This patch removes the need to store 3 values on the stack, and instead
just stores the exception instance.
Code size is reduced by about 50-100 bytes, the compiler and VM are
slightly simpler, generate bytecode is smaller (by 2 bytes for each try
block), and the Python value stack is reduced in size for functions that
handle exceptions.
The 3 kinds of comprehensions are similar enough that merging their emit
functions reduces code size. Decreases in code size in bytes are:
bare-arm:24, minimal:96, unix(NDEBUG,x86-64):328, stmhal:80, esp8266:76.
They are sugar for marking function as generator, "yield from"
and pep492 python "semantically equivalents" respectively.
@dpgeorge was the original author of this patch, but @pohmelie made
changes to implement `async for` and `async with`.
Previous to this patch, the "**b" in "a**b" had its own parse node with
just one item (the "b"). Now, the "b" is just the last element of the
power parse-node. This saves (a tiny bit of) RAM when compiling.
This new compile-time option allows to make the bytecode compiler
configurable at runtime by setting the fields in the mp_dynamic_compiler
structure. By using this feature, the compiler can generate bytecode
that targets any MicroPython runtime/VM, regardless of the host and
target compile-time settings.
Options so far that fall under this dynamic setting are:
- maximum number of bits that a small int can hold;
- whether caching of lookups is used in the bytecode;
- whether to use unicode strings or not (lexer behaviour differs, and
therefore generated string constants differ).
Before this patch, (x+y)*z would be parsed to a tree that contained a
redundant identity parse node corresponding to the parenthesis. With
this patch such nodes are optimised away, which reduces memory
requirements for expressions with parenthesis, and simplifies the
compiler because it doesn't need to handle this identity case.
A parenthesis parse node is still needed for tuples.
MICROPY_ENABLE_COMPILER can be used to enable/disable the entire compiler,
which is useful when only loading of pre-compiled bytecode is supported.
It is enabled by default.
MICROPY_PY_BUILTINS_EVAL_EXEC controls support of eval and exec builtin
functions. By default they are only included if MICROPY_ENABLE_COMPILER
is enabled.
Disabling both options saves about 40k of code size on 32-bit x86.
To use, put the following in mpconfigport.h:
#define MICROPY_OBJ_REPR (MICROPY_OBJ_REPR_D)
#define MICROPY_FLOAT_IMPL (MICROPY_FLOAT_IMPL_DOUBLE)
typedef int64_t mp_int_t;
typedef uint64_t mp_uint_t;
#define UINT_FMT "%llu"
#define INT_FMT "%lld"
Currently does not work with native emitter enabled.
It makes much more sense to do constant folding in the parser while the
parse tree is being built. This eliminates the need to create parse
nodes that will just be folded away. The code is slightly simpler and a
bit smaller as well.
Constant folding now has a configuration option,
MICROPY_COMP_CONST_FOLDING, which is enabled by default.