There is a lot potential in compress bytecodes and make more use of the
coding space. This patch introduces "multi" bytecodes which have their
argument included in the bytecode (by addition).
UNARY_OP and BINARY_OP now no longer take a 1 byte argument for the
opcode. Rather, the opcode is included in the first byte itself.
LOAD_FAST_[0,1,2] and STORE_FAST_[0,1,2] are removed in favour of their
multi versions, which can take an argument between 0 and 15 inclusive.
The majority of LOAD_FAST/STORE_FAST codes fit in this range and so this
saves a byte for each of these.
LOAD_CONST_SMALL_INT_MULTI is used to load small ints between -16 and 47
inclusive. Such ints are quite common and now only need 1 byte to
store, and now have much faster decoding.
In all this patch saves about 2% RAM for typically bytecode (1.8% on
64-bit test, 2.5% on pyboard test). It also reduces the binary size
(because bytecodes are simplified) and doesn't harm performance.
This saves a lot of RAM for 2 reasons:
1. For functions that don't have default values, var args or var kw
args (which is a large number of functions in the general case), the
mp_obj_fun_bc_t type now fits in 1 GC block (previously needed 2 because
of the extra pointer to point to the arg_names array). So this saves 16
bytes per function (32 bytes on 64-bit machines).
2. Combining separate memory regions generally saves RAM because the
unused bytes at the end of the GC block are saved for 1 of the blocks
(since that block doesn't exist on its own anymore). So generally this
saves 8 bytes per function.
Tested by importing lots of modules:
- 64-bit Linux gave about an 8% RAM saving for 86k of used RAM.
- pyboard gave about a 6% RAM saving for 31k of used RAM.
Code-info size, block name, source name, n_state and n_exc_stack now use
variable length encoded uints. This saves 7-9 bytes per bytecode
function for most functions.
Reduces by about a factor of 10 on average the amount of RAM needed to
store the line-number to bytecode map in the bytecode prelude.
Using CPython3.4's stdlib for statistics: previously, an average of
13 bytes were used per (bytecode offset, line-number offset) pair, and
now with this improvement, that's down to 1.3 bytes on average.
Large RAM usage before was due to some very large steps in line numbers,
both from the start of the first line in a function way down in the
file, and also functions that have big comments and/or big strings in
them (both cases were significant).
Although the savings are large on average for the CPython stdlib, it
won't have such a big effect for small scripts used in embedded
programming.
Addresses issue #648.
dummy_data field is accessed as uint value (e.g.
in emit_write_bytecode_byte_ptr), but is not aligned as such, which causes
bus errors or incorrect behavior on any arch requiring strictly aligned
data (ARM pre-v7, MIPS, etc, etc).
Native emitter can now compile try/except blocks using nlr_push/nlr_pop.
It probably only works for 1 level of exception handling. It doesn't
work on Thumb (only x64).
Native emitter can also handle some additional op codes.
With this patch, 198 tests now pass using "-X emit=native" option to
micropython.
Needed to pop the iterator object when breaking out of a for loop. Need
also to be careful to unwind exception handler before popping iterator.
Addresses issue #635.
Blanket wide to all .c and .h files. Some files originating from ST are
difficult to deal with (license wise) so it was left out of those.
Also merged modpyb.h, modos.h, modstm.h and modtime.h in stmhal/.
3 emitter functions are needed only for emitcpy, and so we can #if them
out when compiling with emitcpy support.
Also remove unused SETUP_LOOP bytecode.
Closed over variables are now passed on the stack, instead of creating a
tuple and passing that. This way memory for the closed over variables
can be allocated within the closure object itself. See issue #510 for
background.
Attempt to address issue #386. unique_code_id's have been removed and
replaced with a pointer to the "raw code" information. This pointer is
stored in the actual byte code (aligned, so the GC can trace it), so
that raw code (ie byte code, native code and inline assembler) is kept
only for as long as it is needed. In memory it's now like a tree: the
outer module's byte code points directly to its children's raw code. So
when the outer code gets freed, if there are no remaining functions that
need the raw code, then the children's code gets freed as well.
This is pretty much like CPython does it, except that CPython stores
indexes in the byte code rather than machine pointers. These indices
index the per-function constant table in order to find the relevant
code.
This is necessary to catch all cases where locals are referenced before
assignment. We still keep the _0, _1, _2 versions of LOAD_FAST to help
reduced the byte code size in RAM.
Addresses issue #457.
This simplifies the compiler a little, since now it can do 1 pass over
a function declaration, to determine default arguments. I would have
done this originally, but CPython 3.3 somehow had the default keyword
args compiled before the default position args (even though they appear
in the other order in the text of the script), and I thought it was
important to have the same order of execution when evaluating default
arguments. CPython 3.4 has changed the order to the more obvious one,
so we can also change.
Very little has changed. In Python 3.4 they removed the opcode
STORE_LOCALS, but in Micro Python we only ever used this for CPython
compatibility, so it was a trivial thing to remove. It also allowed to
clean up some dead code (eg the 0xdeadbeef in class construction), and
now class builders use 1 less stack word.
Python 3.4.0 introduced the LOAD_CLASSDEREF opcode, which I have not
yet understood. Still, all tests (apart from bytecode test) still pass.
Bytecode tests needs some more attention, but they are not that
important anymore.
Adding this bytecode allows to remove 4 others related to
function/method calls with * and ** support. Will also help with
bytecodes that make functions/closures with default positional and
keyword args.
Mostly just a global search and replace. Except rt_is_true which
becomes mp_obj_is_true.
Still would like to tidy up some of the names, but this will do for now.
Rationale: setting up the stack (state for locals and exceptions) is
really part of the "code", it's the prelude of the function. For
example, native code adjusts the stack pointer on entry to the function.
Native code doesn't need to know n_state for any other reason. So
putting the state size in the bytecode prelude is sensible.
It reduced ROM usage on STM by about 30 bytes :) And makes it easier to
pass information about the bytecode between functions.
Assuming we have truncating (floor) division, way to do ceiling division
by N is to use formula (x + (N-1)) / N. Specifically, 63 bits, if stored
7 bits per byte, require exactly 9 bytes. 64 bits overflow that and require
10 bytes.