With this patch str/bytes construction is streamlined. Always use a
vstr to build a str/bytes object. If the size is known beforehand then
use vstr_init_len to allocate only the required memory. Otherwise use
vstr_init and the vstr will grow as needed. Then use
mp_obj_new_str_from_vstr to create a str/bytes object using the vstr
memory.
Saves code ROM: 68 bytes on stmhal, 108 bytes on bare-arm, and 336 bytes
on unix x64.
This patch allows vstr memory to be reused when creating a str/bytes
object, which improves memory usage.
Also saves code ROM: 128 bytes on stmhal, 92 bytes on bare-arm, and 88
bytes on unix x64.
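The build-then-hand-over workflow described above can be sketched in plain C.
This is a minimal stand-alone model, not the actual vstr implementation: every
sketch_* name is hypothetical, malloc/realloc stand in for the real allocator,
and allocation failures are not handled.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-in for a vstr: a growable, NUL-terminated byte buffer.
typedef struct {
    size_t alloc; // bytes allocated
    size_t len;   // bytes in use (excluding the terminator)
    char *buf;
} sketch_vstr_t;

// Size not known beforehand: start small, grow as needed.
void sketch_vstr_init(sketch_vstr_t *v, size_t alloc) {
    v->alloc = alloc < 1 ? 1 : alloc;
    v->len = 0;
    v->buf = malloc(v->alloc); // assumption: allocation succeeds
    v->buf[0] = '\0';
}

// Size known beforehand: allocate only what is required.
void sketch_vstr_init_len(sketch_vstr_t *v, size_t len) {
    v->alloc = len + 1;
    v->len = len;
    v->buf = malloc(v->alloc); // assumption: allocation succeeds
    v->buf[len] = '\0';
}

// Append n bytes, growing the buffer if they do not fit.
void sketch_vstr_add_strn(sketch_vstr_t *v, const char *s, size_t n) {
    if (v->len + n + 1 > v->alloc) {
        v->alloc = (v->len + n + 1) * 2;
        v->buf = realloc(v->buf, v->alloc);
    }
    memcpy(v->buf + v->len, s, n);
    v->len += n;
    v->buf[v->len] = '\0';
}

// Hypothetical final string object that takes over the vstr memory
// instead of copying it; the vstr is left empty afterwards.
typedef struct {
    size_t len;
    char *data;
} sketch_str_t;

sketch_str_t sketch_str_from_vstr(sketch_vstr_t *v) {
    sketch_str_t s = { v->len, v->buf };
    v->buf = NULL;
    v->alloc = 0;
    v->len = 0;
    return s;
}
```

In this model the resulting string points at the same memory the vstr was
built in, which is the reuse the second message refers to.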
pyexec_friendly_repl_process_char() and friends are useful for ports which
integrate into an existing cooperative multitasking system.
Unlike the earlier readline() refactor, this was implemented in a less
formal, trial-and-error process, and minor functionality regressions are
still known (such as soft and hard reset support). So the original loop-based
pyexec_friendly_repl() is left intact, with the specific implementation
selectable by a config setting.
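To illustrate why a per-character entry point suits cooperative multitasking,
here is a minimal sketch of a non-blocking line editor. It is not the pyexec
code: the sketch_repl_* names, the 64-byte line limit and the handling of
backspace are all assumptions made for the example.

```c
#include <stddef.h>

// Hypothetical REPL state kept between calls, so no loop has to block.
typedef struct {
    char line[64];
    size_t len;
} sketch_repl_t;

void sketch_repl_init(sketch_repl_t *r) {
    r->len = 0;
    r->line[0] = '\0';
}

// Feed one character and return immediately.
// Returns 1 when a complete line is available in r->line, otherwise 0.
int sketch_repl_process_char(sketch_repl_t *r, int c) {
    if (c == '\r' || c == '\n') {
        r->line[r->len] = '\0';
        r->len = 0; // the next character starts a new line
        return 1;
    }
    if (c == 127 || c == '\b') {
        if (r->len > 0) {
            r->len--; // erase the previous character
        }
        return 0;
    }
    if (r->len + 1 < sizeof(r->line)) {
        r->line[r->len++] = (char)c; // silently drop input that does not fit
    }
    return 0;
}
```

A host scheduler would call sketch_repl_process_char() whenever a character
arrives and do other work in between, instead of sitting inside a read loop.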
Bytecode also needs a pass to compute the stack size. This is because
the state size of the bytecode function is encoded as a variable uint,
so we must know the value of this uint before we encode it (otherwise
the size of the generated code changes from one pass to the next).
Having an entire pass for this seems wasteful (in time). An alternative is
to allocate fixed space for the state size (would need 3-4 bytes to be
general, when 1 byte is usually sufficient) which uses a bit of extra
RAM per bytecode function, and makes the code less elegant in places
where this uint is encoded/decoded.
So, for now, opt for an extra pass.
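The dependency between the value and the encoded size can be seen in a small
sketch of a variable-length uint encoder. The 7-bits-per-byte layout below is
a generic scheme chosen for illustration and is only assumed to resemble the
bytecode's encoding; sketch_encode_varuint is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

// Encode val with 7 payload bits per byte, most significant group first.
// Every byte except the last has its top bit set as a continuation marker.
// Returns the number of bytes written to out (at most 5 for a 32-bit value).
size_t sketch_encode_varuint(uint32_t val, uint8_t *out) {
    uint8_t tmp[5];
    size_t n = 0;
    do {
        tmp[n++] = val & 0x7f; // collect groups, least significant first
        val >>= 7;
    } while (val != 0);
    for (size_t i = 0; i < n; i++) {
        uint8_t b = tmp[n - 1 - i]; // emit in reverse order
        if (i != n - 1) {
            b |= 0x80;
        }
        out[i] = b;
    }
    return n;
}
```

A state size of 127 fits in one byte while 128 needs two, so the size of the
function prelude is not known until the final stack size is.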
Native code has GC-heap pointers in it so it must be scanned. But on
the unix port, memory for native functions is mmap'd, and so it must
have explicit code to scan it for root pointers.
Prior to this patch all constant string/bytes objects were
interned by the compiler, and this led to crashes when the qstr was too
long (noticeable now that qstr length storage defaults to 1 byte).
With this patch, long string/bytes objects are never interned, and are
referenced directly as constant objects within generated code using
load_const_obj.
This new config option sets how many bytes to use to store the length
of each qstr. Previously this was hard-coded to 2, but, as per issue
#1056, this is considered overkill since no-one needs identifiers longer
than 255 bytes.
With this patch the number of bytes for the length is configurable, and
defaults to 1 byte. The configuration option filters through to the
makeqstrdata.py script.
Code size savings going from 2 to 1 byte:
- unix x64 down by 592 bytes
- stmhal down by 1148 bytes
- bare-arm down by 284 bytes
Also has RAM savings, and will be slightly more efficient in execution.
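A configurable length field can be sketched as follows. SKETCH_BYTES_IN_LEN
and the helper names are hypothetical, and the little-endian layout of the
length bytes is an assumption rather than a description of the real qstr
format.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical config value: bytes used to store each length (1 to 3 here).
#ifndef SKETCH_BYTES_IN_LEN
#define SKETCH_BYTES_IN_LEN (1)
#endif

// Largest length representable with the configured number of bytes.
#define SKETCH_MAX_LEN ((1u << (8 * SKETCH_BYTES_IN_LEN)) - 1)

// Store len in the first SKETCH_BYTES_IN_LEN bytes of q, low byte first.
// Returns 0 on success, or -1 if len does not fit in the length field.
int sketch_set_len(uint8_t *q, size_t len) {
    if (len > SKETCH_MAX_LEN) {
        return -1;
    }
    for (int i = 0; i < SKETCH_BYTES_IN_LEN; i++) {
        q[i] = (len >> (8 * i)) & 0xff;
    }
    return 0;
}

// Read the length back from the header bytes.
size_t sketch_get_len(const uint8_t *q) {
    size_t len = 0;
    for (int i = 0; i < SKETCH_BYTES_IN_LEN; i++) {
        len |= (size_t)q[i] << (8 * i);
    }
    return len;
}
```

With the default of 1 byte, a length of 255 is accepted and 256 is rejected,
which is why overly long strings must not be forced into this representation.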
Previous patch c38dc3ccc7 allowed any
object to be compared with any other, using pointer comparison as a
fallback. As such, existing code which checked for this case is no
longer needed.
Compiler optimises lookup of module.CONST when enabled (an existing
feature). Disabled by default; enabled for unix, windows, stmhal.
Costs about 100 bytes ROM on stmhal.
This allows mem-info functions to be enabled in the micropython module,
even if MICROPY_MEM_STATS is not enabled. In this case, you get mem_info
and qstr_info but not mem_{total,current,peak}.
GC for unix/windows builds doesn't make use of the bss section anymore,
so we do not need the (sometimes complicated) build features and code
related to it.
This is a simple optimisation inspired by JITing technology: we cache in
the bytecode (using 1 byte) the offset of the last successful lookup in
a map. This allows us next time round to check in that location in the
hash table (mp_map_t) for the desired entry, and if it's there use that
entry straight away. Otherwise fallback to a normal map lookup.
Works for LOAD_NAME, LOAD_GLOBAL, LOAD_ATTR and STORE_ATTR opcodes.
On a few tests it gives a >90% cache hit rate and greatly improves the
speed of code.
Disabled by default. Enabled for unix and stmhal ports.
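The caching idea can be sketched with a small open-addressing table. This is
a simplified model and not mp_map_t: the sketch_* types, the linear probing,
and the use of key 0 for an empty slot are assumptions for the example.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical map element; a key of 0 marks an empty slot.
typedef struct {
    uintptr_t key;
    int value;
} sketch_elem_t;

typedef struct {
    size_t alloc;
    sketch_elem_t *table;
} sketch_map_t;

// Normal lookup: linear probing starting at key % alloc.
sketch_elem_t *sketch_map_lookup(sketch_map_t *m, uintptr_t key) {
    size_t start = key % m->alloc;
    size_t pos = start;
    do {
        if (m->table[pos].key == key) {
            return &m->table[pos];
        }
        if (m->table[pos].key == 0) {
            return NULL;
        }
        pos = (pos + 1) % m->alloc;
    } while (pos != start);
    return NULL;
}

// Cached lookup: *cache models the single byte stored in the bytecode.
sketch_elem_t *sketch_map_lookup_cached(sketch_map_t *m, uintptr_t key, uint8_t *cache) {
    size_t idx = *cache;
    if (idx < m->alloc && m->table[idx].key == key) {
        return &m->table[idx]; // cache hit: no probing needed
    }
    sketch_elem_t *e = sketch_map_lookup(m, key); // cache miss: normal lookup
    if (e != NULL) {
        size_t pos = (size_t)(e - m->table);
        *cache = pos <= 0xff ? (uint8_t)pos : 0; // remember the slot if it fits
    }
    return e;
}
```

A stale or wrong cache byte is harmless in this model because the cached slot
is always verified against the key before it is used.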
This patch consolidates all global variables in py/ core into one place,
in a global structure. Root pointers are all located together to make
GC tracing easier and more efficient.
This is consistent with how BC_JUMP was handled before. We never show jump
destinations relative to the jump instruction itself, only relative to the
beginning of the function. Another useful option would be to show them as
absolute (real memory) addresses; this change makes the result as expected
and consistent with how BC_JUMP is shown.
The compiler treats `if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE)` as
a normal statement and, in debug mode, generates assembly for it as if
MICROPY_ERROR_REPORTING were an actual symbol instead of a preprocessor definition.
As such, linking fails because mp_arg_error_terse_mismatch is not defined when
MICROPY_ERROR_REPORTING is detailed or normal.
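One plausible way to avoid this class of link error is to select the branch
with the preprocessor rather than a runtime `if`, so the reference never
reaches the compiler in non-terse builds. The sketch below uses hypothetical
SKETCH_* macros and function names and is not the patched source.

```c
// Hypothetical stand-ins for the error reporting configuration.
#define SKETCH_REPORTING_TERSE  (1)
#define SKETCH_REPORTING_NORMAL (2)
#ifndef SKETCH_REPORTING
#define SKETCH_REPORTING SKETCH_REPORTING_NORMAL
#endif

#if SKETCH_REPORTING == SKETCH_REPORTING_TERSE
// Only exists in terse builds, like the function named in the message above.
const char *sketch_terse_mismatch(void) {
    return "argument num/types mismatch";
}
#endif

const char *sketch_arg_error(void) {
    // A plain `if (SKETCH_REPORTING == SKETCH_REPORTING_TERSE)` would keep the
    // call below visible to the compiler, and an unoptimised build may emit
    // it even though the condition is constant. `#if` removes it entirely.
#if SKETCH_REPORTING == SKETCH_REPORTING_TERSE
    return sketch_terse_mismatch();
#else
    return "function takes 2 positional arguments but 3 were given";
#endif
}
```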
We are not word-for-word compatible with CPython exceptions, so we are
free to make them short but informative in order to reduce code size.
Also, try to make messages the same as existing ones where possible.
This fixes conversion when the float type has more mantissa bits than a small
int, and the float value has a small exponent. This is, for example, the case
on a 32-bit platform using doubles, when converting the value of time.time().
Conversion of floats with a large exponent is still not handled correctly.
This is for efficiency, so we don't need to subtract 1 from the ip
before storing it to code_state->ip. It saves a lot of ROM bytes on
unix and stmhal.
Mirroring ip to a volatile memory variable for each opcode is an expensive
operation. For quite a lot of frequently executed opcodes, like stack
manipulation or jumps, exceptions cannot actually happen. So, record ip only
for opcodes where an exception is possible.
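The effect of recording ip selectively can be sketched with a toy stack
machine. The opcodes, the sketch_* names and the ip_stores counter are all
invented for this example, the stored ip is not volatile here, and the stack
has no overflow checks.

```c
#include <stddef.h>
#include <stdint.h>

// Toy opcodes: only SK_DIV can raise (division by zero).
enum { SK_PUSH, SK_DUP, SK_POP, SK_DIV, SK_END };

typedef struct {
    const uint8_t *ip; // last recorded instruction pointer
    int ip_stores;     // how many times ip was written back to memory
} sketch_code_state_t;

// Returns 0 with *result set on success, -1 on division by zero (st->ip then
// points at the faulting opcode), or -2 on an unknown opcode.
int sketch_vm_run(const uint8_t *code, sketch_code_state_t *st, int *result) {
    int stack[16];
    int sp = 0;
    const uint8_t *ip = code; // fast local copy, never mirrored by default
    for (;;) {
        const uint8_t *op_start = ip;
        switch (*ip++) {
            case SK_PUSH:
                stack[sp++] = (int8_t)*ip++; // one-byte signed operand
                break;
            case SK_DUP:
                stack[sp] = stack[sp - 1];
                sp++;
                break;
            case SK_POP:
                sp--;
                break;
            case SK_DIV:
                // This opcode can raise, so record ip before executing it.
                st->ip = op_start;
                st->ip_stores++;
                if (stack[sp - 1] == 0) {
                    return -1;
                }
                stack[sp - 2] /= stack[sp - 1];
                sp--;
                break;
            case SK_END:
                *result = stack[sp - 1];
                return 0;
            default:
                return -2;
        }
    }
}
```

In this model a program of five opcodes with one division writes ip back once
rather than five times, while a failing division still reports its location.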
This patch makes the MICROPY_PY_BUILTINS_SLICE compile-time option
fully disable the builtin slice operation (when set to 0). This
includes removing the slice syntax from the grammar. Now, enabling
slice costs 4228 bytes on unix x64, and 1816 bytes on stmhal.
This patch makes the MICROPY_PY_BUILTINS_SET compile-time option fully
disable the builtin set object (when set to 0). This includes removing
set constructor/comprehension from the grammar, the compiler and the
emitters. Now, enabling set costs 8168 bytes on unix x64, and 3576
bytes on stmhal.
This optimisation reduces the VM exception stack element (mp_exc_stack_t)
by 1 word, by using bit 1 of a pointer to store whether the opcode was a
FINALLY or WITH opcode. This optimisation was pending, waiting for
maturity of the exception handling code, which has now proven itself.
Saves 1 machine word RAM for each exception (4->3 words per exception).
Increases stmhal code by 4 bytes, and decreases unix x64 code by 32
bytes.
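Storing a flag in bit 1 of a pointer can be sketched as below. The sketch
assumes the pointers are at least 4-byte aligned, so that bit 1 is otherwise
always zero, and that converting through uintptr_t round-trips as it does on
mainstream platforms. The sketch_* names are hypothetical and this is not the
mp_exc_stack_t code.

```c
#include <stdbool.h>
#include <stdint.h>

// Assumption: tagged pointers are 4-byte aligned, leaving bit 1 free.
#define SKETCH_TAG_BIT ((uintptr_t)2)

// Combine a pointer and a boolean flag into a single word.
void *sketch_tag_ptr(void *p, bool flag) {
    return (void *)((uintptr_t)p | (flag ? SKETCH_TAG_BIT : 0));
}

// Read the flag back out of a tagged pointer.
bool sketch_ptr_flag(void *tagged) {
    return ((uintptr_t)tagged & SKETCH_TAG_BIT) != 0;
}

// Recover the original pointer by clearing the tag bit.
void *sketch_untag_ptr(void *tagged) {
    return (void *)((uintptr_t)tagged & ~SKETCH_TAG_BIT);
}
```

The separate word that previously held the flag is no longer needed, which
is where the saving of one machine word per entry comes from; the price is a
mask operation whenever the pointer is used.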
This patch gives proper SyntaxError exceptions for bad global/nonlocal
declarations. It also reduces code size: 304 bytes on unix x64, 132
bytes on stmhal.
You can now assign to the range end variable and the for-loop still
works correctly. This fully addresses issue #565.
Also fixed a bug with the stack not being fully popped when breaking out
of an optimised for-loop (and it's actually impossible to write a test
for this case!).