Instead of caching data that is constant (code_info, const_table and
n_state), store just a pointer to the underlying function object from which
this data can be derived.
This helps reduce stack usage for the case when the mp_code_state_t
structure is stored on the stack, as well as heap usage when it's stored
on the heap.
The downside is that the VM becomes a little more complex because it now
needs to derive the data from the underlying function object. But this
doesn't impact the performance by much (if at all) because most of the
decoding of data is done outside the main opcode loop. Measurements using
pystone show that little to no performance is lost.
This patch also fixes a nasty bug whereby the bytecode can be reclaimed by
the GC during execution. With this patch there is always a pointer to the
function object held by the VM during execution, since it's stored in the
mp_code_state_t structure.
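A hedged sketch of the resulting structure (abridged, with illustrative comments; see py/bc.h for the real layout):

    #include "py/obj.h"

    // Only the function object pointer is stored; code_info, const_table
    // and n_state are re-derived from it when needed.  Holding fun_bc here
    // also keeps the bytecode reachable by the GC for the whole execution.
    typedef struct _mp_code_state_t {
        struct _mp_obj_fun_bc_t *fun_bc;  // underlying function object
        const byte *ip;                   // instruction pointer
        mp_obj_t *sp;                     // Python value stack pointer
        mp_obj_t state[];                 // locals followed by the stack
    } mp_code_state_t;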
When make is passed "-B", everything is considered out-of-date and so $?
expands to all prerequisites. Thus there is no need for a special check to
see if $? is empty.
Some stack is allocated to format ints, and when the int implementation uses
long-long, more stack should be allocated than in the other cases. This
patch uses the existing "fmt_int_t" type to determine the amount of stack
to allocate.
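A hedged sketch of the sizing idea (the exact type selection and buffer expression in the formatting code may differ):

    #include <limits.h>
    #include "py/mpconfig.h"  // mp_int_t, MICROPY_LONGINT_IMPL_*

    // fmt_int_t is the widest integer the formatter must handle: long long
    // when ints are implemented as long-long, mp_int_t otherwise.
    #if MICROPY_LONGINT_IMPL == MICROPY_LONGINT_IMPL_LONGLONG
    typedef long long fmt_int_t;
    #else
    typedef mp_int_t fmt_int_t;
    #endif

    // Illustrative use inside an int-formatting routine: worst case is
    // base 2 (one char per bit), plus room for sign/prefix/terminator,
    // so the stack buffer grows automatically in the long-long case.
    void format_example(void) {
        char buf[sizeof(fmt_int_t) * CHAR_BIT + 4];
        (void)buf; // formatting would fill buf from the least-significant digit
    }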
This patch refactors the error handling in the lexer, to simplify it (ie
reduce code size).
A long time ago, when the lexer/parser/compiler were first written, the
lexer and parser were designed so they didn't use exceptions (ie nlr) to
report errors but rather returned an error code. Over time that has
gradually changed; the parser in particular has gained more and more ways
of raising exceptions. Also, the lexer never really handled all errors without
raising, eg there were some memory errors which could raise an exception
(and in these rare cases one would get a fatal nlr-not-handled fault).
This patch accepts the fact that the lexer can raise exceptions in some
cases and allows it to raise exceptions to handle all its errors, which are
for the most part just out-of-memory errors during construction of the
lexer. This makes the lexer a bit simpler, and also the persistent code
stuff is simplified.
What this means for users of the lexer is that calls to it must be wrapped
in an nlr handler. But all uses of the lexer already have such an nlr
handler for the parser (and compiler) so that doesn't put any extra burden
on the callers.
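Concretely, a caller looks something like this sketch (src_name, src and len are placeholders, and exact signatures vary a little between versions):

    nlr_buf_t nlr;
    if (nlr_push(&nlr) == 0) {
        // constructing the lexer can now raise (eg MemoryError), just like
        // the parser and compiler that consume it
        mp_lexer_t *lex = mp_lexer_new_from_str_len(src_name, src, len, 0);
        mp_parse_tree_t parse_tree = mp_parse(lex, MP_PARSE_FILE_INPUT);
        // ... compile and execute ...
        nlr_pop();
    } else {
        // any error raised by the lexer/parser/compiler lands here
        mp_obj_t exc = MP_OBJ_FROM_PTR(nlr.ret_val);
        // ... report exc ...
    }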
INT_MAX, which was used previously, is indeed the maximum value for int,
whereas on LP64 platforms long is used for mp_int_t. Using MP_SMALL_INT_MAX
is the correct way to do it anyway.
Each thread needs to have its own private references to its current
locals/globals dicts, otherwise functions running within different
contexts (eg imported from different files) can behave very strangely.
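A hedged sketch of where those per-thread references live (abridged; the real struct is in py/mpstate.h and field names may differ):

    #include "py/obj.h"

    // Each thread's state carries its own current globals/locals, so
    // switching modules in one thread cannot affect another thread.
    typedef struct _mp_state_thread_t {
        mp_obj_dict_t *dict_locals;
        mp_obj_dict_t *dict_globals;
        // ... stack top, nlr top, etc ...
    } mp_state_thread_t;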
There were 2 bugs, now fixed by this patch:
- after deleting an element the len of the dict did not decrease by 1
- after deleting an element searching through the dict could lead to
a seg fault due to there being an MP_OBJ_SENTINEL in the ordered array
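A hedged sketch of the fix for the ordered-array case (the real code is in py/map.c; mp_map_t and mp_map_elem_t are MicroPython's map types):

    #include <string.h>
    #include "py/obj.h"

    // Remove the element at index pos from an ordered map: shift the rest
    // down so no MP_OBJ_SENTINEL hole is left behind, and decrement used
    // so len() reflects the deletion.
    static void ordered_map_remove(mp_map_t *map, size_t pos) {
        memmove(&map->table[pos], &map->table[pos + 1],
            (map->used - pos - 1) * sizeof(mp_map_elem_t));
        map->used -= 1;
    }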
When the message passed is NULL, raise an exception without a message.
This allows a few code bytes to be shaved compared to the currently used
mp_raise_msg(..., "") pattern. (The actual savings depend on the function
code alignment used by a particular platform.)
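For example (mp_type_OSError just as an arbitrary exception type):

    // Before: an empty message string is passed and attached to the exception.
    mp_raise_msg(&mp_type_OSError, "");

    // After: NULL means "no message" and the exception is created with no
    // arguments, which is slightly smaller at each call site.
    mp_raise_msg(&mp_type_OSError, NULL);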
The parser was originally written to work without raising any exceptions
and instead return an error value to the caller. But it's now required
that a call to the parser be wrapped in an nlr handler, so we may as well
make use of that fact and simplify the parser so that it doesn't need to
keep track of any memory errors that it had. The parser anyway explicitly
raises an exception at the end if there was an error.
This patch simplifies the parser by letting the underlying memory
allocation functions raise an exception if they fail to allocate any
memory. And if there is an error parsing the "<id> = const(<val>)" pattern
then that also raises an exception right away instead of trying to recover
gracefully and then raise.
Previous to this patch any non-interned str/bytes objects would create a
special parse node that held a copy of the str/bytes data. Then in the
compiler this data would be turned into a str/bytes object. This actually
led to 2 copies of the data, one in the parse node and one in the object.
The parse node's copy of the data would be freed at the end of the compile
stage but nevertheless it meant that the peak memory usage of the
parse/compile stage was higher than it needed to be (by an amount equal to
the number of bytes in all the non-interned str/bytes objects).
This patch changes the behaviour so that str/bytes objects are created
directly in the parser and stored in a const-object parse node
(which already exists for bignum, float and complex const objects). This
reduces peak RAM usage of the parse/compile stage, simplifies the parser
and compiler, and reduces code size by about 170 bytes on Thumb2 archs,
and by about 300 bytes on Xtensa archs.
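Conceptually the parser now does something like the following sketch (make_const_object_node is an illustrative helper name, not the exact one in py/parse.c; the str case is analogous to bytes):

    // Build the object directly from the lexer's accumulated vstr and wrap
    // it in a const-object parse node, instead of copying the raw data into
    // a dedicated string parse node for the compiler to convert later.
    mp_obj_t obj = mp_obj_new_bytes((const byte *)lex->vstr.buf, lex->vstr.len);
    pn = make_const_object_node(parser, lex->tok_line, obj); // illustrative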
This patch allows uPy consts to be bignums, eg:
X = const(1 << 100)
The infrastructure for consts to be a bignum (rather than restricted to
small integers) has been in place for a while, ever since constant folding
was upgraded to allow bignums. It just required a small change (in this
patch) to enable it.
The inclusion of the uerrno.errorcode dict is configured by
MICROPY_PY_UERRNO_ERRORCODE and enabled by default
(since that's the behaviour before this patch).
Without this dict the lookup of errno codes to strings must use the
uerrno module itself.
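Ports can override the default in their mpconfigport.h, eg:

    // Keep the uerrno.errorcode dict (the default) ...
    #define MICROPY_PY_UERRNO_ERRORCODE (1)
    // ... or set it to 0 to drop the dict and save some code/RAM, in which
    // case errno-to-name lookups must go via the uerrno module's globals.
    //#define MICROPY_PY_UERRNO_ERRORCODE (0)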
It's much more efficient in RAM and code size to do implicit literal string
concatenation in the lexer, as opposed to the compiler.
RAM usage is reduced because the concatenation can be done right away in the
tokeniser by just accumulating the string/bytes literals into the lexer's
vstr. Prior to this patch adjacent strings/bytes would create a parse tree
(one node per string/bytes) and then in the compiler a whole new chunk of
memory was allocated to store the concatenated string, which used more than
double the memory compared to just accumulating in the lexer.
This patch also significantly reduces code size (bytes saved per port):
bare-arm: -204
minimal: -204
unix x64: -328
stmhal: -208
esp8266: -284
cc3200: -224
Previous to this patch there was an explicit check for errors with line
continuation (where backslash was not immediately followed by a newline).
But this check is not necessary: if there is an error then the remaining
logic of the tokeniser will reject the backslash and correctly produce a
syntax error.
Since the table of keywords is sorted, we can use strcmp to do the search
and stop part way through the search if the comparison is less-than.
Because all tokens that are names are subject to this search, this
optimisation will improve the overall speed of the lexer when processing
a script.
The change also decreases code size by a little bit because we now use
strcmp instead of the custom str_strn_equal function.
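A self-contained sketch of the early-exit search (the real keyword table and lookup live in py/lexer.c):

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    // The table must be kept sorted for the early exit to be valid.
    static const char *const tok_kw[] = {
        "False", "None", "True", "and", "as", "assert", /* ... */ "yield",
    };

    static bool is_keyword(const char *name) {
        for (size_t i = 0; i < sizeof(tok_kw) / sizeof(tok_kw[0]); ++i) {
            int cmp = strcmp(name, tok_kw[i]);
            if (cmp == 0) {
                return true;  // exact match: it's a keyword
            }
            if (cmp < 0) {
                return false; // past where name would sort: stop early
            }
        }
        return false;
    }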
Keywords only need to be searched for if the token is an MP_TOKEN_NAME, so
we can move the search to the part of the code that does the tokenising for
MP_TOKEN_NAME.
Grammar rules have 2 variants: ones that are attached to a specific
compile function which is called to compile that grammar node, and ones
that don't have a compile function and are instead just inspected to see
what form they take.
In the compiler there is a table of all grammar rules, with each entry
having a pointer to the associated compile function. Those rules with no
compile function have a null pointer. There are 120 such rules, so that's
120 words of essentially wasted code space.
By grouping together the compile vs no-compile rules we can put all the
no-compile rules at the end of the list of rules, and then we don't need
to store the null pointers. We just have a truncated table and it's
guaranteed that when indexing this table we only index the first half,
the half with populated pointers.
This patch implements such a grouping by having a specific macro for the
compile vs no-compile grammar rules (DEF_RULE vs DEF_RULE_NC). It saves
around 460 bytes of code on 32-bit archs.
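A hedged sketch of how the truncated table is built (the typedef and table name are illustrative; the real macros live in py/grammar.h and py/compile.c):

    // Rules with a compile function are declared with DEF_RULE and listed
    // before all DEF_RULE_NC rules, so the table below simply stops where
    // the no-compile rules begin and stores no null pointers at all.
    typedef void (*compile_function_t)(void *comp, void *pns);

    #define DEF_RULE(rule, comp, kind, ...) comp,  // emit the function pointer
    #define DEF_RULE_NC(rule, kind, ...)           // emit nothing for this rule
    static const compile_function_t compile_function_table[] = {
        #include "py/grammar.h"
    };
    #undef DEF_RULE
    #undef DEF_RULE_NC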
Allows iterating over the following without allocating on the heap:
- tuple
- list
- string, bytes
- bytearray, array
- dict (not dict.keys, dict.values, dict.items)
- set, frozenset
Allows calling the following without heap memory:
- all, any, min, max, sum
TODO: still need to allocate stack memory in bytecode for iter_buf.
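At the C level, the pattern enabled by this change looks like the following sketch (obj is a placeholder for any of the objects listed above):

    mp_obj_iter_buf_t iter_buf;                     // lives on the C stack
    mp_obj_t iterable = mp_getiter(obj, &iter_buf); // no heap allocation
    mp_obj_t item;
    while ((item = mp_iternext(iterable)) != MP_OBJ_STOP_ITERATION) {
        // ... process item ...
    }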