This patch is a code optimisation, trading text bytes for speed. On
pyboard it's an increase of 0.06% in code size for a gain (in pystone
performance) of roughly 6.5%.
The patch optimises load/store/delete of attributes in user defined classes
by not looking up special accessors (@property, __get__, __delete__,
__set__, __setattr__ and __getattr_) if they are guaranteed not to exist in
the class.
Currently, if you do my_obj.foo() then the runtime has to do a few checks
to see if foo is a property or has __get__, and if so delegate the call.
And for stores things like my_obj.foo = 1 has to first check if foo is a
property or has __set__ defined on it.
Doing all those checks each and every time the attribute is accessed has a
performance penalty. This patch eliminates all those checks for cases when
it's guaranteed that the checks will always fail, ie no attributes are
properties nor have any special accessor methods defined on them.
To make this guarantee it checks all attributes of a user-defined class
when it is first created. If any of the attributes of the user class are
properties or have special accessors, or any of the base classes of the
user class have them, then it sets a flag in the class to indicate that
special accessors must be checked for. Then in the load/store/delete code
it checks this flag to see if it can take the shortcut and optimise the
lookup.
It's an optimisation that's pretty widely applicable because it improves
lookup performance for all methods of user defined classes, and stores of
attributes, at least for those that don't have special accessors. And, it
allows to enable descriptors with minimal additional runtime overhead if
they are not used for a particular user class.
There is one restriction on dynamic class creation that has been introduced
by this patch: a user-defined class cannot go from zero special accessors
to one special accessor (or more) after that class has been subclassed. If
the script attempts this an AttributeError is raised (see addition to
tests/misc/non_compliant.py for an example of this case).
The cost in code space bytes for the optimisation in this patch is:
unix x64: +528
unix nanbox: +508
stm32: +192
cc3200: +200
esp8266: +332
esp32: +244
Performance tests that were done:
- on unix x86-64, pystone improved by about 5%
- on pyboard, pystone improved by about 6.5%, from 1683 up to 1794
- on pyboard, bm_chaos (from CPython benchmark suite) improved by about 5%
- on esp32, pystone improved by about 30% (but there are caching effects)
- on esp32, bm_chaos improved by about 11%
Keeping all the stress related tests in one place makes it easier to
stress-test a given port, and to also not run such tests on ports that
can't handle them.
CPython docs explicitly state that the RHS of a set/frozenset binary op
must be a set to prevent user errors. It also preserves commutativity of
the ops, eg: "abc" & set() is a TypeError, and so should be set() & "abc".
This change actually decreases unix (x64) code by 160 bytes; it increases
stm32 by 4 bytes and esp8266 by 28 bytes (but previous patch already
introduced a much large saving).
Previously, "import _io" worked on both CPython and MicroPython (essentially
by a chance on CPython, as there's not guarantee that its contents will stay
the same across versions), but as the module was renamed to uio, need to use
more robust import sequence for compatibility.
NameError may either include offending name or not. Unfortunately, this
change makes test float-dependent. And using integer division leads to
different error message than CPython.
For these 3 bitwise operations there are now fast functions for
positive-only arguments, and general functions for arbitrary sign
arguments (the fast functions are the existing implementation).
By default the fast functions are not used (to save space) and instead
the general functions are used for all operations.
Enable MICROPY_OPT_MPZ_BITWISE to use the fast functions for positive
arguments.
Constant folding in the parser can now operate on big ints, whatever
their representation. This is now possible because the parser can create
parse nodes holding arbitrary objects. For the case of small ints the
folding is still efficient in RAM because the folded small int is stored
inplace in the parse node.
Adds 48 bytes to code size on Thumb2 architecture. Helps reduce heap
usage because more constants can be computed at compile time, leading to
a smaller parse tree, and most importantly means that the constants don't
have to be computed at runtime (perhaps more than once). Parser will now
be a little slower when folding due to calls to runtime to do the
arithmetic.
This doesn't handle case fo enclosed except blocks, but once again,
sys.exc_info() support is a workaround for software which uses it
instead of properly catching exceptions via variable in except clause.
The implementation is very basic and non-compliant and provided solely for
CPython compatibility. The function itself is bad Python2 heritage, its
usage is discouraged.
This makes exception traceback info self contained (ie doesn't rely on
list object, which was a bit of a hack), reduces code size, and reduces
RAM footprint of exception by eliminating the list object.
Addresses part of issue #1126.
Issue was with uPy: on local machine with micropython-lib installed, io
module is available. Not the case on Travis CI, where only _io module
is available in uPy.
Don't store final, failing value to the loop variable. This fix also
makes for .. range a bit more efficient, as it uses less store/load
pairs for the loop variable.