This turns failing assertions to type exceptions for things like
b"123".find(...). We still don't support operations like this on bytes
objects (unlike CPython), but at least it no longer crashes.
Eg b"123" + bytearray(2) now works. This patch actually decreases code
size while adding functionality: 32-bit unix down by 128 bytes, stmhal
down by 84 bytes.
Uninitialised struct members get a default value of 0/false, so this is
not strictly needed. But it actually decreases code size because when
all members are initialised the compiler doesn't need to insert a call
to memset to clear everything. In other words, setting 1 extra member
to 0 uses less code than calling memset.
ROM savings in bytes: 32-bit unix: 100; bare-arm: 44; stmhal: 52.
gc.enable/disable are now the same as CPython: they just control whether
automatic garbage collection is enabled or not. If disabled, you can
still allocate heap memory, and initiate a manual collection.
msvc does not treat 1L a 64bit integer hence all occurences of shifting it left or right
result in undefined behaviour since the maximum allowed shift count for 32bit ints is 31.
Forcing the correct type explicitely, stored in MPZ_LONG_1, solves this.
It should be fair to say that almost in all cases where some API call
expects string, it should be also possible to pass byte string. For example,
it should be open/delete/rename file with name as bytestring. Note that
similar change was done quite a long ago to mp_obj_str_get_data().
Support for packages as argument not implemented, but otherwise error and
exit handling should be correct. This for example will allow to do:
pip-micropython install micropython-test.pystone
micropython -m test.pystone
This allows to implement KeyboardInterrupt on unix, and a much safer
ctrl-C in stmhal port. First ctrl-C is a soft one, with hope that VM
will notice it; second ctrl-C is a hard one that kills anything (for
both unix and stmhal).
One needs to check for a pending exception in the VM only for jump
opcodes. Others can't produce an infinite loop (infinite recursion is
caught by stack check).
There is a lot potential in compress bytecodes and make more use of the
coding space. This patch introduces "multi" bytecodes which have their
argument included in the bytecode (by addition).
UNARY_OP and BINARY_OP now no longer take a 1 byte argument for the
opcode. Rather, the opcode is included in the first byte itself.
LOAD_FAST_[0,1,2] and STORE_FAST_[0,1,2] are removed in favour of their
multi versions, which can take an argument between 0 and 15 inclusive.
The majority of LOAD_FAST/STORE_FAST codes fit in this range and so this
saves a byte for each of these.
LOAD_CONST_SMALL_INT_MULTI is used to load small ints between -16 and 47
inclusive. Such ints are quite common and now only need 1 byte to
store, and now have much faster decoding.
In all this patch saves about 2% RAM for typically bytecode (1.8% on
64-bit test, 2.5% on pyboard test). It also reduces the binary size
(because bytecodes are simplified) and doesn't harm performance.
This saves a lot of RAM for 2 reasons:
1. For functions that don't have default values, var args or var kw
args (which is a large number of functions in the general case), the
mp_obj_fun_bc_t type now fits in 1 GC block (previously needed 2 because
of the extra pointer to point to the arg_names array). So this saves 16
bytes per function (32 bytes on 64-bit machines).
2. Combining separate memory regions generally saves RAM because the
unused bytes at the end of the GC block are saved for 1 of the blocks
(since that block doesn't exist on its own anymore). So generally this
saves 8 bytes per function.
Tested by importing lots of modules:
- 64-bit Linux gave about an 8% RAM saving for 86k of used RAM.
- pyboard gave about a 6% RAM saving for 31k of used RAM.
This makes open() and _io.FileIO() more CPython compliant.
The mode kwarg is fully iplemented.
The encoding kwarg is allowed but not implemented; mainly to allow
the tests to specify encoding for CPython, see #874
Also, usocket.readinto(). Known issue is that .readinto() should be available
only for binary files, but micropython uses single method table for both
binary and text files.
Just like they handled in other read*(). Note that behavior of readline()
in case there's no data when it's called is underspecified in Python lib
spec, implemented to behave as read() - return None.
With this patch a port can enable module weak link support and provide
a dict of qstr->module mapping. This mapping is looked up only if an
import fails to find the requested module in the filesystem.
This allows to have the builtin module named, eg, usocket, and provide
a weak link of "socket" to the same module, but this weak link can be
overridden if a file by the name "socket.py" is found in the import
path.
This has benefits all round: code factoring for parse/compile/execute,
proper context save/restore for exec, allow to sepcify globals/locals
for eval, and reduced ROM usage by >100 bytes on stmhal and unix.
Also, the call to mp_parse_compile_execute is tail call optimised for
the import code, so it doesn't increase stack memory usage.
In CPython IOError (and EnvironmentError) is deprecated and aliased to
OSError. All modules that used to raise IOError now raise OSError (or a
derived exception).
In Micro Python we never used IOError (except 1 place, incorrectly) and
so don't need to keep it.
See http://legacy.python.org/dev/peps/pep-3151/ for background.
Viper can now do the following:
def store(p:ptr8, c:int):
p[0] = c
This does a store of c to the memory pointed to by p using a machine
instructions inline in the code.
It seems most sensible to use size_t for measuring "number of bytes" in
malloc and vstr functions (since that's what size_t is for). We don't
use mp_uint_t because malloc and vstr are not Micro Python specific.
mp_parse_node_free now frees the memory associated with non-interned
strings. And the parser calls mp_parse_node_free when discarding a
non-used node (such as a doc string).
Also, the compiler now frees the parse tree explicitly just before it
exits (as opposed to relying on the caller to do this).
Addresses issue #708 as best we can.
Stack is full descending and must be 8-byte aligned. It must start off
pointing to just above the last byte of RAM.
Previously, stack started pointed to last byte of RAM (eg 0x2001ffff)
and so was not 8-byte aligned. This caused a bug in combination with
alloca.
This patch also updates some debug printing code.
Addresses issue #872 (among many other undiscovered issues).
Heap RAM was being allocated to print dicts and do some other types of
iterating. Now these iterations use 1 word of state on the stack.
Deleting elements from a dict was not allowing the value to be reclaimed
by the GC. This is now fixed.
sys.exit always raises SystemExit so doesn't need a special
implementation for each port. If C exit() is really needed, use the
standard os._exit function.
Also initialise mp_sys_path and mp_sys_argv in teensy port.
Eventually, viper wants to be able to use raw pointers to strings and
arrays for efficient access. But for now, let's just load strings as a
Python object so they can be used as normal. This will anyway be
compatible with eventual intended viper behaviour.
Addresses issue #857.
Type representing signed size doesn't have to be int, so use special value
which defaults to SSIZE_MAX, but as it's not defined by C standard (but rather
by POSIX), allow ports to set it.
Previously, mpz was restricted to using at most 15 bits in each digit,
where a digit was a uint16_t.
With this patch, mpz can use all 16 bits in the uint16_t (improvement
to mpn_div was required). This gives small inprovements in speed and
RAM usage. It also yields savings in ROM code size because all of the
digit masking operations become no-ops.
Also, mpz can now use a uint32_t as the digit type, and hence use 32
bits per digit. This will give decent improvements in mpz speed on
64-bit machines.
Test for big integer division added.
Code-info size, block name, source name, n_state and n_exc_stack now use
variable length encoded uints. This saves 7-9 bytes per bytecode
function for most functions.
This way, the native glue code is only compiled if native code is
enabled (which makes complete sense; thanks to Paul Sokolovsky for
the idea).
Should fix issue #834.
The heap allocation is now exactly as it was before the "faster gc
alloc" patch, but it's still nearly as fast. It is fixed by being
careful to always update the "last free block" pointer whenever the heap
changes (eg free or realloc).
Tested on all tests by enabling EXTENSIVE_HEAP_PROFILING in py/gc.c:
old and new allocator have exactly the same behaviour, just the new one
is much faster.
Recent speed up of GC allocation made the GC have a fragmented heap.
This patch restores "original fragmentation behaviour" whilst still
retaining relatively fast allocation. This patch works because there is
always going to be a single block allocated now and then, which advances
the gc_last_free_atb_index pointer often enough so that the whole heap
doesn't need scanning.
Should address issue #836.
With a file with 1 line (and an error on that line), used to show the
line as number 0. Now shows it correctly as line number 1.
But, when line numbers are disabled, it now prints line number 1 for any
line that has an error (instead of 0 as previously). This might end up
being confusing, but requires extra RAM and/or hack logic to make it
print something special in the case of no line numbers.
These functions are generally 1 machine instruction, and are used in
critical code, so makes sense to have them inline.
Also leave these functions uninverted (ie 0 means enable, 1 means
disable) and provide macro constants if you really need to distinguish
the states. This makes for smaller code as well (combined with
inlining).
Applied to teensy port as well.
Because (for Thumb) a function pointer has the LSB set, pointers to
dynamic functions in RAM (eg native, viper or asm functions) were not
being traced by the GC. This patch is a comprehensive fix for this.
Addresses issue #820.
This simple patch gives a very significant speed up for memory allocation
with the GC.
Eg, on PYBv1.0:
tests/basics/dict_del.py: 3.55 seconds -> 1.19 seconds
tests/misc/rge_sm.py: 15.3 seconds -> 2.48 seconds
Multiplication of a tuple, list, str or bytes now yields an empty
sequence (instead of crashing). Addresses issue #799
Also added ability to mult bytes on LHS by integer.
Can now index ranges with integers and slices, and reverse ranges
(although reversing is not very efficient).
Not sure how useful this stuff is, but gets us closer to having all of
Python's builtins.
reversed function now implemented, and works for tuple, list, str, bytes
and user objects with __len__ and __getitem__.
Renamed mp_builtin_len to mp_obj_len to make it publically available (eg
for reversed).
This happens for example for zero-size arrays. As .get_buffer() method now
has explicit return value, it's enough to distinguish success vs failure
of getting buffer.
This was a nasty bug to track down. It only had consequences when the
heap size was just the right size to expose the rounding error in the
calculation of the finaliser table size. And, a script had to allocate
a small (1 or 2 cell) object at the very end of the heap. And, this
object must not have a finaliser. And, the initial state of the heap
must have been all bits set to 1. All these conspire on the pyboard,
but only if your run the script fresh (so unused memory is all 1's),
and if your script allocates a lot of small objects (eg 2-char strings
that are not interned).
qstr_init is always called exactly before mp_init, so makes sense to
just have mp_init call it. Similarly with
mp_init_emergency_exception_buf. Doing this makes the ports simpler and
less error prone (ie they can no longer forget to call these).
Reduces by about a factor of 10 on average the amount of RAM needed to
store the line-number to bytecode map in the bytecode prelude.
Using CPython3.4's stdlib for statistics: previously, an average of
13 bytes were used per (bytecode offset, line-number offset) pair, and
now with this improvement, that's down to 1.3 bytes on average.
Large RAM usage before was due to some very large steps in line numbers,
both from the start of the first line in a function way down in the
file, and also functions that have big comments and/or big strings in
them (both cases were significant).
Although the savings are large on average for the CPython stdlib, it
won't have such a big effect for small scripts used in embedded
programming.
Addresses issue #648.
This removes mpz_as_int, since that was a terrible function (it
implemented saturating conversion).
Use mpz_as_int_checked and mpz_as_uint_checked. These now work
correctly (they previously had wrong overflow checking, eg
print(chr(10000000000000)) on 32-bit machine would incorrectly convert
this large number to a small int).
Many OSes/CPUs have affinity to put "user" data into lower half of address
space. Take advantage of that and remap such addresses into full small int
range (including negative part).
If address is from upper half, long int will be used. Previously, small
int was returned for lower quarter of address space, and upper quarter. For
2 middle quarters, long int was used, which is clearly worse schedule than
the above.
The user code should call micropython.alloc_emergency_exception_buf(size)
where size is the size of the buffer used to print the argument
passed to the exception.
With the test code from #732, and a call to
micropython.alloc_emergenncy_exception_buf(100) the following error is
now printed:
```python
>>> import heartbeat_irq
Uncaught exception in Timer(4) interrupt handler
Traceback (most recent call last):
File "0://heartbeat_irq.py", line 14, in heartbeat_cb
NameError: name 'led' is not defined
```
With unicode enabled, this patch allows reading a fixed number of
characters from text-mode streams; eg file.read(5) will read 5 unicode
chars, which can made of more than 5 bytes.
For an ASCII stream (ie no chars > 127) it only needs to do 1 read. If
there are lots of non-ASCII chars in a stream, then it needs multiple
reads of the underlying object.
Adds a new test for this case. Enables unicode support by default on
unix and stmhal ports.
dummy_data field is accessed as uint value (e.g.
in emit_write_bytecode_byte_ptr), but is not aligned as such, which causes
bus errors or incorrect behavior on any arch requiring strictly aligned
data (ARM pre-v7, MIPS, etc, etc).
Conflicts:
stmhal/pin_named_pins.c
stmhal/readline.c
Renamed HAL_H to MICROPY_HAL_H. Made stmhal/mphal.h which intends to
define the generic Micro Python HAL, which in stmhal sits above the ST
HAL.
Native emitter can now compile try/except blocks using nlr_push/nlr_pop.
It probably only works for 1 level of exception handling. It doesn't
work on Thumb (only x64).
Native emitter can also handle some additional op codes.
With this patch, 198 tests now pass using "-X emit=native" option to
micropython.
- rearrange/add definitions that were not there so it's easier to compare both
- use MICROPY_PY_SYS_PLATFORM in main.c since it's available anyway
- define EWOULDBLOCK, it is missing from ingw32
As stack checking is enabled by default, ports which don't call
stack_ctrl_init() are broken now (report RuntimeError on startup). Save
them trouble and just init stack control framework in interpreter init.
Squashed commit of the following:
commit 99dc21b67a895dc10d3c846bc158d27c839cee48
Author: Chris Angelico <rosuav@gmail.com>
Date: Thu Jun 12 02:18:54 2014 +1000
Optimize as per TODO (thanks Damien!)
commit 5bf0153ecad8348443058d449d74504fc458fe51
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 08:42:06 2014 +1000
Test a default (= UTF-8) encode and decode
commit c962057ac340832c4fde60896f656a3fe3ad78a9
Merge: e2c9782 195de32
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 05:23:03 2014 +1000
Merge branch 'master' into unicode, resolving conflict on py/obj.h
commit e2c9782a65eb57f481d441d40161de427e1940ba
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 05:05:57 2014 +1000
More whitespace fixups
commit 086a2a0f57afbc1f731697fd5d3a0cbbb80e5418
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 05:04:20 2014 +1000
Properly implement string slicing
commit 0d339a143e2b6442366145e7f3d64aada293eaa0
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 02:24:11 2014 +1000
Support slicing in str_index_to_ptr, and fix a bounds error
commit 24371c7267d360e77cf5eabc2e8ce9a73d2ee0da
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 02:10:22 2014 +1000
Break out index-to-pointer calculation into a function
commit 616c24ac014c3ca56008428c506034dd1bfff7a8
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 02:03:11 2014 +1000
Add tests of string slicing, which currently fail
commit a24d19f676fe8cc21dad512d91b826892e162a5b
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 01:56:53 2014 +1000
Change string indexing to not precalculate the charlen, and add test for neg indexing
commit 0bcc7ab89eafb2ae53195e94c9bea42a4e886b64
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 22:09:17 2014 +1000
Clean up constant qstr declarations now that charlen isn't needed
commit 5473e1a1dba2124b7b0c207f2964293cfbe80167
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 07:18:42 2014 +1000
Remove the charlen field from strings, calculating it when required
commit 5c1658ec71aefbdc88c261ce2e57dc7670cdc6ef
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 07:11:27 2014 +1000
Get rid of mp_obj_str_get_data_len() which was used in only one place
commit a019ba968b4e8daf7f3674f63c5cc400e304c509
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 06:58:26 2014 +1000
Add a unichar_charlen() function to calculate length-in-characters from length-in-bytes
commit 44b0d5cff846ba487c526ed95be1b3d1cd3d762a
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 06:32:44 2014 +1000
Use utf8_get/next_char in building up a string's repr
commit 30d1bad33f7af90f1971987c39864c8fcf3f5c21
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 06:10:45 2014 +1000
Make utf8_get_char() and utf8_next_char() actually do what their names say
commit bc990dad9afb8ec112f5e7f7f79d5ab415da0e72
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 02:10:59 2014 +1000
Revert "Add PEP 393-flags to strings and stub usage."
This reverts commit c239f509521d1a0f9563bf9c5de0c4fb9a6a33ba.
commit f9bebb28ad52467f2f2d7a752bb033296b6c2f9b
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 15:41:48 2014 +1000
Whitespace fixes
commit 279de0c8eb3cb186914799ccc5ee94ea97f56de4
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 15:28:35 2014 +1000
Formatting/layout improvements - introduce macros for UTF-8 byte detection, add braces. No functional changes.
commit f1911f53d56da809c97b07245f5728a419e8fb30
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:56:02 2014 +1000
Make chr() Unicode-aware
commit f51ad737b48ac04c161197a4012821d50885c4c7
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:44:07 2014 +1000
Make a string's repr Unicode-aware
commit 01bd68684611585d437982dccdf05b33cbedc630
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:33:43 2014 +1000
Expand the Unicode tests
commit 7bc91904f899f8012089fc14a06495680a51e590
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:27:30 2014 +1000
Record byte lengths for byte strings
commit bb132120717cf176dcfb26f87fa309378f76ab5f
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:25:06 2014 +1000
Make ord() Unicode-aware
commit 03f0cbe9051b62192be97b59f84f63f9216668bf
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 10:24:35 2014 +1000
Retain characters as UTF-8 encoded Unicode
commit e924659b85c001916a5ff7f4d1d8b3ebe2bf0c2f
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 08:37:27 2014 +1000
Add support for \u and \U escapes, but not \N (with explanatory comment)
commit 231031ac5f0346e4ffcf9c4abec2bd33f566232c
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 05:09:35 2014 +1000
Add character length to qstr
commit 6df1b946fb17d8d5df3d91b21cde627c3d4556a8
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:48:36 2014 +1000
Add test of UTF-8 encoded source file resulting in properly formed string
commit 16429b81a8483cf25865ed11afd81a7d9c253c26
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:44:15 2014 +1000
Make len(s) return character length (even though creation's still buggy)
commit cd2cf6663cc47831dbc97819ad5c50ad33f939d3
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:15:36 2014 +1000
HACK - When indexing a qstr, count its charlen. Stupidly inefficient but POC.
All tests pass now, though string creation is still buggy.
commit 47c234584d3358dfa6b4003d5e7264105d17b8f7
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:15:32 2014 +1000
objstr: Record character length separately from byte length
CAUTION: Buggy, may crash stuff - qstr needs equivalent functionality too
commit b0f41c72af27d3b361027146025877b3d7e8785c
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 05:37:36 2014 +1000
Beginnings of UTF-8 support - construct strings from that many UTF-8-encoded chars, and subscript bytes the same way
commit 89452be641674601e9bfce86dc71c17c3140a6cf
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 05:28:47 2014 +1000
Update comments - now aiming for UTF-8 rather than PEP 393 strings
commit c239f509521d1a0f9563bf9c5de0c4fb9a6a33ba
Author: Chris Angelico <rosuav@gmail.com>
Date: Wed Jun 4 05:28:12 2014 +1000
Add PEP 393-flags to strings and stub usage.
The test suite all passes, but nothing has actually been changed.
Such mechanism is important to get stable Python functioning, because Python
function calling is handled with C stack. The idea is to sprinkle
STACK_CHECK() calls in places where there can be C recursion.
TODO: Add more STACK_CHECK()'s.
Expected to be set on command line, with the idea being that for different
targets, there're different smartass ABIs which strive to put unneeded
sections into executables, etc., so let people have flexible way to
strip that.
The option name is similar to previously introduced CLFAGS_EXTRA &
LDFLAGS_EXTRA.
char can be signedness, and using signedness types is dangerous - it can
lead to negative offsets when doing table lookups. We apparently should just
ban char usage.
This will allow roughly the same behavior as Python3 for non-ASCII strings,
for example, print("<phrase in non-Latin script>".split()) will print list
of words, not weird hex dump (like Python2 behaves). (Of course, that it
will print list of words, if there're "words" in that phrase at all, separated
by ASCII-compatible whitespace; that surely won't apply to every human
language in existence).
Functionality we provide in builtin io module is fairly minimal. Some
code, including CPython stdlib, depends on more functionality. So, there's
a choice to either implement it in C, or move it _io, and let implement other
functionality in Python. 2nd choice is pursued. This setup matches CPython
too (_io is builtin, io is Python-level).
Benefits: won't crash baremetal targets, will provide Python source location
when not implemented feature used (it will no longer provide C source
location, but just grep for error message).