Only types whose iterator instances still fit in 4 machine words have
been changed to use the polymorphic iterator.
Reduces Thumb2 arch code size by 264 bytes.
This allows the mp_obj_t type to be configured to something other than a
pointer-sized primitive type.
This patch also includes additional changes to allow the code to compile
when sizeof(mp_uint_t) != sizeof(void*), such as using size_t instead of
mp_uint_t, and various casts.
Prior to this patch the printing mechanism was a bit of a tangled
mess. This patch attempts to consolidate printing into one interface.
All (non-debug) printing now uses the mp_print* family of functions,
mainly mp_printf. All these functions take an mp_print_t structure as
their first argument, and this structure defines the printing backend
through the "print_strn" function of said structure.
Printing from the uPy core can reach the platform-defined print code via
two paths: either through mp_sys_stdout_obj (defined per port) in
conjunction with mp_stream_write; or through the mp_plat_print structure
which uses the MP_PLAT_PRINT_STRN macro to define how strings are printed
on the platform. The former is only used when MICROPY_PY_IO is defined.
With this new scheme printing is generally more efficient (fewer layers
to go through, fewer arguments to pass), and, given an mp_print_t*
structure, one can call mp_print_str for efficiency instead of
mp_printf("%s", ...). Code size is also reduced by around 200 bytes on
Thumb2 archs.
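
The single-backend idea described above can be modelled in a few lines.
This is only a Python sketch of the design, not the C code from the
patch: the Print class and the print_str/printf helpers are hypothetical
stand-ins for mp_print_t, mp_print_str and mp_printf, and the
list-appending backend is made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Print:
    """Hypothetical stand-in for the C mp_print_t structure."""
    data: Any                                    # backend state, e.g. a buffer
    print_strn: Callable[[Any, str, int], None]  # writes the first n chars of s


def print_str(p: Print, s: str) -> int:
    """Send a whole string straight to the backend, with no format parsing."""
    p.print_strn(p.data, s, len(s))
    return len(s)


def printf(p: Print, fmt: str, *args: Any) -> int:
    """Format first, then hand the result to the same backend."""
    return print_str(p, fmt % args)


def list_print_strn(data: list, s: str, n: int) -> None:
    """Example backend (an assumption): collect output chunks in a list."""
    data.append(s[:n])


chunks: list = []
to_list = Print(data=chunks, print_strn=list_print_strn)
print_str(to_list, "hello ")
printf(to_list, "%s=%d", "x", 3)
output = "".join(chunks)
```

Every helper takes the Print object as its first argument and funnels
output through print_strn, so swapping the backend means swapping one
callback.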
splitlines() occurs ~179 times in the CPython3 standard library, so it was
deemed worth implementing. The method has subtle semantic differences
from just .split("\n"). It is also defined as working for any end-of-line
combination, but this is currently not implemented: it works only with
LF line endings (which should be OK for text strings on any platform,
but not OK for bytes).
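
The semantic differences from .split("\n") can be seen with standard
CPython 3 behaviour. The snippet below is an illustration only, and the
last line shows the CRLF handling that CPython has and that this patch
does not implement yet.

```python
text = "one\ntwo\n"

# split("\n") treats the trailing newline as a separator, so a final
# empty string is produced; splitlines() treats it as a line terminator.
split_result = text.split("\n")
splitlines_result = text.splitlines()

# The empty string also differs: one empty field versus no lines at all.
empty_split = "".split("\n")
empty_splitlines = "".splitlines()

# CPython recognises other line endings such as CRLF; per the note above,
# this implementation currently handles LF only.
crlf_lines = "a\r\nb".splitlines()
```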
There was a really weird warning (promoted to error) when building the
Windows port. The exact cause is still unknown, but it uncovered another
issue: the 8-bit and unicode str_make_new implementations should be mutually
exclusive, and not built at the same time. What we had is that bytes_decode()
pulled in the 8-bit str_make_new() even for unicode builds.
This patch allows vstr memory to be reused when creating a str/bytes object.
This improves memory usage.
Also saves code ROM: 128 bytes on stmhal, 92 bytes on bare-arm, and 88
bytes on unix x64.
Squashed commit of the following:
commit 99dc21b67a895dc10d3c846bc158d27c839cee48
Author: Chris Angelico <rosuav@gmail.com>
Date: Thu Jun 12 02:18:54 2014 +1000
Optimize as per TODO (thanks Damien!)
commit 5bf0153ecad8348443058d449d74504fc458fe51
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 08:42:06 2014 +1000
Test a default (= UTF-8) encode and decode
commit c962057ac340832c4fde60896f656a3fe3ad78a9
Merge: e2c9782 195de32
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 05:23:03 2014 +1000
Merge branch 'master' into unicode, resolving conflict on py/obj.h
commit e2c9782a65eb57f481d441d40161de427e1940ba
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 05:05:57 2014 +1000
More whitespace fixups
commit 086a2a0f57afbc1f731697fd5d3a0cbbb80e5418
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 05:04:20 2014 +1000
Properly implement string slicing
commit 0d339a143e2b6442366145e7f3d64aada293eaa0
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 02:24:11 2014 +1000
Support slicing in str_index_to_ptr, and fix a bounds error
commit 24371c7267d360e77cf5eabc2e8ce9a73d2ee0da
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 02:10:22 2014 +1000
Break out index-to-pointer calculation into a function
commit 616c24ac014c3ca56008428c506034dd1bfff7a8
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 02:03:11 2014 +1000
Add tests of string slicing, which currently fail
commit a24d19f676fe8cc21dad512d91b826892e162a5b
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 01:56:53 2014 +1000
Change string indexing to not precalculate the charlen, and add test for neg indexing
commit 0bcc7ab89eafb2ae53195e94c9bea42a4e886b64
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 22:09:17 2014 +1000
Clean up constant qstr declarations now that charlen isn't needed
commit 5473e1a1dba2124b7b0c207f2964293cfbe80167
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 07:18:42 2014 +1000
Remove the charlen field from strings, calculating it when required
commit 5c1658ec71aefbdc88c261ce2e57dc7670cdc6ef
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 07:11:27 2014 +1000
Get rid of mp_obj_str_get_data_len() which was used in only one place
commit a019ba968b4e8daf7f3674f63c5cc400e304c509
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 06:58:26 2014 +1000
Add a unichar_charlen() function to calculate length-in-characters from length-in-bytes
commit 44b0d5cff846ba487c526ed95be1b3d1cd3d762a
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 06:32:44 2014 +1000
Use utf8_get/next_char in building up a string's repr
commit 30d1bad33f7af90f1971987c39864c8fcf3f5c21
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 06:10:45 2014 +1000
Make utf8_get_char() and utf8_next_char() actually do what their names say
commit bc990dad9afb8ec112f5e7f7f79d5ab415da0e72
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 02:10:59 2014 +1000
Revert "Add PEP 393-flags to strings and stub usage."
This reverts commit c239f509521d1a0f9563bf9c5de0c4fb9a6a33ba.
commit f9bebb28ad52467f2f2d7a752bb033296b6c2f9b
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 15:41:48 2014 +1000
Whitespace fixes
commit 279de0c8eb3cb186914799ccc5ee94ea97f56de4
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 15:28:35 2014 +1000
Formatting/layout improvements - introduce macros for UTF-8 byte detection, add braces. No functional changes.
commit f1911f53d56da809c97b07245f5728a419e8fb30
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:56:02 2014 +1000
Make chr() Unicode-aware
commit f51ad737b48ac04c161197a4012821d50885c4c7
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:44:07 2014 +1000
Make a string's repr Unicode-aware
commit 01bd68684611585d437982dccdf05b33cbedc630
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:33:43 2014 +1000
Expand the Unicode tests
commit 7bc91904f899f8012089fc14a06495680a51e590
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:27:30 2014 +1000
Record byte lengths for byte strings
commit bb132120717cf176dcfb26f87fa309378f76ab5f
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:25:06 2014 +1000
Make ord() Unicode-aware
commit 03f0cbe9051b62192be97b59f84f63f9216668bf
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 10:24:35 2014 +1000
Retain characters as UTF-8 encoded Unicode
commit e924659b85c001916a5ff7f4d1d8b3ebe2bf0c2f
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 08:37:27 2014 +1000
Add support for \u and \U escapes, but not \N (with explanatory comment)
commit 231031ac5f0346e4ffcf9c4abec2bd33f566232c
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 05:09:35 2014 +1000
Add character length to qstr
commit 6df1b946fb17d8d5df3d91b21cde627c3d4556a8
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:48:36 2014 +1000
Add test of UTF-8 encoded source file resulting in properly formed string
commit 16429b81a8483cf25865ed11afd81a7d9c253c26
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:44:15 2014 +1000
Make len(s) return character length (even though creation's still buggy)
commit cd2cf6663cc47831dbc97819ad5c50ad33f939d3
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:15:36 2014 +1000
HACK - When indexing a qstr, count its charlen. Stupidly inefficient but POC.
All tests pass now, though string creation is still buggy.
commit 47c234584d3358dfa6b4003d5e7264105d17b8f7
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:15:32 2014 +1000
objstr: Record character length separately from byte length
CAUTION: Buggy, may crash stuff - qstr needs equivalent functionality too
commit b0f41c72af27d3b361027146025877b3d7e8785c
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 05:37:36 2014 +1000
Beginnings of UTF-8 support - construct strings from that many UTF-8-encoded chars, and subscript bytes the same way
commit 89452be641674601e9bfce86dc71c17c3140a6cf
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 05:28:47 2014 +1000
Update comments - now aiming for UTF-8 rather than PEP 393 strings
commit c239f509521d1a0f9563bf9c5de0c4fb9a6a33ba
Author: Chris Angelico <rosuav@gmail.com>
Date: Wed Jun 4 05:28:12 2014 +1000
Add PEP 393-flags to strings and stub usage.
The test suite all passes, but nothing has actually been changed.
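
Several of the commits above store strings as UTF-8 and compute the
character length from the byte length on demand (unichar_charlen(),
len(s)). One common way to do this, sketched here in Python under the
assumption of valid UTF-8 input, is to count every byte that is not a
continuation byte (10xxxxxx). The function name utf8_charlen is
hypothetical and this is not necessarily how the C code does it.

```python
def utf8_charlen(data: bytes) -> int:
    """Count code points in valid UTF-8 by skipping continuation bytes."""
    # Continuation bytes have the bit pattern 10xxxxxx; every other byte
    # starts a new character.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


s = "naïve €"                # "ï" takes 2 bytes, "€" takes 3 bytes
encoded = s.encode("utf-8")
byte_length = len(encoded)
char_length = utf8_charlen(encoded)
```

The count is linear in the byte length, which is the cost accepted by
dropping the stored charlen field.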