1413 Commits

Author SHA1 Message Date
Paul Sokolovsky
f5f6c3b792 streams: Reading by char count from unicode text streams is not implemented. 2014-06-27 00:04:20 +03:00
Paul Sokolovsky
ce81312d8a misc: Add count_lead_ones() function, useful for UTF-8 handling. 2014-06-27 00:04:20 +03:00
Paul Sokolovsky
ea2c936c7e objstrunicode: Refactor str_index_to_ptr() following objstr. 2014-06-27 00:04:20 +03:00
Paul Sokolovsky
26fda6dc8e objstr: 64-bit issues. 2014-06-27 00:04:19 +03:00
Paul Sokolovsky
00c904b47a objstrunicode: Signedness issues. 2014-06-27 00:04:19 +03:00
Paul Sokolovsky
1044c3dfe6 unicode: Make get_char()/next_char()/charlen() be 8-bit compatible.
Based on config define.
2014-06-27 00:04:19 +03:00
Paul Sokolovsky
5048df0b7c objstr: find(), rfind(), index(): Make return value be unicode-aware. 2014-06-27 00:04:19 +03:00
Paul Sokolovsky
46d31e9ca9 unicode: Add utf8_ptr_to_index().
Useful when we have pointer to char inside string, but need to return char
index. (E.g. str.find()).
2014-06-27 00:04:19 +03:00
Paul Sokolovsky
ded0fc77f7 py: Add dedicated unicode header. 2014-06-27 00:04:19 +03:00
Paul Sokolovsky
79b7fe2ee5 objstrunicode: Implement iterator. 2014-06-27 00:04:19 +03:00
Paul Sokolovsky
cdc020da4b objstrunicode: Re-add buffer protocol back for now, required for io.StringIO. 2014-06-27 00:04:18 +03:00
Paul Sokolovsky
e7f2b4c875 objstrunicode: Revamp len() handling for unicode, and optimize bool(). 2014-06-27 00:04:18 +03:00
Paul Sokolovsky
86d3898e70 objstrunicode: Get rid of bytes checking, it's separate type. 2014-06-27 00:04:18 +03:00
Paul Sokolovsky
d215ee1dc1 py: Make MICROPY_PY_BUILTINS_STR_UNICODE=1 buildable. 2014-06-27 00:04:18 +03:00
Paul Sokolovsky
9731912ccb py: Prune unneeded code from objstrunicode, reuse code in objstr. 2014-06-27 00:04:18 +03:00
Paul Sokolovsky
165eb69b86 vstr: Restore bytestr compatibility. 2014-06-27 00:04:18 +03:00
Paul Sokolovsky
42a52516fe builtin: Restore bytestr compatibility. 2014-06-27 00:04:18 +03:00
Chris Angelico
2ba2299d28 lexer, vstr: Add unicode support. 2014-06-27 00:04:18 +03:00
Chris Angelico
9a1a4beb56 builtin: ord, chr: Unicode support. 2014-06-27 00:04:17 +03:00
Chris Angelico
64b468d873 objstrunicode: Basic implementation of unicode handling.
Squashed commit of the following:

commit 99dc21b67a895dc10d3c846bc158d27c839cee48
Author: Chris Angelico <rosuav@gmail.com>
Date:   Thu Jun 12 02:18:54 2014 +1000

    Optimize as per TODO (thanks Damien!)

commit 5bf0153ecad8348443058d449d74504fc458fe51
Author: Chris Angelico <rosuav@gmail.com>
Date:   Tue Jun 10 08:42:06 2014 +1000

    Test a default (= UTF-8) encode and decode

commit c962057ac340832c4fde60896f656a3fe3ad78a9
Merge: e2c9782 195de32
Author: Chris Angelico <rosuav@gmail.com>
Date:   Tue Jun 10 05:23:03 2014 +1000

    Merge branch 'master' into unicode, resolving conflict on py/obj.h

commit e2c9782a65eb57f481d441d40161de427e1940ba
Author: Chris Angelico <rosuav@gmail.com>
Date:   Tue Jun 10 05:05:57 2014 +1000

    More whitespace fixups

commit 086a2a0f57afbc1f731697fd5d3a0cbbb80e5418
Author: Chris Angelico <rosuav@gmail.com>
Date:   Tue Jun 10 05:04:20 2014 +1000

    Properly implement string slicing

commit 0d339a143e2b6442366145e7f3d64aada293eaa0
Author: Chris Angelico <rosuav@gmail.com>
Date:   Tue Jun 10 02:24:11 2014 +1000

    Support slicing in str_index_to_ptr, and fix a bounds error

commit 24371c7267d360e77cf5eabc2e8ce9a73d2ee0da
Author: Chris Angelico <rosuav@gmail.com>
Date:   Tue Jun 10 02:10:22 2014 +1000

    Break out index-to-pointer calculation into a function

commit 616c24ac014c3ca56008428c506034dd1bfff7a8
Author: Chris Angelico <rosuav@gmail.com>
Date:   Tue Jun 10 02:03:11 2014 +1000

    Add tests of string slicing, which currently fail

commit a24d19f676fe8cc21dad512d91b826892e162a5b
Author: Chris Angelico <rosuav@gmail.com>
Date:   Tue Jun 10 01:56:53 2014 +1000

    Change string indexing to not precalculate the charlen, and add test for neg indexing

commit 0bcc7ab89eafb2ae53195e94c9bea42a4e886b64
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sun Jun 8 22:09:17 2014 +1000

    Clean up constant qstr declarations now that charlen isn't needed

commit 5473e1a1dba2124b7b0c207f2964293cfbe80167
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sun Jun 8 07:18:42 2014 +1000

    Remove the charlen field from strings, calculating it when required

commit 5c1658ec71aefbdc88c261ce2e57dc7670cdc6ef
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sun Jun 8 07:11:27 2014 +1000

    Get rid of mp_obj_str_get_data_len() which was used in only one place

commit a019ba968b4e8daf7f3674f63c5cc400e304c509
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sun Jun 8 06:58:26 2014 +1000

    Add a unichar_charlen() function to calculate length-in-characters from length-in-bytes

commit 44b0d5cff846ba487c526ed95be1b3d1cd3d762a
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sun Jun 8 06:32:44 2014 +1000

    Use utf8_get/next_char in building up a string's repr

commit 30d1bad33f7af90f1971987c39864c8fcf3f5c21
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sun Jun 8 06:10:45 2014 +1000

    Make utf8_get_char() and utf8_next_char() actually do what their names say

commit bc990dad9afb8ec112f5e7f7f79d5ab415da0e72
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sun Jun 8 02:10:59 2014 +1000

    Revert "Add PEP 393-flags to strings and stub usage."

    This reverts commit c239f509521d1a0f9563bf9c5de0c4fb9a6a33ba.

commit f9bebb28ad52467f2f2d7a752bb033296b6c2f9b
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 15:41:48 2014 +1000

    Whitespace fixes

commit 279de0c8eb3cb186914799ccc5ee94ea97f56de4
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 15:28:35 2014 +1000

    Formatting/layout improvements - introduce macros for UTF-8 byte detection, add braces. No functional changes.

commit f1911f53d56da809c97b07245f5728a419e8fb30
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 11:56:02 2014 +1000

    Make chr() Unicode-aware

commit f51ad737b48ac04c161197a4012821d50885c4c7
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 11:44:07 2014 +1000

    Make a string's repr Unicode-aware

commit 01bd68684611585d437982dccdf05b33cbedc630
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 11:33:43 2014 +1000

    Expand the Unicode tests

commit 7bc91904f899f8012089fc14a06495680a51e590
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 11:27:30 2014 +1000

    Record byte lengths for byte strings

commit bb132120717cf176dcfb26f87fa309378f76ab5f
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 11:25:06 2014 +1000

    Make ord() Unicode-aware

commit 03f0cbe9051b62192be97b59f84f63f9216668bf
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 10:24:35 2014 +1000

    Retain characters as UTF-8 encoded Unicode

commit e924659b85c001916a5ff7f4d1d8b3ebe2bf0c2f
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 08:37:27 2014 +1000

    Add support for \u and \U escapes, but not \N (with explanatory comment)

commit 231031ac5f0346e4ffcf9c4abec2bd33f566232c
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 05:09:35 2014 +1000

    Add character length to qstr

commit 6df1b946fb17d8d5df3d91b21cde627c3d4556a8
Author: Chris Angelico <rosuav@gmail.com>
Date:   Fri Jun 6 13:48:36 2014 +1000

    Add test of UTF-8 encoded source file resulting in properly formed string

commit 16429b81a8483cf25865ed11afd81a7d9c253c26
Author: Chris Angelico <rosuav@gmail.com>
Date:   Fri Jun 6 13:44:15 2014 +1000

    Make len(s) return character length (even though creation's still buggy)

commit cd2cf6663cc47831dbc97819ad5c50ad33f939d3
Author: Chris Angelico <rosuav@gmail.com>
Date:   Fri Jun 6 13:15:36 2014 +1000

    HACK - When indexing a qstr, count its charlen. Stupidly inefficient but POC.

    All tests pass now, though string creation is still buggy.

commit 47c234584d3358dfa6b4003d5e7264105d17b8f7
Author: Chris Angelico <rosuav@gmail.com>
Date:   Fri Jun 6 13:15:32 2014 +1000

    objstr: Record character length separately from byte length

    CAUTION: Buggy, may crash stuff - qstr needs equivalent functionality too

commit b0f41c72af27d3b361027146025877b3d7e8785c
Author: Chris Angelico <rosuav@gmail.com>
Date:   Fri Jun 6 05:37:36 2014 +1000

    Beginnings of UTF-8 support - construct strings from that many UTF-8-encoded chars, and subscript bytes the same way

commit 89452be641674601e9bfce86dc71c17c3140a6cf
Author: Chris Angelico <rosuav@gmail.com>
Date:   Fri Jun 6 05:28:47 2014 +1000

    Update comments - now aiming for UTF-8 rather than PEP 393 strings

commit c239f509521d1a0f9563bf9c5de0c4fb9a6a33ba
Author: Chris Angelico <rosuav@gmail.com>
Date:   Wed Jun 4 05:28:12 2014 +1000

    Add PEP 393-flags to strings and stub usage.

    The test suite all passes, but nothing has actually been changed.
2014-06-27 00:04:17 +03:00
Paul Sokolovsky
83865347db objstrunicode: Complete copy of objstr, to be patched for unicode support. 2014-06-27 00:04:17 +03:00
Chris Angelico
c88987c1af py: Implement basic unicode functions. 2014-06-27 00:04:17 +03:00
Paul Sokolovsky
12bc13eeb8 mpconfig.h: Add MICROPY_PY_BUILTINS_STR_UNICODE. 2014-06-27 00:04:17 +03:00
Paul Sokolovsky
23668698cb py: Add portable framework to query/check C stack usage.
Such mechanism is important to get stable Python functioning, because Python
function calling is handled with C stack. The idea is to sprinkle
STACK_CHECK() calls in places where there can be C recursion.

TODO: Add more STACK_CHECK()'s.
2014-06-27 00:03:55 +03:00
Paul Sokolovsky
f3de62e6c2 binary: machine_uint_t vs uint dichotomy starts doing real damage. 2014-06-26 00:41:08 +03:00
Paul Sokolovsky
7a2f166949 modstruct: Fix alignment handling issues.
Also, factor out mp_binary_get_int() function.
2014-06-25 23:34:44 +03:00
Paul Sokolovsky
5aa740c3e2 modgc: Add mem_free()/mem_alloc() methods.
Return free/allocated memory on GC heap.
2014-06-25 14:28:11 +03:00
Damien George
e973acde81 Merge branch 'master' of github.com:micropython/micropython 2014-06-25 04:10:34 +01:00
Damien George
780e54cdc3 py: Implement delete_attr in native emitter. 2014-06-22 18:35:04 +01:00
Paul Sokolovsky
a96cc824bd py: Support arm and thumb ARM ISAs, in addition to thumb2.
These changes were tested with QEMU, and by few people of real hardware.
2014-06-22 01:40:45 +03:00
Paul Sokolovsky
59c675a64c py: Include mpconfig.h before all other includes.
It defines types used by all other headers.

Fixes #691.
2014-06-21 22:43:22 +03:00
Paul Sokolovsky
4c4b9d15ab mkrules.mk: Pass $(COPT) to link stage.
In generalize case, optimization options should be passed to all stages of
the build process.
2014-06-20 23:49:30 +03:00
Paul Sokolovsky
0fc7efb663 makefile: Pass STRIPFLAGS_EXTRA to strip.
Expected to be set on command line, with the idea being that for different
targets, there're different smartass ABIs which strive to put unneeded
sections into executables, etc., so let people have flexible way to
strip that.

The option name is similar to previously introduced CLFAGS_EXTRA &
LDFLAGS_EXTRA.
2014-06-20 20:25:35 +03:00
Paul Sokolovsky
3b6f7b95eb py: Separate MICROPY_PY_BUILTINS_COMPLEX from MICROPY_PY_BUILTINS_FLOAT.
One thing is wanting to do 1 / 2 and get something else but 0, and quite
another - doing rocket science ;-).
2014-06-20 18:00:23 +03:00
Paul Sokolovsky
7efbd325bb Merge pull request #697 from stinos/gc-debug
gc: More verbose debugging
2014-06-20 17:33:02 +03:00
Emmanuel Blot
f6932d6506 Prefix ARRAY_SIZE with micropython prefix MP_ 2014-06-19 18:54:34 +02:00
Emmanuel Blot
bf3366a48b Add missing “assert.h” file header inclusion from “nlr.h” 2014-06-19 18:47:38 +02:00
stijn
9acb5e4cf0 gc: Turn off debugging info again 2014-06-18 12:29:03 +02:00
stijn
def10cecd1 gc: Keep debug statements at beginning of scope where possible 2014-06-18 10:20:41 +02:00
stijn
bbcea3f62b gc: More verbose debugging
Add more DEBUG_printf statements to trace gc behaviour
2014-06-16 12:43:35 +02:00
Paul Sokolovsky
0294661da5 parsenum: Signedness issues.
char can be signedness, and using signedness types is dangerous - it can
lead to negative offsets when doing table lookups. We apparently should just
ban char usage.
2014-06-14 18:02:21 +03:00
Paul Sokolovsky
e3cfc0d33d objstr: Refactor to work with char pointers instead of indexes.
In preparation for unicode support.
2014-06-14 06:30:30 +03:00
Paul Sokolovsky
7ddbd1bee7 unicode: Add trivial implementation of unichar_charlen(). 2014-06-14 06:30:30 +03:00
Paul Sokolovsky
b0bb458810 unicode: String API is const byte*.
We still have that char vs byte dichotomy, but majority of string operations
now use byte.
2014-06-14 06:22:11 +03:00
Paul Sokolovsky
2ec38a17d4 objstr: Be 8-bit clean even for repr().
This will allow roughly the same behavior as Python3 for non-ASCII strings,
for example, print("<phrase in non-Latin script>".split()) will print list
of words, not weird hex dump (like Python2 behaves). (Of course, that it
will print list of words, if there're "words" in that phrase at all, separated
by ASCII-compatible whitespace; that surely won't apply to every human
language in existence).
2014-06-14 01:21:13 +03:00
Damien George
c037694957 py, gc: Revert ret_ptr to void*, casting to byte* for memset. 2014-06-13 22:33:31 +01:00
Damien George
63b2237323 Merge branch 'gc-pointers' of github.com:stinos/micropython into stinos-gc-pointers 2014-06-13 22:30:46 +01:00
Paul Sokolovsky
e22cddbe2a stream: Use mp_obj_is_true() for EOF testing.
Getting a length of string may be expensive, depending on the underlying
implementation.
2014-06-13 23:53:10 +03:00
stijn
f33385f56d gc: Use byte* pointers instead of void* for pointer arithmetic
void* is of unknown size
2014-06-13 20:42:06 +02:00
Damien George
8340c48389 py: Revert change of include, "" back to <> for mpconfigport.h. 2014-06-12 19:50:17 +01:00