53 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Paul Sokolovsky | 9731912ccb | py: Prune unneeded code from objstrunicode, reuse code in objstr. | |
Chris Angelico | 64b468d873 | objstrunicode: Basic implementation of unicode handling. | |
Paul Sokolovsky | 83865347db | objstrunicode: Complete copy of objstr, to be patched for unicode support. | |

Full message of 64b468d873:

objstrunicode: Basic implementation of unicode handling.

Squashed commit of the following:

commit 99dc21b67a895dc10d3c846bc158d27c839cee48
Author: Chris Angelico <rosuav@gmail.com>
Date: Thu Jun 12 02:18:54 2014 +1000

    Optimize as per TODO (thanks Damien!)

commit 5bf0153ecad8348443058d449d74504fc458fe51
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 08:42:06 2014 +1000

    Test a default (= UTF-8) encode and decode

commit c962057ac340832c4fde60896f656a3fe3ad78a9
Merge: e2c9782 195de32
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 05:23:03 2014 +1000

    Merge branch 'master' into unicode, resolving conflict on py/obj.h

commit e2c9782a65eb57f481d441d40161de427e1940ba
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 05:05:57 2014 +1000

    More whitespace fixups

commit 086a2a0f57afbc1f731697fd5d3a0cbbb80e5418
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 05:04:20 2014 +1000

    Properly implement string slicing

commit 0d339a143e2b6442366145e7f3d64aada293eaa0
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 02:24:11 2014 +1000

    Support slicing in str_index_to_ptr, and fix a bounds error

commit 24371c7267d360e77cf5eabc2e8ce9a73d2ee0da
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 02:10:22 2014 +1000

    Break out index-to-pointer calculation into a function

commit 616c24ac014c3ca56008428c506034dd1bfff7a8
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 02:03:11 2014 +1000

    Add tests of string slicing, which currently fail

commit a24d19f676fe8cc21dad512d91b826892e162a5b
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 01:56:53 2014 +1000

    Change string indexing to not precalculate the charlen, and add test for neg indexing

commit 0bcc7ab89eafb2ae53195e94c9bea42a4e886b64
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 22:09:17 2014 +1000

    Clean up constant qstr declarations now that charlen isn't needed

commit 5473e1a1dba2124b7b0c207f2964293cfbe80167
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 07:18:42 2014 +1000

    Remove the charlen field from strings, calculating it when required

commit 5c1658ec71aefbdc88c261ce2e57dc7670cdc6ef
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 07:11:27 2014 +1000

    Get rid of mp_obj_str_get_data_len() which was used in only one place

commit a019ba968b4e8daf7f3674f63c5cc400e304c509
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 06:58:26 2014 +1000

    Add a unichar_charlen() function to calculate length-in-characters from length-in-bytes

commit 44b0d5cff846ba487c526ed95be1b3d1cd3d762a
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 06:32:44 2014 +1000

    Use utf8_get/next_char in building up a string's repr

commit 30d1bad33f7af90f1971987c39864c8fcf3f5c21
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 06:10:45 2014 +1000

    Make utf8_get_char() and utf8_next_char() actually do what their names say

commit bc990dad9afb8ec112f5e7f7f79d5ab415da0e72
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 02:10:59 2014 +1000

    Revert "Add PEP 393-flags to strings and stub usage."

    This reverts commit c239f509521d1a0f9563bf9c5de0c4fb9a6a33ba.

commit f9bebb28ad52467f2f2d7a752bb033296b6c2f9b
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 15:41:48 2014 +1000

    Whitespace fixes

commit 279de0c8eb3cb186914799ccc5ee94ea97f56de4
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 15:28:35 2014 +1000

    Formatting/layout improvements - introduce macros for UTF-8 byte detection, add braces. No functional changes.

commit f1911f53d56da809c97b07245f5728a419e8fb30
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:56:02 2014 +1000

    Make chr() Unicode-aware

commit f51ad737b48ac04c161197a4012821d50885c4c7
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:44:07 2014 +1000

    Make a string's repr Unicode-aware

commit 01bd68684611585d437982dccdf05b33cbedc630
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:33:43 2014 +1000

    Expand the Unicode tests

commit 7bc91904f899f8012089fc14a06495680a51e590
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:27:30 2014 +1000

    Record byte lengths for byte strings

commit bb132120717cf176dcfb26f87fa309378f76ab5f
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:25:06 2014 +1000

    Make ord() Unicode-aware

commit 03f0cbe9051b62192be97b59f84f63f9216668bf
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 10:24:35 2014 +1000

    Retain characters as UTF-8 encoded Unicode

commit e924659b85c001916a5ff7f4d1d8b3ebe2bf0c2f
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 08:37:27 2014 +1000

    Add support for \u and \U escapes, but not \N (with explanatory comment)

commit 231031ac5f0346e4ffcf9c4abec2bd33f566232c
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 05:09:35 2014 +1000

    Add character length to qstr

commit 6df1b946fb17d8d5df3d91b21cde627c3d4556a8
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:48:36 2014 +1000

    Add test of UTF-8 encoded source file resulting in properly formed string

commit 16429b81a8483cf25865ed11afd81a7d9c253c26
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:44:15 2014 +1000

    Make len(s) return character length (even though creation's still buggy)

commit cd2cf6663cc47831dbc97819ad5c50ad33f939d3
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:15:36 2014 +1000

    HACK - When indexing a qstr, count its charlen. Stupidly inefficient but POC. All tests pass now, though string creation is still buggy.

commit 47c234584d3358dfa6b4003d5e7264105d17b8f7
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:15:32 2014 +1000

    objstr: Record character length separately from byte length

    CAUTION: Buggy, may crash stuff - qstr needs equivalent functionality too

commit b0f41c72af27d3b361027146025877b3d7e8785c
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 05:37:36 2014 +1000

    Beginnings of UTF-8 support - construct strings from that many UTF-8-encoded chars, and subscript bytes the same way

commit 89452be641674601e9bfce86dc71c17c3140a6cf
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 05:28:47 2014 +1000

    Update comments - now aiming for UTF-8 rather than PEP 393 strings

commit c239f509521d1a0f9563bf9c5de0c4fb9a6a33ba
Author: Chris Angelico <rosuav@gmail.com>
Date: Wed Jun 4 05:28:12 2014 +1000

    Add PEP 393-flags to strings and stub usage.

    The test suite all passes, but nothing has actually been changed.