Commit Graph

96 Commits

Author SHA1 Message Date
Scott Shawcroft
b35fa44c8a
Merge MicroPython 1.12 into CircuitPython 2021-05-03 14:01:18 -07:00
microDev
a52eb88031
run code formatting script 2021-03-15 19:27:36 +05:30
Diego Elio Pettenò
34b4993d63 Add license to some obvious files. 2020-07-06 19:16:25 +01:00
Jeff Epler
473e9c5ffb f-strings: Make optional, defaulting to !CIRCUITPY_MINIMAL_BUILD
This should reclaim *most* code space added to handle f-strings.
However, there may be some small code growth as parse_string_literal
takes a new parameter (which will always be 0, so hopefully the optimizer
eliminates it)
2020-03-09 09:03:25 -05:00
Jeff Epler
32647cd9b4 lexer: catch concatenation of f'' and '' strings
This turns the "edge case" into a parse-time error.
2020-03-09 09:03:25 -05:00
Josh Klar
40bc05ee1e Address dpgeorge feedback - largely simplifications 2020-03-09 08:16:07 -05:00
Josh Klar
3a7a5ba686 py: Implement partial PEP-498 (f-string) support
This implements (most of) the PEP-498 spec for f-strings, with two
exceptions:

- raw f-strings (`fr` or `rf` prefixes) raise `NotImplementedError`
- one special corner case does not function as specified in the PEP
(more on that in a moment)

This is implemented in the core as a syntax translation, brute-forcing
all f-strings to run through `String.format`. For example, the statement
`x='world'; print(f'hello {x}')` gets translated *at a syntax level*
(injected into the lexer) to `x='world'; print('hello {}'.format(x))`.
While this may lead to weird column results in tracebacks, it seemed
like the fastest, most efficient, and *likely* most RAM-friendly option,
despite being implemented under the hood with a completely separate
`vstr_t`.

Since [string concatenation of adjacent literals is implemented in the
lexer](534b7c368d),
two side effects emerge:

- All strings with at least one f-string portion are concatenated into a
single literal which *must* be run through `String.format()` wholesale,
and:
- Concatenation of a raw string with interpolation characters with an
f-string will cause `IndexError`/`KeyError`, which is both different
from CPython *and* different from the corner case mentioned in the PEP
(which gave an example of the following:)

```python
x = 10
y = 'hi'
assert ('a' 'b' f'{x}' '{c}' f'str<{y:^4}>' 'd' 'e') == 'ab10{c}str< hi >de'
```

The above-linked commit detailed a pretty solid case for leaving string
concatenation in the lexer rather than putting it in the parser, and
undoing that decision would likely be disproportionately costly on
resources for the sake of a probably-low-impact corner case. An
alternative to become complaint with this corner case of the PEP would
be to revert to string concatenation in the parser *only when an
f-string is part of concatenation*, though I've done no investigation on
the difficulty or costs of doing this.

A decent set of tests is included. I've manually tested this on the
`unix` port on Linux and on a Feather M4 Express (`atmel-samd`) and
things seem sane.
2020-03-09 08:16:07 -05:00
Scott Shawcroft
96ebf5bc3f
Two fixes and translate more strings.
* Fix finding translations with escaped characters.
* Add back \r to translations since its needed by screen.
2018-08-09 13:29:30 -07:00
Damien George
6a445b60fa py/lexer: Add support for underscores in numeric literals.
This is a very convenient feature introduced in Python 3.6 by PEP 515.
2018-06-12 12:17:43 +10:00
Damien George
a3dc1b1957 all: Remove inclusion of internal py header files.
Header files that are considered internal to the py core and should not
normally be included directly are:
    py/nlr.h - internal nlr configuration and declarations
    py/bc0.h - contains bytecode macro definitions
    py/runtime0.h - contains basic runtime enums

Instead, the top-level header files to include are one of:
    py/obj.h - includes runtime0.h and defines everything to use the
        mp_obj_t type
    py/runtime.h - includes mpstate.h and hence nlr.h, obj.h, runtime0.h,
        and defines everything to use the general runtime support functions

Additional, specific headers (eg py/objlist.h) can be included if needed.
2017-10-04 12:37:50 +11:00
Javier Candeira
35a1fea90b all: Raise exceptions via mp_raise_XXX
- Changed: ValueError, TypeError, NotImplementedError
  - OSError invocations unchanged, because the corresponding utility
    function takes ints, not strings like the long form invocation.
  - OverflowError, IndexError and RuntimeError etc. not changed for now
    until we decide whether to add new utility functions.
2017-08-13 22:52:33 +10:00
Alexander Steffen
55f33240f3 all: Use the name MicroPython consistently in comments
There were several different spellings of MicroPython present in comments,
when there should be only one.
2017-07-31 18:35:40 +10:00
Tom Collins
145796f037 py,extmod: Some casts and minor refactors to quiet compiler warnings. 2017-07-07 11:32:22 +10:00
Tom Collins
6f56412ec3 py/lexer: Process CR earlier to allow newlines checks on chr1.
Resolves an issue where lexer failed to accept CR after line continuation
character.  It also simplifies the code.
2017-05-12 15:14:24 +10:00
Tom Collins
2998647c4e py/lexer: Simplify lexer startup by using dummy bytes and next_char().
Now consistently uses the EOL processing ("\r" and "\r\n" convert to "\n")
and EOF processing (ensure "\n" before EOF) provided by next_char().

In particular the lexer can now correctly handle input that starts with CR.
2017-05-09 14:43:23 +10:00
Damien George
5010d1958f py/lexer: Simplify and reduce code size for operator tokenising.
By removing the 'E' code from the operator token encoding mini-language the
tokenising can be simplified.  The 'E' code was only used for the !=
operator which is now handled as a special case; the optimisations for the
general case more than make up for the addition of this single, special
case.  Furthermore, the . and ... operators can be handled in the same way
as != which reduces the code size a little further.

This simplification also removes a "goto".

Changes in code size for this patch are (measured in bytes):

bare-arm:       -48
minimal x86:    -64
unix x86-64:   -112
unix nanbox:    -64
stmhal:         -48
cc3200:         -48
esp8266:        -76
2017-03-29 10:56:52 +11:00
Damien George
f64a3e296e py/lexer: Remove obsolete comment, since lexer can now raise exceptions. 2017-03-23 16:40:24 +11:00
Damien George
1831034be1 py: Allow lexer to raise exceptions during construction.
This patch refactors the error handling in the lexer, to simplify it (ie
reduce code size).

A long time ago, when the lexer/parser/compiler were first written, the
lexer and parser were designed so they didn't use exceptions (ie nlr) to
report errors but rather returned an error code.  Over time that has
gradually changed, the parser in particular has more and more ways of
raising exceptions.  Also, the lexer never really handled all errors without
raising, eg there were some memory errors which could raise an exception
(and in these rare cases one would get a fatal nlr-not-handled fault).

This patch accepts the fact that the lexer can raise exceptions in some
cases and allows it to raise exceptions to handle all its errors, which are
for the most part just out-of-memory errors during construction of the
lexer.  This makes the lexer a bit simpler, and also the persistent code
stuff is simplified.

What this means for users of the lexer is that calls to it must be wrapped
in a nlr handler.  But all uses of the lexer already have such an nlr
handler for the parser (and compiler) so that doesn't put any extra burden
on the callers.
2017-03-14 11:52:05 +11:00
Damien George
5124a94067 py/lexer: Convert mp_uint_t to size_t where appropriate. 2017-02-17 12:44:24 +11:00
Damien George
534b7c368d py: Do adjacent str/bytes literal concatenation in lexer, not compiler.
It's much more efficient in RAM and code size to do implicit literal string
concatenation in the lexer, as opposed to the compiler.

RAM usage is reduced because the concatenation can be done right away in the
tokeniser by just accumulating the string/bytes literals into the lexer's
vstr.  Prior to this patch adjacent strings/bytes would create a parse tree
(one node per string/bytes) and then in the compiler a whole new chunk of
memory was allocated to store the concatenated string, which used more than
double the memory compared to just accumulating in the lexer.

This patch also significantly reduces code size:

bare-arm: -204
minimal:  -204
unix x64: -328
stmhal:   -208
esp8266:  -284
cc3200:   -224
2017-02-17 12:12:40 +11:00
Damien George
773278ec30 py/lexer: Simplify handling of line-continuation error.
Previous to this patch there was an explicit check for errors with line
continuation (where backslash was not immediately followed by a newline).

But this check is not necessary: if there is an error then the remaining
logic of the tokeniser will reject the backslash and correctly produce a
syntax error.
2017-02-17 11:30:14 +11:00
Damien George
ae43679792 py/lexer: Use strcmp to make keyword searching more efficient.
Since the table of keywords is sorted, we can use strcmp to do the search
and stop part way through the search if the comparison is less-than.

Because all tokens that are names are subject to this search, this
optimisation will improve the overall speed of the lexer when processing
a script.

The change also decreases code size by a little bit because we now use
strcmp instead of the custom str_strn_equal function.
2017-02-17 11:10:35 +11:00
Damien George
a68c754688 py/lexer: Move check for keyword to name-tokenising block.
Keywords only needs to be searched for if the token is a MP_TOKEN_NAME, so
we can move the seach to the part of the code that does the tokenising for
MP_TOKEN_NAME.
2017-02-17 10:59:57 +11:00
Damien George
98b3072da5 py/lexer: Simplify handling of indenting of very first token. 2017-02-17 10:56:06 +11:00
Damien George
c264414746 py/lexer: Don't generate string representation for period or ellipsis.
It's not needed.
2017-02-16 20:23:41 +11:00
Damien George
8beba7310f extmod/vfs_fat: Remove MICROPY_READER_FATFS component. 2017-01-30 12:26:07 +11:00
Damien George
dcb9ea7215 extmod: Add generic VFS sub-system.
This provides mp_vfs_XXX functions (eg mount, open, listdir) which are
agnostic to the underlying filesystem type, and just require an object with
the relevant filesystem-like methods (eg .mount, .open, .listidr) which can
then be mounted.

These mp_vfs_XXX functions would typically be used by a port to implement
the "uos" module, and mp_vfs_open would be the builtin open function.

This feature is controlled by MICROPY_VFS, disabled by default.
2017-01-27 17:19:06 +11:00
Damien George
c305ae3243 py/lexer: Permanently disable the mp_lexer_show_token function.
The lexer is very mature and this debug function is no longer used.  If
it's really needed one can uncomment it and recompile.
2016-12-22 10:49:54 +11:00
Damien George
f4aebafe7a py/lexer: Remove unnecessary check for EOF in lexer's next_char func.
This check always fails (ie chr0 is never EOF) because the callers of this
function never call it past the end of the input stream.  And even if they
did it would be harmless because 1) reader.readbyte must continue to
return an EOF char if the stream is exhausted; 2) next_char would just
count the subsequent EOF's as characters worth 1 column.
2016-12-22 10:39:06 +11:00
Damien George
b9c4783273 py/lexer: Remove unreachable code in string tokeniser. 2016-12-22 10:37:13 +11:00
Damien George
adccafb42a tests/basics/lexer: Add a test for newline-escaping within a string. 2016-12-22 10:32:06 +11:00
Damien George
5bdf1650de py/lexer: Make lexer use an mp_reader as its source. 2016-11-16 18:35:01 +11:00
Damien George
66d955c218 py/lexer: Rewrite mp_lexer_new_from_fd in terms of mp_reader. 2016-11-16 18:13:51 +11:00
Damien George
e5ef15a9d7 py/lexer: Provide generic mp_lexer_new_from_file based on mp_reader.
If a port defines MICROPY_READER_POSIX or MICROPY_READER_FATFS then
lexer.c now provides an implementation of mp_lexer_new_from_file using
the mp_reader_new_file function.
2016-11-16 18:13:51 +11:00
Damien George
511c083811 py/lexer: Rewrite mp_lexer_new_from_str_len in terms of mp_reader_mem. 2016-11-16 18:13:50 +11:00
Damien George
31101d91ce py/lexer: Remove unnecessary code, and unreachable code.
Setting emit_dent=0 is unnecessary because arriving in that part of the
if-logic will guarantee that emit_dent is already zero.

The block to check indent_top(lex)>0 is unreachable because a newline is
always inserted an the end of the input stream, and hence dedents are
always processed before EOF.
2016-10-12 11:00:17 +11:00
Damien George
5da0d29d3c py/vstr: Remove vstr.had_error flag and inline basic vstr functions.
The vstr.had_error flag was a relic from the very early days which assumed
that the malloc functions (eg m_new, m_renew) returned NULL if they failed
to allocate.  But that's no longer the case: these functions will raise an
exception if they fail.

Since it was impossible for had_error to be set, this patch introduces no
change in behaviour.

An alternative option would be to change the malloc calls to the _maybe
variants, which return NULL instead of raising, but then a lot of code
will need to explicitly check if the vstr had an error and raise if it
did.

The code-size savings for this patch are, in bytes: bare-arm:188,
minimal:456, unix(NDEBUG,x86-64):368, stmhal:228, esp8266:360.
2016-09-19 12:28:55 +10:00
Damien George
3ff16ff52e py: Declare constant data as properly constant.
Otherwise some compilers (eg without optimisation) will put this read-only
data in RAM instead of ROM.
2016-05-20 12:46:20 +01:00
pohmelie
81ebba7e02 py: add async/await/async for/async with syntax
They are sugar for marking function as generator, "yield from"
and pep492 python "semantically equivalents" respectively.

@dpgeorge was the original author of this patch, but @pohmelie made
changes to implement `async for` and `async with`.
2016-04-13 15:26:38 +01:00
Damien George
ea23520403 py: Add MICROPY_DYNAMIC_COMPILER option to config compiler at runtime.
This new compile-time option allows to make the bytecode compiler
configurable at runtime by setting the fields in the mp_dynamic_compiler
structure.  By using this feature, the compiler can generate bytecode
that targets any MicroPython runtime/VM, regardless of the host and
target compile-time settings.

Options so far that fall under this dynamic setting are:
- maximum number of bits that a small int can hold;
- whether caching of lookups is used in the bytecode;
- whether to use unicode strings or not (lexer behaviour differs, and
  therefore generated string constants differ).
2016-02-25 10:05:46 +00:00
Damien George
dd5353a405 py: Add MICROPY_ENABLE_COMPILER and MICROPY_PY_BUILTINS_EVAL_EXEC opts.
MICROPY_ENABLE_COMPILER can be used to enable/disable the entire compiler,
which is useful when only loading of pre-compiled bytecode is supported.
It is enabled by default.

MICROPY_PY_BUILTINS_EVAL_EXEC controls support of eval and exec builtin
functions.  By default they are only included if MICROPY_ENABLE_COMPILER
is enabled.

Disabling both options saves about 40k of code size on 32-bit x86.
2015-12-18 12:35:44 +00:00
Damien George
2b000474d9 py/lexer: Properly classify floats that look like hex numbers.
Eg 0e0 almost looks like a hex number but in fact is a float.
2015-09-07 17:33:44 +01:00
Damien George
0be3c70cd8 py/lexer: Raise SyntaxError when unicode char point out of range. 2015-09-07 17:19:17 +01:00
Damien George
081f9325f5 py/lexer: Raise NotImplError for unicode name escape, instead of assert. 2015-09-07 17:08:49 +01:00
Damien George
d241c2a592 py/lexer: Raise SyntaxError when str hex escape sequence is malformed.
Addresses issue #1390.
2015-07-23 23:20:37 +01:00
Damien George
7f19a39a3b py: Cast argument for printf to int, to be compatible with more ports.
This allows stmhal to be compiled with MICROPY_DEBUG_PRINTERS.
2015-06-22 17:40:12 +01:00
Damien George
7ed58cb663 py: Support unicode (utf-8 encoded) identifiers in Python source.
Enabled simply by making the identifier lexing code 8-bit clean.
2015-06-09 10:58:07 +00:00
Dave Hylands
3ad94d6072 extmod: Add ubinascii.unhexlify
This also pulls out hex_digit from py/lexer.c and makes unichar_hex_digit
2015-05-20 09:29:22 +01:00
Damien George
2e2e404ff7 py: Allow to compile with extra warnings (sign-compare, unused-param). 2015-03-19 00:25:33 +00:00
Damien George
7d414a1b52 py: Parse big-int/float/imag constants directly in parser.
Previous to this patch, a big-int, float or imag constant was interned
(made into a qstr) and then parsed at runtime to create an object each
time it was needed.  This is wasteful in RAM and not efficient.  Now,
these constants are parsed straight away in the parser and turned into
objects.  This allows constants with large numbers of digits (so
addresses issue #1103) and takes us a step closer to #722.
2015-02-08 01:57:40 +00:00