The new `mp_obj_new_str_from_utf8_vstr` can be used when you know you
already have a unicode-safe string.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Previously the desired output type was specified. Now make the type part
of the function name. Because this function is used in a few places this
saves code size due to smaller call-site.
This makes `mp_obj_new_str_type_from_vstr` a private function of objstr.c
(which is almost the only place where the output type isn't a compile-time
constant).
This saves ~140 bytes on PYBV11.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Prior to this commit, parsenum would calculate "1e-20" as 1.0*pow(10, -20),
and "1.000e-20" as 1000.0*pow(10, -23); in certain cases, this could make
seemingly-identical values compare as not equal. This commit watches for
trailing zeros as a special case, and ignores them when appropriate, so
"1.000e-20" is also calculated as 1.0*pow(10, -20).
Fixes issue #5831.
Prior to this commit, complex("j") would return 0j, and complex("nanj")
would return nan+0j. This commit makes sure "j" is tested for after
parsing the number (nan, inf or a decimal), and also supports the case of
"j" on its own.
Signed-off-by: Damien George <damien@micropython.org>
This introduces a new option, MICROPY_ERROR_REPORTING_NONE, which
completely disables all error messages. To be used in cases where
MicroPython needs to fit in very limited systems.
Signed-off-by: Damien George <damien@micropython.org>
This ensures that only the translate("") alternative that will be used
is seen after preprocessing. Improves the quality of the Huffman encoding
and reduces binary size slightly.
Also makes one "enhanced" error message only occur when ERROR_REPORTING_DETAILED:
Instead of the word-for-word python3 error message
"Type object has no attribute '%q'", the message will be
"'type' object has no attribute '%q'". Also reduces binary size.
(that's rolled into this commit as it was right next to a change to
use the preprocessor for MICROPY_ERROR_REPORTING)
Note that the odd semicolon after "value_error:" in parsenum.c is necessary
due to a detail of the C grammar, in which a declaration cannot follow
a label directly.
Initially some of these were found building the unix coverage variant on
MacOS because that build uses clang and has -Wdouble-promotion enabled, and
clang performs more vigorous promotion checks than gcc. Additionally the
codebase has been compiled with clang and msvc (the latter with warning
level 3), and with MICROPY_FLOAT_IMPL_FLOAT to find the rest of the
conversions.
Fixes are implemented either as explicit casts, or by using the correct
type, or by using one of the utility functions to handle floating point
casting; these have been moved from nativeglue.c to the public API.
Instead of compiler-level if-logic. This is necessary to know what error
strings are included in the build at the preprocessor stage, so that string
compression can be implemented.
These were found by buiding the unix coverage variant on macOS (so clang
compiler). Mostly, these are fixing implicit cast of float/double to
mp_float_t which is one of those two and one mp_int_t to size_t fix for
good measure.
This patches avoids multiplying with negative powers-of-10 when parsing
floating-point values, when those powers-of-10 can be exactly represented
as a positive power. When represented as a positive power and used to
divide, the resulting float will not have any rounding errors.
The issue is that mp_parse_num_decimal will sometimes not give the closest
floating representation of the input string. Eg for "0.3", which can't be
represented exactly in floating point, mp_parse_num_decimal gives a
slightly high (by 1LSB) result. This is because it computes the answer as
3 * 0.1, and since 0.1 also can't be represented exactly, multiplying by 3
multiplies up the rounding error in the 0.1. Computing it as 3 / 10, as
now done by the change in this commit, gives an answer which is as close to
the true value of "0.3" as possible.
This saves code space in builds which use link-time optimization.
The optimization drops the untranslated strings and replaces them
with a compressed_string_t struct. It can then be decompressed to
a c string.
Builds without LTO work as well but include both untranslated
strings and compressed strings.
This work could be expanded to include QSTRs and loaded strings if
a compress method is added to C. Its tracked in #531.
Fuzz testing combined with the undefined behavior sanitizer found that
parsing unreasonable float literals like 1e+9999999999999 resulted in
undefined behavior due to overflow in signed integer arithmetic, and a
wrong result being returned.
There is no need to use the mp_int_t type which may be 64-bits wide, there
is enough bit-width in a normal int to parse reasonable exponents. Using
int helps to reduce code size for 64-bit ports, especially nan-boxing
builds. (Similarly for the "dig" variable which is now an unsigned int.)
Prior to this patch, a float literal that was close to subnormal would
have a loss of precision when parsed. The worst case was something like
float('10000000000000000000e-326') which returned 0.0.
This patch improves parsing of floating point numbers by converting all the
digits (integer and fractional) together into a number 1 or greater, and
then applying the correct power of 10 at the very end. In particular the
multiple "multiply by 0.1" operations to build a fraction are now combined
together and applied at the same time as the exponent, at the very end.
This helps to retain precision during parsing of floats, and also includes
a check that the number doesn't overflow during the parsing. One benefit
is that a float will have the same value no matter where the decimal point
is located, eg 1.23 == 123e-2.