19 Commits

Author SHA1 Message Date
Damien George
08e0e065f4 py/makeqstrdefs.py: Don't include .h files explicitly in preprocessing.
Only include .c and .cpp files explicitly in the list of files passed to
the preprocessor for QSTR extraction.  All relevant .h files will be
included in this process by "#include" from the .c(pp) files.  In
particular for moduledefs.h, this is included by py/objmodule.c (and
doesn't actually contain any extractable MP_QSTR_xxx, but rather defines
macros with MP_QSTR_xxx's in them which are then part of py/objmodule.c).

The main reason for this change is to simplify the preprocessing step on
the javascript port, which tries to compile .h files as C++ precompiled
headers if they are passed with -E to clang.

Signed-off-by: Damien George <damien@micropython.org>
2021-06-25 10:50:54 +10:00
Jim Mussared
a7932ae4e6 tools/makeqstrdefs.py: Run qstr preprocessing in parallel.
This gives a substantial speedup of the preprocessing step, i.e. the
generation of qstr.i.last.  For example on a clean build, making
qstr.i.last:

    21s -> 4s on STM32 (WB55)
    8.9 -> 1.8s on Unix (dev).

Done in collaboration with @stinos.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2020-11-12 15:04:53 +11:00
stijn
8e94fa0d2e py/makeqstrdefs.py: Support preprocessing C++ files for QSTR generation.
When SCR_QSTR contains C++ files they should be preprocessed with the same
compiler flags (CXXFLAGS) as they will be compiled with, to make sure code
scanned for QSTR occurrences is effectively the code used in the rest of
the build.  The 'split SCR_QSTR in .c and .cpp files and process each with
different flags' logic isn't trivial to express in a Makefile and the
existing principle for deciding which files to preprocess was already
rather complicated, so the actual preprocessing is moved into
makeqstrdefs.py completely.
2020-10-29 15:27:30 +11:00
stijn
2b9f0586e7 py/makeqstrdefs.py: Process C++ files as well.
Preprocessed C++ code isn't different from C code when it comes to QSTR
instances so process it as well.
2020-10-29 15:27:11 +11:00
stijn
1b723937e3 py/makeqstrdefs.py: Fix beaviour when scanning non-C preprocessed files.
When process_file() is passed a preprocessed C++ file for instance it won't
find any lines containing .c files and the last_fname variable remains
None, so handle that gracefully.
2020-10-29 15:26:35 +11:00
Jim Mussared
154b4eb354 py: Implement "common word" compression scheme for error messages.
The idea here is that there's a moderate amount of ROM used up by exception
text.  Obviously we try to keep the messages short, and the code can enable
terse errors, but it still adds up.  Listed below is the total string data
size for various ports:

    bare-arm 2860
    minimal 2876
    stm32 8926  (PYBV11)
    cc3200 3751
    esp32 5721

This commit implements compression of these strings.  It takes advantage of
the fact that these strings are all 7-bit ascii and extracts the top 128
frequently used words from the messages and stores them packed (dropping
their null-terminator), then uses (0x80 | index) inside strings to refer to
these common words.  Spaces are automatically added around words, saving
more bytes.  This happens transparently in the build process, mirroring the
steps that are used to generate the QSTR data.  The MP_COMPRESSED_ROM_TEXT
macro wraps any literal string that should compressed, and it's
automatically decompressed in mp_decompress_rom_string.

There are many schemes that could be used for the compression, and some are
included in py/makecompresseddata.py for reference (space, Huffman, ngram,
common word).  Results showed that the common-word compression gets better
results.  This is before counting the increased cost of the Huffman
decoder.  This might be slightly counter-intuitive, but this data is
extremely repetitive at a word-level, and the byte-level entropy coder
can't quite exploit that as efficiently.  Ideally one would combine both
approaches, but for now the common-word approach is the one that is used.

For additional comparison, the size of the raw data compressed with gzip
and zlib is calculated, as a sort of proxy for a lower entropy bound.  With
this scheme we come within 15% on stm32, and 30% on bare-arm (i.e. we use
x% more bytes than the data compressed with gzip -- not counting the code
overhead of a decoder, and how this would be hypothetically implemented).

The feature is disabled by default and can be enabled by setting
MICROPY_ROM_TEXT_COMPRESSION at the Makefile-level.
2020-04-05 14:20:57 +10:00
Damien George
69661f3343 all: Reformat C and Python source code with tools/codeformat.py.
This is run with uncrustify 0.70.1, and black 19.10b0.
2020-02-28 10:33:03 +11:00
Jim Mussared
a09fd04758 py/makeqstrdefs.py: Remove unused blacklist.
As of 7d58a197cffa7c0dd3686402d2e381812bb8ddeb, `NULL` should no longer be
here because it's allowed (MP_QSTRnull took its place).  This entry was
preventing the use of MP_QSTR_NULL to mean "NULL" (although this is not
currently used).

A blacklist should not be needed because it should be possible to intern
all strings.

Fixes issue #5140.
2019-10-04 17:18:56 +10:00
Damien George
673e154dfe py/makedefs: Use io.open with utf-8 encoding when processing source.
In case (user) source code contains utf-8 encoded data and the default
locale is not utf-8.

See #4592.
2019-04-12 11:34:52 +10:00
Damien George
f6a1f18603 py/makeqstrdefs.py: Optimise by using compiled re's so it runs faster.
By using pre-compiled regexs, using startswith(), and explicitly checking
for empty lines (of which around 30% of the input lines are), automatic
qstr extraction is speed up by about 10%.
2018-03-16 23:54:06 +11:00
Damien George
b24ccfc639 py/makeqstrdefs.py: Make script run correctly with Python 2.6. 2017-06-09 13:42:13 +10:00
Chris Packham
a50b26e4b0 py/makeqstrdefs.py: Use python 2.6 syntax for set creation.
py/makeqstrdefs.py declares that it works with python 2.6 however the
syntax used to initialise of a set with values was only added in python
2.7. This leads to build failures when the host system doesn't have
python 2.7 or newer.

Instead of using the new syntax pass a list of initial values through
set() to achieve the same result. This should work for python versions
from at least 2.6 onwards.

Helped-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Chris Packham <judge.packham@gmail.com>
2016-09-09 23:01:23 +10:00
Paul Sokolovsky
dcb904416a py/makeqstrdefs.py: Remove restriction that source path can't be absolute.
That's arbitrary restriction, in case of embedding, a source file path may
be absolute. For the purpose of filtering out system includes, checking
for ".c" suffix is enough.
2016-06-16 01:04:42 +03:00
stijn
9264d42e2a py/makeqstrdefs.py: Windows compatibility.
- msvc preprocessor output contains full paths with backslashes so the
  ':' and '\' characters needs to be erased from the paths as well
- use a regex for extraction of filenames from preprocessor output so it
  can handle both gcc and msvc preprocessor output, and spaces in paths
  (also thanks to a PR from @travnicekivo for part of that regex)
- os.rename will fail on windows if the destination file already exists,
  so simply attempt to delete that file first
2016-04-25 22:34:22 +01:00
stijn
b2b771ca02 py/makeqstrdefs.py: Remove unused function/variable/import. 2016-04-25 22:34:20 +01:00
Paul Sokolovsky
1b60a6dc4e py: Divide "split" and "cat" phases of qstr extraction for better efficiency.
E.g. for stmhal, accumulated preprocessed output may grow large due to
bloated vendor headers, and then reprocessing tens of megabytes on each
build make take couple of seconds on fast hardware (=> potentially dozens
of seconds on slow hardware). So instead, split once after each change,
and only cat repetitively (guaranteed to be fast, as there're thousands
of lines involved at most).
2016-04-19 14:39:08 +03:00
Paul Sokolovsky
8dd704b019 py/makeqstrdefs.py: Process only CPP line-numbering info.
Not stuff like "#pragma", etc.
2016-04-19 12:52:57 +03:00
Paul Sokolovsky
c618f91e22 py: Rework QSTR extraction to work in simple and obvious way.
When there're C files to be (re)compiled, they're all passed first to
preprocessor. QSTR references are extracted from preprocessed output and
split per original C file. Then all available qstr files (including those
generated previously) are catenated together. Only if the resulting content
has changed, the output file is written (causing almost global rebuild
to pick up potentially renumbered qstr's). Otherwise, it's not updated
to not cause spurious rebuilds. Related make rules are split to minimize
amount of commands executed in the interim case (when some C files were
updated, but no qstrs were changed).
2016-04-19 11:37:56 +03:00
Pavel Moravec
dbbf082786 py/makeqstrdefs: Add script to automate extraction of qstr from sources.
This script will search for patterns of the form Q(...) and generate a
list of them.

The original code by Pavel Moravec has been significantly simplified to
remove the part that searched for C preprocessor directives (eg #if).
This is because all source is now run through CPP before being fed into
this script.
2016-04-16 13:13:52 +01:00