146 Commits

Author SHA1 Message Date
Angus Gratton
3e8aed9fcc py/gc: Add "max new split" value in result of gc.mem_free().
Follow-up to 519c24dd48 when MICROPY_GC_SPLIT_HEAP_AUTO is enabled, based
on discussion at
https://github.com/orgs/micropython/discussions/12316#discussioncomment-6858007

gc.mem_free() is always a heuristic, but this makes it a more useful
heuristic for common use cases.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
2023-09-15 12:19:13 +10:00
Angus Gratton
519c24dd48 py/gc: Add new MICROPY_GC_SPLIT_HEAP_AUTO "auto grow heap" mode.
When set, the split heap is automatically extended with new areas on
demand, and shrunk if a heap area becomes empty during a GC pass or soft
reset.

To save code size the size allocation for a new heap block (including
metadata) is estimated at 103% of the failed allocation, rather than
working from the more complex algorithm in gc_try_add_heap(). This appears
to work well except in the extreme limit case when almost all RAM is
exhausted (~last few hundred bytes). However in this case some allocation
is likely to fail soon anyhow.

Currently there is no API to manually add a block of a given size to the
heap, although that could easily be added if necessary.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
2023-08-15 10:48:02 +10:00
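
The sizing estimate described in 519c24dd48 can be sketched as below. This is only an illustration of the "103% of the failed allocation" rule using integer arithmetic; `estimate_new_area_size` is a hypothetical name and is not claimed to match the code in gc.c.

```c
#include <stddef.h>

// Hypothetical helper (not the gc.c implementation): estimate the size of a
// new heap area, metadata included, as roughly 103% of the allocation that
// just failed.  The extra 3% is rounded up so that small requests still get
// some headroom.  Overflow for extremely large requests is not handled here.
static size_t estimate_new_area_size(size_t failed_alloc) {
    size_t overhead = (failed_alloc * 3 + 99) / 100;
    return failed_alloc + overhead;
}
```

A fixed percentage keeps the code tiny, at the cost of being imprecise when almost all RAM is already in use.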
Angus Gratton
d325ee4509 py/gc: Apply some code formatting cleanup.
This commit:
- Breaks up some long lines for readability.
- Fixes a potential macro argument expansion issue.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
2023-08-15 10:41:02 +10:00
Damien Tournoud
2dcd745434 py/gc: Speed up incremental GC cycles by tracking the last used block.
In applications that use little memory and run GC regularly, the cost of
the sweep phase quickly becomes prohibitive as the amount of RAM
increases.

On an ESP32-S3 with 2 MB of external SPIRAM, for example, a trivial GC
cycle takes a minimum of 40ms, virtually all of it in the sweep phase.

Similarly, on the UNIX port with 1 GB of heap, a trivial GC takes 47 ms,
again virtually all of it in the sweep phase.

This commit speeds up the sweep phase in the case most of the heap is empty
by keeping track of the ID of the highest block we allocated in an area
since the last GC.

The performance benchmark run on PYBV10 shows between +0 and +2%
improvement across the existing performance tests.  These tests don't
really stress the GC, so they were also run with gc.threshold(30000) and
gc.threshold(10000).  For the 30000 case, performance improved by up to
+10% with this commit.  For the 10000 case, performance improved by at
least +10% on 6 tests, and up to +25%.

Signed-off-by: Damien George <damien@micropython.org>
2023-08-04 17:25:16 +10:00
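
The idea behind 2dcd745434 can be illustrated with a toy model. The sketch below is an assumption-laden simplification: one byte per block instead of the real allocation table, and hypothetical names (`toy_area_t`, `toy_alloc`, `toy_sweep`). It only shows how remembering the highest allocated block lets the sweep stop early.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BLOCKS 1024

// Toy heap area: one flag byte per block (the real GC packs this tighter).
typedef struct {
    uint8_t used[NUM_BLOCKS];   // 1 = allocated, 0 = free
    uint8_t marked[NUM_BLOCKS]; // 1 = reached during the mark phase
    size_t last_used_block;     // highest block allocated since the last sweep
} toy_area_t;

// Allocate a specific block and remember the highest index seen so far.
static void toy_alloc(toy_area_t *a, size_t block) {
    assert(block < NUM_BLOCKS);
    a->used[block] = 1;
    if (block > a->last_used_block) {
        a->last_used_block = block;
    }
}

// Sweep only up to last_used_block; everything above it is known to be free.
// Returns the number of blocks visited, to make the saving observable.
static size_t toy_sweep(toy_area_t *a) {
    size_t visited = 0;
    size_t new_last = 0;
    for (size_t b = 0; b <= a->last_used_block; b++) {
        visited++;
        if (a->used[b]) {
            if (a->marked[b]) {
                a->marked[b] = 0; // survivor: clear the mark for the next cycle
                new_last = b;
            } else {
                a->used[b] = 0;   // unreachable: free it
            }
        }
    }
    a->last_used_block = new_last;
    return visited;
}
```

With only a handful of low-numbered blocks in use, the sweep visits a handful of entries rather than all `NUM_BLOCKS`.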
David Lechner
468ed218c9 py/gc: Make improvements to MICROPY_GC_HOOK_LOOP.
Changes in this commit:
- Add MICROPY_GC_HOOK_LOOP to gc_info() and gc_alloc().  Both of these can
  be long running (many milliseconds) which is too long to be blocking in
  some applications.
- Pass loop variable to MICROPY_GC_HOOK_LOOP(i) macro so that implementers
  can use it, e.g. to improve performance by only calling a function every
  X number of iterations.
- Drop outer call to MICROPY_GC_HOOK_LOOP in gc_mark_subtree().
2023-05-09 12:44:14 +10:00
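
As a sketch of how a port might use the loop variable passed to MICROPY_GC_HOOK_LOOP(i) in 468ed218c9: the macro below calls a function only once every 256 iterations. `poll_events`, `hook_calls` and `long_running_scan` are hypothetical stand-ins, and the interval of 256 is an arbitrary assumption.

```c
#include <stddef.h>

// Hypothetical event-loop hook; here it only counts how often it is called.
static unsigned hook_calls = 0;
static void poll_events(void) {
    hook_calls++;
}

// Example definition a port could supply: run the hook every 256th iteration
// so the check itself stays cheap.
#define MICROPY_GC_HOOK_LOOP(i) \
    do { if (((i) & 0xff) == 0) { poll_events(); } } while (0)

// Stand-in for a long-running GC loop that invokes the hook per iteration.
static void long_running_scan(size_t n_blocks) {
    for (size_t i = 0; i < n_blocks; i++) {
        MICROPY_GC_HOOK_LOOP(i);
        // ... per-block work would go here ...
    }
}
```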
Damien George
b1229efbd1 all: Fix spelling mistakes based on codespell check.
Signed-off-by: Damien George <damien@micropython.org>
2023-04-27 18:03:06 +10:00
Jim Mussared
256f47e2f8 extmod/btstack: Fix indicate/notify queuing.
This adds a mechanism to track a pending notify/indicate operation that
is deferred due to the send buffer being full. This uses a tracked alloc
that is passed as the content arg to the callback.

This replaces the previous mechanism that did this via the global pending
op queue, shared with client read/write ops.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2023-04-26 11:37:20 +10:00
Damien George
b3c8ab37ec py/gc: Make gc_dump_info/gc_dump_alloc_table take a printer as argument.
So that callers can redirect the output if needed.

Signed-off-by: Damien George <damien@micropython.org>
2023-03-10 10:58:10 +11:00
robert-hh
e78e0b7418 py/gc: Increase the address length in gc_dump_alloc_table().
Showing 8 digits instead of 5, supporting devices with more than 1 MByte of
RAM (which is common these days).  The masking was never needed, and the
related commented-out line can go.
2023-01-16 12:09:20 +11:00
Damien George
ab0258fb1e py/gc: Fix debug printing of GC layout.
Signed-off-by: Damien George <damien@micropython.org>
2022-12-08 14:36:34 +11:00
Jeff Epler
d75ff42297 unix/coverage: Add extra GC coverage test for ATB gap byte.
The assertion that is added here (to gc.c) fails when running this new test
if ALLOC_TABLE_GAP_BYTE is set to 0.

Signed-off-by: Jeff Epler <jepler@gmail.com>
Signed-off-by: Damien George <damien@micropython.org>
2022-12-08 14:35:08 +11:00
Jeff Epler
9f434dd8de py/gc: Ensure a gap of one byte after the ATB.
Prior to this fix the following crash occurred.  With a GC layout of:

    GC layout:
      alloc table at 0x3fd80428, length 32001 bytes, 128004 blocks
      finaliser table at 0x3fd88129, length 16001 bytes, 128008 blocks
      pool at 0x3fd8bfc0, length 2048064 bytes, 128004 blocks

Block 128003 is an AT_HEAD and eventually is passed to gc_mark_subtree.
This causes gc_mark_subtree to call ATB_GET_KIND(128004).  When block 1 is
created with a finaliser, the first byte of the finaliser table becomes
0x2, but ATB_GET_KIND(128004) reads these bits as AT_TAIL, and then
gc_mark_subtree references past the end of the heap, which happened to be
past the end of PSRAM on the esp32-s2.

The fix in this commit is to ensure there is a one-byte gap after the ATB
filled permanently with AT_FREE.

Fixes issue #7116.

See also https://github.com/adafruit/circuitpython/issues/5021

Signed-off-by: Jeff Epler <jepler@gmail.com>
Signed-off-by: Damien George <damien@micropython.org>
2022-12-08 14:29:58 +11:00
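
The failure mode fixed by 9f434dd8de can be reproduced in miniature. The sketch below assumes the usual 2-bits-per-block ATB encoding (four blocks per byte) and uses hypothetical helpers (`atb_get_kind`, `chain_length`, `demo`); it is a toy layout, not the gc.c code.

```c
#include <stddef.h>
#include <stdint.h>

enum { AT_FREE = 0, AT_HEAD = 1, AT_TAIL = 2, AT_MARK = 3 };
#define BLOCKS_PER_ATB 4

// Read the 2-bit kind of a block; like the real macro, no bounds check.
static unsigned atb_get_kind(const uint8_t *atb, size_t block) {
    return (atb[block / BLOCKS_PER_ATB] >> (2 * (block % BLOCKS_PER_ATB))) & 3;
}

// Count the blocks of an allocation by following AT_TAIL entries after the
// head, stopping at the first non-tail entry.
static size_t chain_length(const uint8_t *atb, size_t head) {
    size_t n = 1;
    while (atb_get_kind(atb, head + n) == AT_TAIL) {
        n++;
    }
    return n;
}

// Toy layout: a 1-byte ATB (blocks 0..3), an optional zero gap byte, then a
// finaliser byte of 0x02.  The last block (3) is an AT_HEAD.
static size_t demo(int with_gap) {
    uint8_t mem[4] = {0};
    size_t atb_len = 1;
    mem[0] = (uint8_t)(AT_HEAD << 6);          // block 3 = AT_HEAD
    size_t ftb_offset = atb_len + (with_gap ? 1 : 0);
    mem[ftb_offset] = 0x02;                    // finaliser bit for block 1
    return chain_length(mem, 3);
}
```

Without the gap, the scan reads the finaliser byte as an AT_TAIL entry and believes the allocation extends past the last block; with the gap byte it stops where it should.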
Jeff Epler
84071590b3 py/gc: Avoid valgrind false positives.
When you want to use the valgrind memory analysis tool on MicroPython, you
can arrange to define MICROPY_DEBUG_VALGRIND to enable use of special
valgrind macros.  For now, this only fixes `gc_get_ptr` so that it never
emits the diagnostic "Conditional jump or move depends on uninitialised
value(s)".

Signed-off-by: Jeff Epler <jepler@gmail.com>
2022-12-08 14:29:22 +11:00
Rob Knegjens
4a48531803 py/gc: Reduce code size when MICROPY_GC_SPLIT_HEAP is disabled.
Use C macros to reduce the size of firmware images when the GC split-heap
feature is disabled.

The code size difference of this commit versus HEAD~2 (ie the commit prior
to MICROPY_GC_SPLIT_HEAP being introduced) when split-heap is disabled is:

       bare-arm:    +0 +0.000%
    minimal x86:    +0 +0.000%
       unix x64:   -16 -0.003%
    unix nanbox:   -20 -0.004%
          stm32:    -8 -0.002% PYBV10
         cc3200:    +0 +0.000%
        esp8266:    +8 +0.001% GENERIC
          esp32:    +0 +0.000% GENERIC
            nrf:   -20 -0.011% pca10040
            rp2:    +0 +0.000% PICO
           samd:    -4 -0.003% ADAFRUIT_ITSYBITSY_M4_EXPRESS

The code size difference of this commit versus HEAD~2 when split-heap is
enabled with MICROPY_GC_MULTIHEAP=1 (but no extra code to add more heaps):

    unix x64: +1032 +0.197% [incl +544(bss)]
       esp32:  +592 +0.039% GENERIC[incl +16(data) +264(bss)]
2022-07-23 00:43:08 +10:00
Ayke van Laethem
bcc827d695 py/gc: Allow the GC heap to be split over multiple memory areas.
This commit adds a new option MICROPY_GC_SPLIT_HEAP (disabled by default)
which, when enabled, allows the GC heap to be split over multiple memory
areas/regions.  The first area is added with gc_init() and subsequent areas
can be added with gc_add().  New areas can be added at runtime.  Areas are
stored internally as a linked list, and calls to gc_alloc() can be
satisfied from any area.

This feature has the following use-cases (among others):
- The ESP32 has a fragmented OS heap, so to use all (or more) of it the
  GC heap must be split.
- Other MCUs may have disjoint RAM regions and are now able to use them
  all for the GC heap.
- The user could explicitly increase the size of the GC heap.
- Support a dynamic heap while running on an OS, adding more heap when
  necessary.
2022-07-23 00:42:54 +10:00
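
The linked-list arrangement described in bcc827d695 can be sketched as follows. This is a deliberately simplified model: each area is a bump allocator with no GC tables, and all names (`toy_area_t`, `toy_add_area`, `toy_alloc`) are hypothetical. It only shows that an allocation may be satisfied from any area in the list.

```c
#include <stddef.h>
#include <stdint.h>

// One memory area of the split heap; areas form a singly linked list.
typedef struct toy_area {
    uint8_t *start;
    size_t size;
    size_t used;
    struct toy_area *next;
} toy_area_t;

// Append a new area to the list (this can happen at runtime).
static void toy_add_area(toy_area_t **head, toy_area_t *area, uint8_t *start, size_t size) {
    area->start = start;
    area->size = size;
    area->used = 0;
    area->next = NULL;
    while (*head != NULL) {
        head = &(*head)->next;
    }
    *head = area;
}

// Try each area in turn and allocate from the first one with enough room.
static void *toy_alloc(toy_area_t *head, size_t n_bytes) {
    for (toy_area_t *a = head; a != NULL; a = a->next) {
        if (a->size - a->used >= n_bytes) {
            void *p = a->start + a->used;
            a->used += n_bytes;
            return p;
        }
    }
    return NULL; // no area can satisfy the request
}
```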
Damien George
18acd0318f py/gc: Update debug code to compile with changes to qstr pool types.
Following on from 18b1ba086c and
f46a7140f5.

Signed-off-by: Damien George <damien@micropython.org>
2022-02-17 11:17:21 +11:00
Laurens Valk
fe120484b6 py/gc: Add hook to run code during time consuming GC operations.
This makes it possible for cooperative multitasking systems to keep running
event loops during garbage collector operations.

For example, this can be used to ensure that a motor control loop runs
approximately every 5 ms.  Without this hook, the loop time can jump to
about 15 ms.

Addresses #3475.

Signed-off-by: Laurens Valk <laurens@pybricks.com>
2021-11-01 15:39:37 +11:00
Damien George
bc89cdeb45 py/gc: Only use no_sanitize_address attribute for GCC 4.8 and above.
It's not supported on older GCC versions.

Signed-off-by: Damien George <damien@micropython.org>
2021-06-18 14:15:37 +10:00
Jeff Epler
9a74546f8d py/gc: Access the list of root pointers in an asan-compatible way.
Signed-off-by: Jeff Epler <jepler@gmail.com>
2021-05-30 11:50:51 +10:00
Damien George
b6b39bff47 py/gc: Make gc_lock_depth have a count per thread.
This commit makes gc_lock_depth have one counter per thread, instead of one
global counter.  This makes threads properly independent with respect to
the GC, in particular threads can now independently lock the GC for
themselves without locking it for other threads.  It also means a given
thread can run a hard IRQ without temporarily locking the GC for all other
threads and potentially making them have MemoryError exceptions at random
locations (this really only occurs on MCUs with multiple cores and no GIL,
eg on the rp2 port).

The commit also removes protection of the GC lock/unlock functions, which
is no longer needed when the counter is per thread (and this also fixes the
case where a hard IRQ calling gc_lock() may stall waiting for the mutex).

It also puts the check for `gc_lock_depth > 0` outside the GC mutex in
gc_alloc, gc_realloc and gc_free, to potentially prevent a hard IRQ from
waiting on a mutex if it does attempt to allocate heap memory (and putting
the check outside the GC mutex is safe now that there is a
gc_lock_depth per thread).

Signed-off-by: Damien George <damien@micropython.org>
2021-05-10 13:07:16 +10:00
Damien George
ad4656b861 all: Rename BYTES_PER_WORD to MP_BYTES_PER_OBJ_WORD.
The "word" referred to by BYTES_PER_WORD is actually the size of mp_obj_t
which is not always the same as the size of a pointer on the target
architecture.  So rename this config value to better reflect what it
measures, and also prefix it with MP_.

For uses of BYTES_PER_WORD in setting the stack limit this has been
changed to sizeof(void *), because the stack usually grows with
machine-word sized values (eg an nlr_buf_t has many machine words in it).

Signed-off-by: Damien George <damien@micropython.org>
2021-02-04 22:46:42 +11:00
Damien George
7e956fae28 py: Rename BITS_PER_BYTE to MP_BITS_PER_BYTE.
To give this macro a standard MP_ prefix.

Signed-off-by: Damien George <damien@micropython.org>
2021-02-04 22:46:42 +11:00
stijn
cb8e2f02ab py/gc: Fix debug printing of pointer.
When DEBUG_printf is the standard printf, compilers require the value for
%p to be an actual pointer instead of an integer.
2021-01-30 14:41:29 +11:00
Emil Renner Berthing
ccd92335a1 py, extmod: Introduce and use MP_FALLTHROUGH macro.
Newer GCC versions are able to warn about switch cases that fall
through.  This is usually a sign of a forgotten break statement, but in
the few cases where a fall through is intended we annotate it with this
macro to avoid the warning.
2020-10-22 11:53:16 +02:00
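
A minimal sketch of the kind of annotation ccd92335a1 describes is shown below. `TOY_FALLTHROUGH` is a hypothetical macro written for this example, assuming GCC 7 or newer supports `__attribute__((fallthrough))`; it is not presented as the exact definition of MP_FALLTHROUGH.

```c
// Hypothetical macro: expands to the fallthrough attribute where GCC
// supports it, and to nothing elsewhere.
#if defined(__GNUC__) && __GNUC__ >= 7
#define TOY_FALLTHROUGH __attribute__((fallthrough));
#else
#define TOY_FALLTHROUGH
#endif

// Each level intentionally falls through to the levels below it.
static int count_steps(int level) {
    int steps = 0;
    switch (level) {
        case 2:
            steps++;
            TOY_FALLTHROUGH
        case 1:
            steps++;
            TOY_FALLTHROUGH
        default:
            steps++;
            break;
    }
    return steps;
}
```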
stijn
84fa3312cf all: Format code to add space after C++-style comment start.
Note: the uncrustify configuration is explicitly set to 'add' instead of
'force' in order not to alter the comments which use extra spaces after //
as a means of indenting text for clarity.
2020-04-23 11:24:25 +10:00
Damien George
69661f3343 all: Reformat C and Python source code with tools/codeformat.py.
This is run with uncrustify 0.70.1, and black 19.10b0.
2020-02-28 10:33:03 +11:00
Damien George
3f39d18c2b all: Add *FORMAT-OFF* in various places.
This string is recognised by uncrustify, to disable formatting in the
region marked by these comments.  This is necessary in the qstrdef*.h files
to prevent modification of the strings within the Q(...).  In other places
it is used to prevent excessive reformatting that would make the code less
readable.
2020-02-28 10:31:07 +11:00
David Lechner
ccc18f047d py/gc: Don't include or init gc_mutex when GIL is enabled.
When threads and the GIL are enabled, then the GC mutex is not needed.  The
gc_mutex field is never used in this case because of:

    #if MICROPY_PY_THREAD && !MICROPY_PY_THREAD_GIL
    #define GC_ENTER() mp_thread_mutex_lock(&MP_STATE_MEM(gc_mutex), 1)
    #define GC_EXIT() mp_thread_mutex_unlock(&MP_STATE_MEM(gc_mutex))
    #else
    #define GC_ENTER()
    #define GC_EXIT()
    #endif

So, we can completely remove gc_mutex everywhere unless MICROPY_PY_THREAD
&& !MICROPY_PY_THREAD_GIL holds.
2020-01-23 13:28:42 +11:00
Paul Sokolovsky
016d9a40fe various: Add and update my copyright line based on git history.
For modules I initially created or made substantial contributions to.
2019-05-17 18:04:15 +10:00
Paul Sokolovsky
5ed578e5b4 py/gc: Adjust gc_alloc() signature to be able to accept multiple flags.
The older "bool has_finaliser" gets recast as GC_ALLOC_FLAG_HAS_FINALISER=1
so this is a backwards compatible change to the signature.  Since bool gets
implicitly converted to 1 this patch doesn't include conversion of all
calls.
2018-12-20 17:52:16 +11:00
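
The backwards compatibility claimed in 5ed578e5b4 rests on C converting `true` to 1. A small sketch, with a hypothetical stand-in function (`toy_alloc_wants_finaliser`) in place of gc_alloc():

```c
#include <stdbool.h>
#include <stddef.h>

// Flag value taken from the commit message above.
enum { GC_ALLOC_FLAG_HAS_FINALISER = 1 };

// Hypothetical stand-in for an allocator taking a flags word: it only
// reports whether the finaliser flag was set.
static bool toy_alloc_wants_finaliser(size_t n_bytes, unsigned int alloc_flags) {
    (void)n_bytes;
    return (alloc_flags & GC_ALLOC_FLAG_HAS_FINALISER) != 0;
}
```

An old-style caller passing a `bool` therefore sets exactly the finaliser flag when the value is `true`, and no flags when it is `false`.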
Damien George
91041945c9 py/gc: In gc_alloc, reset n_free var right before search for free mem.
Otherwise there is the possibility that n_free starts out non-zero from the
previous iteration, which may have found a few (but not enough) free blocks
at the end of the heap.  If this is the case, and if the very first blocks
that are scanned the second time around (starting at
gc_last_free_atb_index) are found to give enough memory (including the
blocks at the end of the heap from the previous iteration that left n_free
non-zero) then memory will be allocated starting before the location that
gc_last_free_atb_index points to, most likely leading to corruption.

This serious bug did not manifest itself in the past because a gc_collect
always resets gc_last_free_atb_index to point to the start of the GC heap,
and the first block there is almost always allocated to a long-lived
object (eg entries from sys.path, or mounted filesystem objects), which
means that n_free would be reset at the start of the search loop.

But with threading enabled with the GIL disabled it is possible to trigger
the bug via the following sequence of events:

1. Thread A runs gc_alloc, fails to find enough memory, and has a non-zero
   n_free at the end of the search.
2. Thread A calls gc_collect and frees a bunch of blocks on the GC heap.
3. Just after gc_collect finishes in thread A, thread B takes gc_mutex and
   does an allocation, moving gc_last_free_atb_index to point to the
   interior of the heap, to a place where there is most likely a run of
   available blocks.
4. Thread A regains gc_mutex and does its second search for free memory,
   starting with a non-zero n_free.  Since it's likely that the first block
   it searches is available it will allocate memory which overlaps with the
   memory before gc_last_free_atb_index.
2018-08-14 16:11:21 +10:00
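
The effect of a stale n_free, as described in 91041945c9, can be shown with a toy search loop. The function below is a hypothetical simplification (one byte per block, name `find_run` invented here); passing a non-zero `n_free` models the bug, and passing 0 models the fix.

```c
#include <stddef.h>
#include <stdint.h>

// Scan from `start` for a run of `need` free blocks and return the index of
// the first block of that run, or -1 if none is found.  `n_free` is the
// counter value carried into the search.
static long find_run(const uint8_t *used, size_t n_blocks, size_t start,
    size_t need, size_t n_free) {
    for (size_t i = start; i < n_blocks; i++) {
        if (used[i]) {
            n_free = 0;
        } else if (++n_free >= need) {
            return (long)(i + 1) - (long)need;
        }
    }
    return -1;
}
```

With blocks 0 to 3 in use, a search for 3 blocks starting at index 4 with a stale count of 2 returns index 2, which overlaps memory that is still allocated; with the counter reset to 0 it returns index 4.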
Damien George
b630dfcc1d py: Fix compiling with debug enabled and make more use of DEBUG_printf.
DEBUG_printf and MICROPY_DEBUG_PRINTER is now used instead of normal
printf, and a fault is fixed in mp_obj_class_lookup with debugging enabled;
see issue #3999.  Debugging can now be enabled on all ports including when
nan-boxing is used.
2018-08-02 14:17:24 +10:00
Damien George
522ea80f06 py/gc: Add gc_sweep_all() function to run all remaining finalisers.
This patch adds the gc_sweep_all() function which does a garbage collection
without tracing any root pointers, so frees all the memory, and most
importantly runs any remaining finalisers.

This helps primarily for soft reset: it will close any open files, any open
sockets, and help to get the system back to a clean state upon soft reset.
2018-06-12 11:55:29 +10:00
Damien George
6bd78741c1 py/gc: When GC threshold is hit don't unnecessarily collect twice.
Without this, if GC threshold is hit and there is not enough memory left to
satisfy the request, gc_collect() will run a second time and the search for
memory will happen again and will fail again.

Thanks to @adritium for pointing out this issue, see #3786.
2018-05-21 13:36:21 +10:00
Damien George
749b16174b py/mpstate.h: Adjust start of root pointer section to exclude non-ptrs.
This patch moves the start of the root pointer section in mp_state_ctx_t
so that it skips entries that are not pointers and don't need scanning.

Previously, the start of the root pointer section was at the very beginning
of the mp_state_ctx_t struct (which is the beginning of mp_state_thread_t).
This was because the original assembler version of the NLR code was hard-coded to
have the nlr_top pointer at the start of this state structure.  But now
that the NLR code is partially written in C there is no longer this
restriction on the location of nlr_top (and a comment to this effect has
been removed in this patch).

So now the root pointer section starts part way through the
mp_state_thread_t structure, after the entries which are not root pointers.

This patch also moves the non-pointer entries for MICROPY_ENABLE_SCHEDULER
outside the root pointer section.

Moving non-pointer entries out of the root pointer section helps to make
the GC more precise and should help to prevent some cases of collectable
garbage being kept.

This patch also has a measurable improvement in performance of the
pystone.py benchmark: on unix x86-64 and stm32 there was an improvement of
roughly 0.6% (tested with both gcc 7.3 and gcc 8.1).
2018-05-13 22:53:28 +10:00
Damien George
2a0cbc0d38 py/gc: Update comment now that gc_drain_stack is called gc_mark_subtree.
2018-02-19 16:08:20 +11:00
Ayke van Laethem
736faef223 py/gc: Make GC stack pointer a local variable.
This saves a bit in code size, and saves some precious .bss RAM:

                 .text  .bss
minimal CROSS=1: -28    -4
unix (64-bit):   -64    -8
2018-02-19 16:05:46 +11:00
Ayke van Laethem
5c9e5618e0 py/gc: Rename gc_drain_stack to gc_mark_subtree and pass it first block.
This saves a bit in code size:

minimal CROSS=1: -44
unix:            -96
2018-02-19 16:00:59 +11:00
Ayke van Laethem
ea7cf2b738 py/gc: Reduce code size by specialising VERIFY_MARK_AND_PUSH macro.
This macro is written out explicitly in the two locations that it is used
and then the code is optimised, opening possibilities for further
optimisations and reducing code size:

unix:            -48
minimal CROSS=1: -32
stm32:           -32
2018-02-19 15:58:49 +11:00
Damien George
02d830c035 py: Introduce a Python stack for scoped allocation.
This patch introduces the MICROPY_ENABLE_PYSTACK option (disabled by
default) which enables a "Python stack" that allows memory to be allocated and freed
in a scoped, or Last-In-First-Out (LIFO) way, similar to alloca().

A new memory allocation API is introduced along with this Py-stack.  It
includes both "local" and "nonlocal" LIFO allocation.  Local allocation is
intended to be equivalent to using alloca(), whereby the same function must
free the memory.  Nonlocal allocation is where another function may free
the memory, so long as it's still LIFO.

Follow-up patches will convert all uses of alloca() and VLA to the new
scoped allocation API.  The old behaviour (using alloca()) will still be
available, but when MICROPY_ENABLE_PYSTACK is enabled then alloca() is no
longer required or used.

The benefits of enabling this option are (or will be once subsequent
patches are made to convert alloca()/VLA):
- Toolchains without alloca() can use this feature to obtain correct and
  efficient scoped memory allocation (compared to using the heap instead
  of alloca(), which is slower).
- Even if alloca() is available, enabling the Py-stack gives slightly more
  efficient use of stack space when calling nested Python functions, due to
  the way that compilers implement alloca().
- Enabling the Py-stack with the stackless mode allows for even more
  efficient stack usage, as well as retaining high performance (because the
  heap is no longer used to build and destroy stackless code states).
- With Py-stack and stackless enabled, Python-calling-Python is no longer
  recursive in the C mp_execute_bytecode function.

The micropython.pystack_use() function is included to measure usage of the
Python stack.
2017-12-11 13:49:09 +11:00
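
The LIFO discipline described in 02d830c035 can be sketched with a toy bump allocator. Everything here is an assumption made for illustration: the names (`toy_pystack_alloc`, `toy_pystack_free`, `toy_pystack_use`), the fixed 256-byte size, the 8-byte alignment, and returning NULL on exhaustion. It is not the MicroPython pystack API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PYSTACK_SIZE 256
#define TOY_ALIGN 8

// Backing storage declared as 64-bit words so that it is 8-byte aligned.
static uint64_t toy_pystack_storage[TOY_PYSTACK_SIZE / 8];
static size_t toy_pystack_top = 0;

// Allocate n bytes (rounded up to the alignment) from the top of the stack.
static void *toy_pystack_alloc(size_t n) {
    n = (n + TOY_ALIGN - 1) & ~(size_t)(TOY_ALIGN - 1);
    if (n > TOY_PYSTACK_SIZE - toy_pystack_top) {
        return NULL; // toy behaviour on exhaustion
    }
    void *p = (uint8_t *)toy_pystack_storage + toy_pystack_top;
    toy_pystack_top += n;
    return p;
}

// Free in LIFO order: releasing p also releases anything allocated after it.
static void toy_pystack_free(void *p) {
    size_t offset = (size_t)((uint8_t *)p - (uint8_t *)toy_pystack_storage);
    assert(offset <= toy_pystack_top);
    toy_pystack_top = offset;
}

// Report current usage in bytes, in the spirit of micropython.pystack_use().
static size_t toy_pystack_use(void) {
    return toy_pystack_top;
}
```

Because frees only ever move the top pointer back, neither allocation nor release needs to search, which is what makes this cheaper than the heap.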
Paul Sokolovsky
dea3fb93c7 py/gc: In sweep debug output, print pointer as a pointer.
Or it will be truncated on a 64-bit platform.
2017-12-09 01:54:01 +02:00
Paul Sokolovsky
5453d88d5d py/gc: Factor out a macro to trace GC mark operations.
To make it easier to override for custom tracing.
2017-12-09 01:48:26 +02:00
Paul Sokolovsky
9ef4be8b41 py/gc: Add CLEAR_ON_SWEEP option to debug mis-traced objects.
Accessing them will crash immediately instead of still working for some time,
until overwritten by some other data, leading to much less deterministic
crashes.
2017-12-08 00:10:44 +02:00
Damien George
74fad3536b py/gc: In gc_realloc, convert pointer sanity checks to assertions.
These checks are assumed to be true in all cases where gc_realloc is
called with a valid pointer, so no need to waste code space and time
checking them in a non-debug build.
2017-11-29 17:17:08 +11:00
Damien George
a3dc1b1957 all: Remove inclusion of internal py header files.
Header files that are considered internal to the py core and should not
normally be included directly are:
    py/nlr.h - internal nlr configuration and declarations
    py/bc0.h - contains bytecode macro definitions
    py/runtime0.h - contains basic runtime enums

Instead, the top-level header files to include are one of:
    py/obj.h - includes runtime0.h and defines everything to use the
        mp_obj_t type
    py/runtime.h - includes mpstate.h and hence nlr.h, obj.h, runtime0.h,
        and defines everything to use the general runtime support functions

Additional, specific headers (eg py/objlist.h) can be included if needed.
2017-10-04 12:37:50 +11:00
Stefan Naumann
ace9fb5405 py: Add verbose debug compile-time flag MICROPY_DEBUG_VERBOSE.
It enables all the DEBUG_printf outputs in the py/ source code.
2017-08-15 11:53:36 +10:00
Alexander Steffen
55f33240f3 all: Use the name MicroPython consistently in comments.
There were several different spellings of MicroPython present in comments,
when there should be only one.
2017-07-31 18:35:40 +10:00
Damien George
12d4fa9b37 py/gc: Refactor assertions in gc_free function.
gc_free() expects either NULL or a valid pointer into the heap, so the
checks for a valid pointer can be turned into assertions.
2017-07-12 12:17:38 +10:00
Damien George
c7e8c6f7de py/gc: Execute finaliser code in a protected environment.
If a finaliser raises an exception then it must not propagate through the
GC sweep function.  This patch protects against such a thing by running
finaliser code via the mp_call_function_1_protected call.

This patch also adds scheduler lock/unlock calls around the finaliser
execution to further protect against any possible reentrancy issues: the
memory manager is already locked when doing a collection, but we also don't
want to allow any scheduled code to run, KeyboardInterrupts to interrupt the
code, nor threads to switch.
2017-04-12 13:52:04 +10:00
Damien George
5ffe1d8dc0 py/gc: Add MICROPY_GC_CONSERVATIVE_CLEAR option to always zero memory.
There can be stray pointers in memory blocks that are not properly zero'd
after allocation.  This patch adds a new config option to always zero all
allocated memory (via gc_alloc and gc_realloc) and hence help to eliminate
stray pointers.

See issue #2195.
2016-08-26 15:35:26 +10:00