4eaebc1988
This commit adds many new sections to the existing "Developing and building MicroPython" chapter to make it all about the internals of MicroPython. This work was done as part of Google's Season of Docs 2020.
142 lines
5.4 KiB
ReStructuredText
142 lines
5.4 KiB
ReStructuredText
.. _memorymanagement:
|
|
|
|
Memory Management
|
|
=================
|
|
|
|
Unlike programming languages such as C/C++, MicroPython hides memory management
|
|
details from the developer by supporting automatic memory management.
|
|
Automatic memory management is a technique used by operating systems or applications to automatically manage
|
|
the allocation and deallocation of memory. This eliminates challenges such as forgetting to
|
|
free the memory allocated to an object. Automatic memory management also avoids the critical issue of using memory
|
|
that is already released. Automatic memory management takes many forms, one of them being
|
|
garbage collection (GC).
|
|
|
|
The garbage collector usually has two responsibilities;
|
|
|
|
#. Allocate new objects in available memory.
|
|
#. Free unused memory.
|
|
|
|
There are many GC algorithms but MicroPython uses the
|
|
`Mark and Sweep <https://en.wikipedia.org/wiki/Tracing_garbage_collection#Basic_algorithm>`_
|
|
policy for managing memory. This algorithm has a mark phase that traverses the heap marking all
|
|
live objects while the sweep phase goes through the heap reclaiming all unmarked objects.
|
|
|
|
Garbage collection functionality in MicroPython is available through the ``gc`` built-in
|
|
module:
|
|
|
|
.. code-block:: bash
|
|
|
|
>>> x = 5
|
|
>>> x
|
|
5
|
|
>>> import gc
|
|
>>> gc.enable()
|
|
>>> gc.mem_alloc()
|
|
1312
|
|
>>> gc.mem_free()
|
|
2071392
|
|
>>> gc.collect()
|
|
19
|
|
>>> gc.disable()
|
|
>>>
|
|
|
|
Even when ``gc.disable()`` is invoked, collection can be triggered with ``gc.collect()``.
|
|
|
|
The object model
|
|
----------------
|
|
|
|
All MicroPython objects are referred to by the ``mp_obj_t`` data type.
|
|
This is usually word-sized (i.e. the same size as a pointer on the target architecture),
|
|
and can be typically 32-bit (STM32, nRF, ESP32, Unix x86) or 64-bit (Unix x64).
|
|
It can also be greater than a word-size for certain object representations, for
|
|
example ``OBJ_REPR_D`` has a 64-bit sized ``mp_obj_t`` on a 32-bit architecture.
|
|
|
|
An ``mp_obj_t`` represents a MicroPython object, for example an integer, float, type, dict or
|
|
class instance. Some objects, like booleans and small integers, have their value stored directly
|
|
in the ``mp_obj_t`` value and do not require additional memory. Other objects have their value
|
|
store elsewhere in memory (for example on the garbage-collected heap) and their ``mp_obj_t`` contains
|
|
a pointer to that memory. A portion of ``mp_obj_t`` is the tag which tells what type of object it is.
|
|
|
|
See ``py/mpconfig.h`` for the specific details of the available representations.
|
|
|
|
**Pointer tagging**
|
|
|
|
Because pointers are word-aligned, when they are stored in an ``mp_obj_t`` the
|
|
lower bits of this object handle will be zero. For example on a 32-bit architecture
|
|
the lower 2 bits will be zero:
|
|
|
|
``********|********|********|******00``
|
|
|
|
These bits are reserved for purposes of storing a tag. The tag stores extra information as
|
|
opposed to introducing a new field to store that information in the object, which may be
|
|
inefficient. In MicroPython the tag tells if we are dealing with a small integer, interned
|
|
(small) string or a concrete object, and different semantics apply to each of these.
|
|
|
|
For small integers the mapping is this:
|
|
|
|
``********|********|********|*******1``
|
|
|
|
Where the asterisks hold the actual integer value. For an interned string or an immediate
|
|
object (e.g. ``True``) the layout of the ``mp_obj_t`` value is, respectively:
|
|
|
|
``********|********|********|*****010``
|
|
|
|
``********|********|********|*****110``
|
|
|
|
While a concrete object that is none of the above takes the form:
|
|
|
|
``********|********|********|******00``
|
|
|
|
The stars here correspond to the address of the concrete object in memory.
|
|
|
|
Allocation of objects
|
|
----------------------
|
|
|
|
The value of a small integer is stored directly in the ``mp_obj_t`` and will be
|
|
allocated in-place, not on the heap or elsewhere. As such, creation of small
|
|
integers does not affect the heap. Similarly for interned strings that already have
|
|
their textual data stored elsewhere, and immediate values like ``None``, ``False``
|
|
and ``True``.
|
|
|
|
Everything else which is a concrete object is allocated on the heap and its object structure is such that
|
|
a field is reserved in the object header to store the type of the object.
|
|
|
|
.. code-block:: bash
|
|
|
|
+++++++++++
|
|
+ +
|
|
+ type + object header
|
|
+ +
|
|
+++++++++++
|
|
+ + object items
|
|
+ +
|
|
+ +
|
|
+++++++++++
|
|
|
|
The heap's smallest unit of allocation is a block, which is four machine words in
|
|
size (16 bytes on a 32-bit machine, 32 bytes on a 64-bit machine).
|
|
Another structure also allocated on the heap tracks the allocation of
|
|
objects in each block. This structure is called a *bitmap*.
|
|
|
|
.. image:: img/bitmap.png
|
|
|
|
The bitmap tracks whether a block is "free" or "in use" and use two bits to track this state
|
|
for each block.
|
|
|
|
The mark-sweep garbage collector manages the objects allocated on the heap, and also
|
|
utilises the bitmap to mark objects that are still in use.
|
|
See `py/gc.c <https://github.com/micropython/micropython/blob/master/py/gc.c>`_
|
|
for the full implementation of these details.
|
|
|
|
**Allocation: heap layout**
|
|
|
|
The heap is arranged such that it consists of blocks in pools. A block
|
|
can have different properties:
|
|
|
|
- *ATB(allocation table byte):* If set, then the block is a normal block
|
|
- *FREE:* Free block
|
|
- *HEAD:* Head of a chain of blocks
|
|
- *TAIL:* In the tail of a chain of blocks
|
|
- *MARK :* Marked head block
|
|
- *FTB(finaliser table byte):* If set, then the block has a finaliser
|