docs/library/deflate: Add docs for deflate.DeflateIO.

Also update zlib & gzip docs to describe the micropython-lib modules.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Jim Mussared 2023-06-27 02:17:41 +10:00 committed by Damien George
parent 8b315ef0d8
commit b804443cb3
4 changed files with 357 additions and 28 deletions

docs/library/deflate.rst Normal file

@ -0,0 +1,177 @@
:mod:`deflate` -- deflate compression & decompression
=====================================================

.. module:: deflate
   :synopsis: deflate compression & decompression

This module allows compression and decompression of binary data with the
`DEFLATE algorithm <https://en.wikipedia.org/wiki/DEFLATE>`_
(commonly used in the zlib library and gzip archiver).

**Availability:**

* Added in MicroPython v1.21.

* Decompression: Enabled via the ``MICROPY_PY_DEFLATE`` build option, on by default
  on ports with the "extra features" level or higher (which is most boards).

* Compression: Enabled via the ``MICROPY_PY_DEFLATE_COMPRESS`` build option, on
  by default on ports with the "full features" level or higher (generally this means
  you need to build your own firmware to enable this).

Classes
-------
.. class:: DeflateIO(stream, format=AUTO, wbits=0, close=False, /)
This class can be used to wrap a *stream* which is any
:term:`stream-like <stream>` object such as a file, socket, or stream
(including :class:`io.BytesIO`). It is itself a stream and implements the
standard read/readinto/write/close methods.
The *stream* must be a blocking stream. Non-blocking streams are currently
not supported.
The *format* can be set to any of the constants defined below, and defaults
to ``AUTO`` which for decompressing will auto-detect gzip or zlib streams,
and for compressing it will generate a raw stream.
The *wbits* parameter sets the base-2 logarithm of the DEFLATE dictionary
window size. So for example, setting *wbits* to ``10`` sets the window size
to 1024 bytes. Valid values are ``5`` to ``15`` inclusive (corresponding to
window sizes of 32 to 32k bytes).
If *wbits* is set to ``0`` (the default), then a window size of 256 bytes
will be used (corresponding to *wbits* set to ``8``), except when
:ref:`decompressing a zlib stream <deflate_wbits_zlib>`.
See the :ref:`window size <deflate_wbits>` notes below for more information
about the window size, zlib, and gzip streams.
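As a rough illustration of the arithmetic above, the following plain-Python sketch maps a *wbits* value to a window size in bytes. The helper name ``window_size`` is hypothetical (it is not part of the ``deflate`` module), and the zlib-header special case for ``wbits=0`` is deliberately left out.

```python
def window_size(wbits=0):
    """Return the window size in bytes for a given wbits value.

    Hypothetical helper for illustration only; not part of ``deflate``.
    """
    if wbits == 0:
        # Default: treated like wbits=8 (256 bytes). The special case for
        # decompressing zlib streams is intentionally ignored here.
        wbits = 8
    if not 5 <= wbits <= 15:
        raise ValueError("wbits must be 0 or between 5 and 15 inclusive")
    # The window size is 2 to the power of wbits.
    return 1 << wbits


print(window_size(10))
```

The same arithmetic gives 32 bytes for ``wbits=5`` and 32768 bytes for ``wbits=15``, which is useful when budgeting RAM for a decompressor.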
If *close* is set to ``True`` then the underlying stream will be closed
automatically when the :class:`deflate.DeflateIO` stream is closed. This is
useful if you want to return a :class:`deflate.DeflateIO` stream that wraps
another stream and not have the caller need to know about managing the
underlying stream.
If compression is enabled, a given :class:`deflate.DeflateIO` instance
supports both reading and writing. For example, a bidirectional stream like
a socket can be wrapped, which allows for compression/decompression in both
directions.
Constants
---------
.. data:: deflate.AUTO
          deflate.RAW
          deflate.ZLIB
          deflate.GZIP

    Supported values for the *format* parameter.

Examples
--------
A typical use case for :class:`deflate.DeflateIO` is to read or write a compressed
file from storage:

.. code:: python

    import deflate

    # Writing a zlib-compressed stream (uses the default window size of 256 bytes).
    with open("data.z", "wb") as f:
        with deflate.DeflateIO(f, deflate.ZLIB) as d:
            # Use d.write(...) etc.
            d.write(b"example payload")

    # Reading a zlib-compressed stream (auto-detect window size).
    with open("data.z", "rb") as f:
        with deflate.DeflateIO(f, deflate.ZLIB) as d:
            # Use d.read(), d.readinto(), etc.
            data = d.read()

Because :class:`deflate.DeflateIO` is a stream, it can be used for example
with :func:`json.dump` and :func:`json.load` (and anywhere else a stream can
be used):

.. code:: python

    import deflate, json

    # Write a dictionary as JSON in gzip format, with a
    # small (64 byte) window size.
    config = { ... }
    with open("config.gz", "wb") as f:
        with deflate.DeflateIO(f, deflate.GZIP, 6) as d:
            json.dump(config, d)

    # Read back that dictionary.
    with open("config.gz", "rb") as f:
        with deflate.DeflateIO(f, deflate.GZIP, 6) as d:
            config = json.load(d)

If your source data is not in a stream format, you can use :class:`io.BytesIO`
to turn it into a stream suitable for use with :class:`deflate.DeflateIO`:

.. code:: python

    import deflate, io

    # Decompress a bytes/bytearray value.
    compressed_data = get_data_z()
    with deflate.DeflateIO(io.BytesIO(compressed_data), deflate.ZLIB) as d:
        decompressed_data = d.read()

    # Compress a bytes/bytearray value.
    uncompressed_data = get_data()
    stream = io.BytesIO()
    with deflate.DeflateIO(stream, deflate.ZLIB) as d:
        d.write(uncompressed_data)
    compressed_data = stream.getvalue()

.. _deflate_wbits:
Deflate window size
-------------------
The window size limits how far back in the stream the (de)compressor can
reference. Increasing the window size will improve compression, but will
require more memory.
However, just because a given window size is used for compression, this does not
mean that the stream will require the same size window for decompression, as
the stream may not reference data as far back as the window allows (for example,
if the length of the input is smaller than the window size).
If the decompressor uses a smaller window size than necessary for the input data
stream, it will fail mid-way through decompression with :exc:`OSError`.
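The trade-off between window size and compression ratio can be observed on a desktop with CPython's ``zlib`` module, used here as a stand-in because ``deflate`` is MicroPython-specific. This is only a sketch: the helper ``deflate_size`` and the test data are made up for the illustration, and CPython's compressor is not the one MicroPython uses, so exact sizes will differ.

```python
import random
import zlib  # CPython's zlib, standing in for MicroPython's deflate module


def deflate_size(data, wbits):
    """Return the size of *data* compressed as a zlib stream with the given wbits."""
    compressor = zlib.compressobj(wbits=wbits)
    return len(compressor.compress(data) + compressor.flush())


# 2000 pseudo-random bytes repeated four times: the repeats are 2000 bytes
# apart, so only a window larger than that can reference them.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(2000))
data = block * 4

small = deflate_size(data, 9)   # 512-byte window
large = deflate_size(data, 15)  # 32 KiB window
print(small, large)
```

With the small window the repeated blocks are out of reach and the output stays close to the input size, while the large window can encode the repeats as back-references.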
.. _deflate_wbits_zlib:
The zlib format includes a header which specifies the window size used to
compress the data (which due to the above, may be larger than the size required
for the decompressor).
If this header value is lower than the specified *wbits* value, then the header
value will be used instead in order to reduce the memory allocation size. If
the *wbits* parameter is zero (the default), then the header value will only be
used if it is less than the maximum value of ``15`` (which is the default value
used by most compressors [#f1]_).
In other words, if the source zlib stream has been compressed with a custom window
size (i.e. less than ``15``), then using the default *wbits* parameter of zero
will decompress any such stream.
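For reference, the window size sits in the first byte of the zlib header (the CMF byte of RFC 1950): the low four bits hold the compression method (8 for DEFLATE) and the high four bits hold the base-2 logarithm of the window size minus 8. The sketch below reads that field using CPython's ``zlib`` to generate sample streams. The function name ``zlib_header_wbits`` is hypothetical, and this is not how MicroPython implements the check internally.

```python
import zlib  # CPython's zlib, used here only to produce sample streams


def zlib_header_wbits(data):
    """Return the wbits value recorded in the zlib header (RFC 1950 CMF byte)."""
    cmf = data[0]
    if cmf & 0x0F != 8:
        raise ValueError("not a zlib/DEFLATE stream")
    # CINFO (the high four bits) is log2(window size) - 8.
    return (cmf >> 4) + 8


default_stream = zlib.compress(b"hello world")
compressor = zlib.compressobj(wbits=10)
small_stream = compressor.compress(b"hello world") + compressor.flush()

print(zlib_header_wbits(default_stream))
print(zlib_header_wbits(small_stream))
```

A stream produced with default settings reports the maximum value, which is why that value says little about the window the data actually needs.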
The gzip file format does not include the window size in the header.
Additionally, most compressor libraries (including CPython's implementation
of :class:`gzip.GzipFile`) will default to the maximum possible window size.
This makes it difficult to decompress most gzip streams on MicroPython unless
your board has a lot of free RAM.
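To see why a gzip decompressor cannot learn the window size up front, it helps to look at the fixed 10-byte gzip header defined by RFC 1952. The sketch below unpacks it with CPython's ``struct`` and ``gzip`` modules; ``parse_gzip_header`` is a hypothetical helper written for this illustration.

```python
import gzip
import struct


def parse_gzip_header(data):
    """Unpack the fixed 10-byte gzip header defined by RFC 1952."""
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<HBBIBB", data[:10])
    if magic != 0x8B1F:  # bytes 0x1f 0x8b, read as a little-endian 16-bit value
        raise ValueError("not a gzip stream")
    return {"method": method, "flags": flags, "mtime": mtime, "xfl": xfl, "os": os_id}


header = parse_gzip_header(gzip.compress(b"hello world"))
print(header)
```

The fields are the magic number, compression method, flags, modification time, extra flags, and operating system. None of them records the window size, unlike the zlib header.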
If you control the source of the compressed data, then prefer to use the zlib
format, with a window size that is suitable for your target device.
.. rubric:: Footnotes
.. [#f1] The assumption here is that if the header value is the default used by
most compressors, then nothing is known about the likely required window
size and we should ignore it.

docs/library/gzip.rst Normal file

@ -0,0 +1,106 @@
:mod:`gzip` -- gzip compression & decompression
===============================================
.. module:: gzip
:synopsis: gzip compression & decompression
|see_cpython_module| :mod:`python:gzip`.
This module allows compression and decompression of binary data with the
`DEFLATE algorithm <https://en.wikipedia.org/wiki/DEFLATE>`_ used by the gzip
file format.
.. note:: Prefer to use :class:`deflate.DeflateIO` instead of the functions in this
   module, as it provides a streaming interface to compression and decompression
   that is convenient and more memory efficient when reading or writing
   compressed data to or from a file, socket, or stream.
**Availability:**
* This module is **not present by default** in official MicroPython firmware
releases as it duplicates functionality available in the :mod:`deflate
<deflate>` module.
* A copy of this module can be installed (or frozen)
from :term:`micropython-lib` (`source <https://github.com/micropython/micropython-lib/blob/master/python-stdlib/gzip/gzip.py>`_).
See :ref:`packages` for more information. This documentation describes that module.
* Compression support will only be available if compression support is enabled
in the built-in :mod:`deflate <deflate>` module.
Functions
---------
.. function:: open(filename, mode, /)
Wrapper around built-in :func:`open` returning a GzipFile instance.
.. function:: decompress(data, /)
Decompresses *data* into a bytes object.
.. function:: compress(data, /)
Compresses *data* into a bytes object.
Classes
-------
.. class:: GzipFile(*, fileobj, mode)
This class can be used to wrap a *fileobj* which is any
:term:`stream-like <stream>` object such as a file, socket, or stream
(including :class:`io.BytesIO`). It is itself a stream and implements the
standard read/readinto/write/close methods.
When the *mode* argument is ``"rb"``, reads from the GzipFile instance will
decompress the data in the underlying stream and return decompressed data.
If compression support is enabled then the *mode* argument can be set to
``"wb"``, and writes to the GzipFile instance will be compressed and written
to the underlying stream.
By default the GzipFile class will read and write data using the gzip file
format, including a header and footer with checksum and a window size of 512
bytes.
The *filename*, *compresslevel*, and *mtime* arguments of CPython's
``GzipFile`` are not supported. *fileobj* and *mode* must always be specified
as keyword arguments.
Examples
--------
A typical use case for :class:`gzip.GzipFile` is to read or write a compressed
file from storage:

.. code:: python

    import gzip, json

    # Reading:
    with open("data.gz", "rb") as f:
        with gzip.GzipFile(fileobj=f, mode="rb") as g:
            # Use g.read(), g.readinto(), etc.
            data = g.read()

    # Same, but using gzip.open:
    with gzip.open("data.gz", "rb") as f:
        # Use f.read(), f.readinto(), etc.
        data = f.read()

    # Writing:
    with open("data.gz", "wb") as f:
        with gzip.GzipFile(fileobj=f, mode="wb") as g:
            # Use g.write(...) etc.
            g.write(data)

    # Same, but using gzip.open:
    with gzip.open("data.gz", "wb") as f:
        # Use f.write(...) etc.
        f.write(data)

    # Write a dictionary as JSON in gzip format.
    config = { ... }
    with gzip.open("config.gz", "wb") as f:
        json.dump(config, f)

For guidance on working with gzip sources and choosing the window size see the
note at the :ref:`end of the deflate documentation <deflate_wbits>`.


@ -64,6 +64,7 @@ library.
    collections.rst
    errno.rst
    gc.rst
+   gzip.rst
    hashlib.rst
    heapq.rst
    io.rst
@ -95,6 +96,7 @@ the following libraries.
    bluetooth.rst
    btree.rst
    cryptolib.rst
+   deflate.rst
    framebuf.rst
    machine.rst
    micropython.rst
@ -194,11 +196,11 @@ Extending built-in libraries from Python
 A subset of the built-in modules are able to be extended by Python code by
 providing a module of the same name in the filesystem. This extensibility
 applies to the following Python standard library modules which are built-in to
-the firmware: ``array``, ``binascii``, ``collections``, ``errno``, ``hashlib``,
-``heapq``, ``io``, ``json``, ``os``, ``platform``, ``random``, ``re``,
-``select``, ``socket``, ``ssl``, ``struct``, ``time`` ``zlib``, as well as the
-MicroPython-specific ``machine`` module. All other built-in modules cannot be
-extended from the filesystem.
+the firmware: ``array``, ``binascii``, ``collections``, ``errno``, ``gzip``,
+``hashlib``, ``heapq``, ``io``, ``json``, ``os``, ``platform``, ``random``,
+``re``, ``select``, ``socket``, ``ssl``, ``struct``, ``time`` ``zlib``, as well
+as the MicroPython-specific ``machine`` module. All other built-in modules
+cannot be extended from the filesystem.

 This allows the user to provide an extended implementation of a built-in library
 (perhaps to provide additional CPython compatibility or missing functionality).


@ -1,38 +1,82 @@
-:mod:`zlib` -- zlib decompression
-=================================
+:mod:`zlib` -- zlib compression & decompression
+===============================================

 .. module:: zlib
-   :synopsis: zlib decompression
+   :synopsis: zlib compression & decompression

 |see_cpython_module| :mod:`python:zlib`.

-This module allows to decompress binary data compressed with
+This module allows compression and decompression of binary data with the
 `DEFLATE algorithm <https://en.wikipedia.org/wiki/DEFLATE>`_
-(commonly used in zlib library and gzip archiver). Compression
-is not yet implemented.
+(commonly used in the zlib library and gzip archiver).
+
+.. note:: Prefer to use :class:`deflate.DeflateIO` instead of the functions in this
+   module as it provides a streaming interface to compression and decompression
+   which is convenient and more memory efficient when working with reading or
+   writing compressed data to a file, socket, or stream.
+
+**Availability:**
+
+* From MicroPython v1.21 onwards, this module may not be present by default on
+  all MicroPython firmware as it duplicates functionality available in
+  the :mod:`deflate <deflate>` module.
+
+* A copy of this module can be installed (or frozen)
+  from :term:`micropython-lib` (`source <https://github.com/micropython/micropython-lib/blob/master/python-stdlib/zlib/zlib.py>`_).
+  See :ref:`packages` for more information. This documentation describes that module.
+
+* Requires the built-in :mod:`deflate <deflate>` module (available since MicroPython v1.21)
+
+* Compression support will only be available if compression support is enabled
+  in the built-in :mod:`deflate <deflate>` module.

 Functions
 ---------

-.. function:: decompress(data, wbits=0, bufsize=0, /)
+.. function:: decompress(data, wbits=15, /)

-   Return decompressed *data* as bytes. *wbits* is DEFLATE dictionary window
-   size used during compression (8-15, the dictionary size is power of 2 of
-   that value). Additionally, if value is positive, *data* is assumed to be
-   zlib stream (with zlib header). Otherwise, if it's negative, it's assumed
-   to be raw DEFLATE stream. *bufsize* parameter is for compatibility with
-   CPython and is ignored.
+   Decompresses *data* into a bytes object.

-.. class:: DecompIO(stream, wbits=0, /)
+   The *wbits* parameter works the same way as for :meth:`zlib.compress`
+   with the following additional valid values:

-   Create a `stream` wrapper which allows transparent decompression of
-   compressed data in another *stream*. This allows to process compressed
-   streams with data larger than available heap size. In addition to
-   values described in :func:`decompress`, *wbits* may take values
-   24..31 (16 + 8..15), meaning that input stream has gzip header.
+   * ``0``: Automatically determine the window size from the zlib header
+     (*data* must be in zlib format).
+   * ``35`` to ``47``: Auto-detect either the zlib or gzip format.

-   .. admonition:: Difference to CPython
-      :class: attention
+   As for :meth:`zlib.compress`, see the :mod:`CPython documentation for zlib <python:zlib>`
+   for more information about the *wbits* parameter. As for :meth:`zlib.compress`,
+   MicroPython also supports smaller window sizes than CPython. See more
+   :ref:`MicroPython-specific details <deflate_wbits>` in the
+   :mod:`deflate <deflate>` module documentation.

-      This class is MicroPython extension. It's included on provisional
-      basis and may be changed considerably or removed in later versions.
+   If the data to be decompressed requires a larger window size, it will
+   fail during decompression.
+
+.. function:: compress(data, wbits=15, /)
+
+   Compresses *data* into a bytes object.
+
+   *wbits* allows you to configure the DEFLATE dictionary window size and the
+   output format. The window size allows you to trade-off memory usage for
+   compression level. A larger window size will allow the compressor to
+   reference fragments further back in the input. The output formats are "raw"
+   DEFLATE (no header/footer), zlib, and gzip, where the latter two
+   include a header and checksum.
+
+   The low four bits of the absolute value of *wbits* set the base-2 logarithm of
+   the DEFLATE dictionary window size. So for example, ``wbits=10``,
+   ``wbits=-10``, and ``wbits=26`` all set the window size to 1024 bytes. Valid
+   window sizes are ``5`` to ``15`` inclusive (corresponding to 32 to 32k bytes).
+
+   Negative values of *wbits* between ``-5`` and ``-15`` correspond to "raw"
+   output mode, positive values between ``5`` and ``15`` correspond to zlib
+   output mode, and positive values between ``21`` and ``31`` correspond to
+   gzip output mode.
+
+   See the :mod:`CPython documentation for zlib <python:zlib>` for more
+   information about the *wbits* parameter. Note that MicroPython allows
+   for smaller window sizes, which is useful when memory is constrained while
+   still achieving a reasonable level of compression. It also speeds up
+   the compressor. See more :ref:`MicroPython-specific details <deflate_wbits>`
+   in the :mod:`deflate <deflate>` module documentation.
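The *wbits* encoding described for ``zlib.compress`` can be summarised in a few lines of plain Python. This is a sketch of the documented rules only: ``decode_wbits`` is a hypothetical helper, not a function provided by the ``zlib`` or ``deflate`` modules, and it does not cover the extra values accepted by ``zlib.decompress``.

```python
def decode_wbits(wbits):
    """Split a zlib.compress-style wbits value into (format, window size in bytes).

    Hypothetical helper for illustration; not part of the zlib module.
    """
    if -15 <= wbits <= -5:
        fmt = "raw"
    elif 5 <= wbits <= 15:
        fmt = "zlib"
    elif 21 <= wbits <= 31:
        fmt = "gzip"
    else:
        raise ValueError("unsupported wbits value")
    # The low four bits of the absolute value hold log2(window size).
    return fmt, 1 << (abs(wbits) & 0x0F)


for value in (10, -10, 26):
    print(value, decode_wbits(value))
```

All three values in the loop select a 1024-byte window and differ only in the output format, matching the example given in the text.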