This blends two "565"-format bitmaps, including byteswapped ones. All
the bitmaps have to have the same memory format.
The routine takes about 63ms on a Kaluga when operating on 320x240 bitmaps.
Of course, displaying the bitmap also takes time.
There's untested code for the L8 (8-bit greyscale) case. This can be
enabled once gifio is merged.
When reading uncompressed bitmap data directly, readinto can work
much more quickly than a Python-coded loop.
On a Raspberry Pi Pico, I benchmarked a modified version of
adafruit_bitmap_font's pcf reader which uses readinto instead of
the existing code. My test font was a 72-point file created from Arial.
This decreased the time to load all the ASCII glyphs from 4.9 seconds to
just 0.44 seconds.
While this attempts to support many pixel configurations (1/2/4/8/16/24/32
bpp; swapped words and pixels) only the single combination used by
PCF fonts was tested.