* Comment on the reason for scaling by 256
* Divide by 256 instead of shifting
* fix a cast; eliminate an unneeded roundf() to get a few bytes code back
On the Pico, this increases the "fill rate" of
pixels[:] = newvalues
considerably. On a strip of 240 RGB LEDs, auto_write=False, the timings
are:
|| Brightness || Before || After || Improvement ||
|| 1.0 || 117 kpix/s || 307 kpix/s || 2.62x ||
|| 0.07 || 117 kpix/s || 273 kpix/s || 2.33x ||
It's worth noting that even the "before" rate is fast compared to the
time to transmit a single neopixel, but any time we can gain back
in the whole pipeline will let marginal animations work a little better.
To set all the pixels in this way and then show() gives a pleasant bump
to the framerate, from about 108Hz to 124Hz (1.15x)
The main source of speed-up is using integer math instead of floating
point math for the calculation of the post-scaled pixel values. A slight
secondary gain is achieved by avoiding the scaling altogether when
the scale factor is 1.0.
Because the math is not exactly the same, some scaled pixel values may
change by +- 1 RGBW "step". In practice, this is unlikely to matter.
The gains are bigger on the Pico and other M0 microcontrollers than M4
microcontrollers with floating point math in the hardware.
Happily, flash size is also improved a bit on the Pico build I did,
going from
> 542552 bytes used, 506024 bytes free in flash firmware space out of 1048576 bytes (1024.0kB).
to
> 542376 bytes used, 506200 bytes free in flash firmware space out of 1048576 bytes (1024.0kB).
When handling negative steps, start and stop need to be mp_int_t so they
can be checked against a potential negative value during the for loop
used to set the slice values.