* Increase colorspace conversion efficiency.
Converting pixels inline not only avoids a function call, it also
avoids the time-consuming switch statement in convert_pixel
(replacing it with a single conditional on the byteswap flag, plus
accounting for BGR/RGB during palette creation).
* Buffer all the bytes of a single frame together. Reducing the
number of low-level write calls gives a decent speed increase, even
though it increases data shuffling a bit.
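
A rough sketch of the two ideas above, not the code from this change. It assumes RGB565 input and a 128-entry palette indexed by a 2:3:2 bit layout; every name in it (`build_palette`, `convert_row`, `frame_buf_t`, `frame_put`, `frame_flush`) is hypothetical. The RGB/BGR choice is paid for once when the palette is built, so the per-pixel loop only has to branch on the byteswap flag, and the frame bytes are gathered in memory and written with one call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical 7-bit index: top 2 bits of the high field, top 3 bits of
 * green, top 2 bits of the low field of a 16-bit 5:6:5 pixel. */
static inline uint8_t pixel_to_index(uint16_t p) {
    return (uint8_t)(((p >> 9) & 0x60) | ((p >> 6) & 0x1C) | ((p >> 3) & 0x03));
}

/* Build the palette once. For BGR input the high field holds blue, so the
 * red and blue palette channels are swapped here instead of per pixel. */
static void build_palette(uint8_t palette[128][3], bool bgr) {
    for (int i = 0; i < 128; i++) {
        uint8_t hi = (uint8_t)(((i >> 5) & 0x3) * 85);       /* 2 bits -> 0..255 */
        uint8_t mid = (uint8_t)(((i >> 2) & 0x7) * 255 / 7); /* 3 bits -> 0..255 */
        uint8_t lo = (uint8_t)((i & 0x3) * 85);              /* 2 bits -> 0..255 */
        palette[i][0] = bgr ? lo : hi; /* red */
        palette[i][1] = mid;           /* green */
        palette[i][2] = bgr ? hi : lo; /* blue */
    }
}

/* Convert n pixels; the only branch is on the byteswap flag, taken once. */
static void convert_row(const uint16_t *src, uint8_t *dst, size_t n, bool byteswap) {
    if (byteswap) {
        for (size_t i = 0; i < n; i++) {
            uint16_t p = (uint16_t)((src[i] << 8) | (src[i] >> 8));
            dst[i] = pixel_to_index(p);
        }
    } else {
        for (size_t i = 0; i < n; i++) {
            dst[i] = pixel_to_index(src[i]);
        }
    }
}

/* Fixed-capacity buffer that collects all the bytes of one frame. */
typedef struct {
    uint8_t *data;
    size_t len;
    size_t cap;
} frame_buf_t;

/* Append n bytes; returns false (and appends nothing) if they do not fit. */
static bool frame_put(frame_buf_t *b, const void *src, size_t n) {
    assert(b->len <= b->cap);
    if (n > b->cap - b->len) {
        return false;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return true;
}

/* Write the whole frame with a single call, then reset the buffer. */
static size_t frame_flush(frame_buf_t *b, FILE *f) {
    size_t written = fwrite(b->data, 1, b->len, f);
    b->len = 0;
    return written;
}
```

In this sketch a caller would run `build_palette` once per file, then for each frame call `convert_row` on each row into a scratch row, `frame_put` the encoded bytes, and finish with one `frame_flush`. The extra `memcpy` is the data shuffling mentioned above.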
Together with some other changes that enable "double buffered" camera
capture, this gets me to 8.8 fps capturing QVGA (320x240) GIFs and
11 fps capturing 240x240 square GIFs.