There were two main problems:
- word_buffer was being filled as though the samples were
unsigned, but during mixing all samples are kept in signed form.
- If the first buffer was stopped, the voices_active flag got set
anyway, even though the output buffer wasn't initialized yet,
so the samples were mixed with indeterminate data.
We also cover the case where no buffer is playing, and ensure
the output buffer is still filled in that case.
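
A minimal sketch of the intended behaviour, not the actual mixer
code. The names voice_t, to_signed, saturating_add, mix_voices and
out_initialized are hypothetical; filling with zeros when nothing
is playing and clamping the sum are assumptions made for the
illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical voice description, not the real mixer struct. */
typedef struct {
    const uint16_t *samples;  /* raw 16-bit sample words */
    bool playing;             /* false once the voice is stopped */
    bool samples_signed;      /* true if the words are already signed */
} voice_t;

/* Unsigned 16-bit audio is centered on 0x8000; flipping the top
 * bit recenters it on 0 so it can be mixed as signed data. */
static int16_t to_signed(uint16_t raw, bool is_signed) {
    return (int16_t)(is_signed ? raw : (uint16_t)(raw ^ 0x8000u));
}

/* Assumption: clamp instead of wrapping when the sum overflows. */
static int16_t saturating_add(int16_t a, int16_t b) {
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

void mix_voices(const voice_t *voices, size_t n_voices,
                int16_t *out, size_t n_samples) {
    /* Becomes true only after a playing voice has written to out. */
    bool out_initialized = false;

    for (size_t v = 0; v < n_voices; v++) {
        if (!voices[v].playing) {
            continue;  /* a stopped voice must not mark out as valid */
        }
        for (size_t i = 0; i < n_samples; i++) {
            int16_t s = to_signed(voices[v].samples[i],
                                  voices[v].samples_signed);
            /* First playing voice overwrites, later ones add. */
            out[i] = out_initialized ? saturating_add(out[i], s) : s;
        }
        out_initialized = true;
    }

    /* No voice was playing: fill the output (assumed: with zeros). */
    if (!out_initialized) {
        memset(out, 0, n_samples * sizeof(int16_t));
    }
}
```

The flag is set only after a playing voice has written the output,
so a stopped first voice can no longer cause later voices to be
added on top of uninitialized memory.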
This now works much better. Tested on a NeoTrellis M4 playing back
4 MP3 streams at a time in signed 16-bit, 22050 Hz.