The Reset_Handler needs to copy the data section and zero the BSS, and
these operations should be as optimised as possible to reduce start up
time. The versions provided in this patch are about 2x faster (on a Cortex
M4) than the previous implementations.