Same change as cd527bb324, but for doubles. This gives a -2.754% change on bm_float.py and a -35% change when calling sqrt in a loop (both are improvements).
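
For reference, a minimal sketch of what a "sqrt in a loop" microbenchmark could look like. This is not the script used for the numbers above; the function names, loop size, and repeat count are assumptions chosen for illustration.

```python
import math
import timeit


def sqrt_loop(n):
    """Call math.sqrt on n doubles and return the sum of the results."""
    total = 0.0
    for i in range(n):
        # float(i) makes sure the argument is a double, not an int.
        total += math.sqrt(float(i))
    return total


def time_sqrt_loop(n=100_000, repeat=5):
    """Return the best wall-clock time in seconds over `repeat` runs.

    n and repeat are hypothetical defaults, not values from the commit.
    """
    return min(timeit.repeat(lambda: sqrt_loop(n), number=1, repeat=repeat))


if __name__ == "__main__":
    print(f"best of 5: {time_sqrt_loop():.6f} s")
```

Running this before and after the change and comparing the best times gives a rough relative figure; taking the minimum over several runs reduces noise from other processes.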