Is there some way to improve reciprocal (division 1 over X) with respect to speed, if the precision is not crucial?
So, I need to calculate 1/X. Is there so
This should do it with a number of pre-unrolled newton iterations's evaluated as a Horner polynomial which uses fused-multiply accumulate operations most modern day CPU's execute in a single Clk cycle (every time):
float inv_fast(float x) {
union { float f; int i; } v;
float w, sx;
int m;
sx = (x < 0) ? -1:1;
x = sx * x;
v.i = (int)(0x7EF127EA - *(uint32_t *)&x);
w = x * v.f;
// Efficient Iterative Approximation Improvement in horner polynomial form.
v.f = v.f * (2 - w); // Single iteration, Err = -3.36e-3 * 2^(-flr(log2(x)))
// v.f = v.f * ( 4 + w * (-6 + w * (4 - w))); // Second iteration, Err = -1.13e-5 * 2^(-flr(log2(x)))
// v.f = v.f * (8 + w * (-28 + w * (56 + w * (-70 + w *(56 + w * (-28 + w * (8 - w))))))); // Third Iteration, Err = +-6.8e-8 * 2^(-flr(log2(x)))
return v.f * sx;
}
Fine Print: Closer to 0, the approximation does not do so well so either you the programmer needs to test the performance or restrict the input from getting to low before resorting to hardware division. i.e. be responsible!