division as multiply and LUT ? / fast float division reciprocal

问题

Is it possible to make a reciprocal of float division in form of look up table (such like 1/f -> 1*inv[f] ) ? How it could be done? I think some and mask and shift should be appled to float to make it a form of index? How would be it exectly?

回答1:

You can guess an approximate inverse like this:

int x = bit_cast<int>(f);
x = 0x7EEEEEEE - x;
float inv = bit_cast<float>(x);

In my tests, 0x7EF19D07 was slightly better (tested with the effects of 2 Newton-Raphson refinements included).

Which you can then improve with Newton-Raphson:

inv = inv * (2 - inv * f);

Iterate as often as you want. 2 or 3 iterations give nice results.

Better Initial Approximations

To minimize the relative error:

0x7EF311C2 (without refinement)
0x7EF311C3 (1 refinement)
0x7EF312AC (2 refinements)
0x7EEEEBB3 (3 refinements)

To minimize the absolute error for inputs between 1 and 2 (they work well enough outside that range, but they may not be the best):

0x7EF504F3 (without refinement)
0x7EF40D2F (1 refinement)
0x7EF39252 (2 refinements)

For three refinement steps, the initial approximation barely affects the maximum relative error. 0x7EEEEEEE works great, and I couldn't find anything better.

回答2:

One method is:

Extract the sign, exponent and mantissa from the input
Use some of the most significant mantissa bits to look up its reciprocal in a table
Negate the exponent, and adjust for the change of scale of the mantissa
Recombine the sign, exponent and mantissa to form the output

In step 2, you'll need to choose the number of bits to use, trading between accuracy and table size. You could obtain more accuracy by using the less significant bits to interpolate between table entries.

In step 3, the adjustment is necessary because the input mantissa was in the range (0.5, 1.0], and so its reciprocal is in the range [1.0, 2.0), which needs renormalising to give the output mantissa.

I won't try to write the code for this, since there are probably some slightly fiddly edge cases that I'd miss.

You should also investigate methods involving numerical calculations, which might give better results if memory access is slow; on a modern PC architecture, a cache miss might be as expensive as dozens of arithmetic operations. Wikipedia looks like a good starting point. And of course, whatever you do, measure it to make sure it is actually faster than an FPU division operation.

回答3:

If your minimum step is something like 0.01 then you can support inverse-f from a table. Each index is multiplied by 100 so you can have

table[1]----->1.0/0.01
table[3]----->1.0/0.03
table[105]--->1.0/1.05
...
table[10000]->1.0/100.0


10000 elements for a range of (0.00,100.00)

If you want better precision, you will need more ram.

Another example:

range................: 0.000 - 1000.000
minimum increments ..: 0.001
total element number.: 1 million

something like this: table[2343]=1.0/2.343

Another example:

range................: 0.000000 - 1.000000
minimum increments ..: 0.000001
total element number.: 1 million

something like this: table[999999]=1.0/0.999999

来源：https://stackoverflow.com/questions/12227126/division-as-multiply-and-lut-fast-float-division-reciprocal

标签

c++

optimization

fpu