Efficient computation of 2**64 / divisor via fast floating-point reciprocal

安稳与你 提交于 2019-11-29 05:39:02

This solution combines two ideas:

  • You can convert to floating point by simply reinterpreting the bits as floating point and subtracting a constant, so long as the number is within a particular range. So add a constant, reinterpret, and then subtract that constant. This will give a truncated result (which is therefore always less than or equal the desired value).
  • You can approximate reciprocal by negating both the exponent and the mantissa. This may be achieved by interpreting the bits as int.

Option 1 here only works in a certain range, so we check the range and adjust the constants used. This works in 64 bits because the desired float only has 23 bits of precision.

The result in this code will be double, but converting to float is trivial, and can be done on the bits or directly, depending on hardware.

After this you'd want to do the Newton-Raphson iteration(s).

Much of this code simply converts to magic numbers.

double                                                       
u64tod_inv( uint64_t u64 ) {                                 
  __asm__( "#annot0" );                                      
  union {                                                    
    double f;                                                
    struct {                                                 
      unsigned long m:52; // careful here with endianess     
      unsigned long x:11;                                    
      unsigned long s:1;                                     
    } u64;                                                   
    uint64_t u64i;                                           
  } z,                                                       
        magic0 = { .u64 = { 0, (1<<10)-1 + 52, 0 } },        
        magic1 = { .u64 = { 0, (1<<10)-1 + (52+12), 0 } },   
        magic2 = { .u64 = { 0, 2046, 0 } };                  

  __asm__( "#annot1" );                                      
  if( u64 < (1UL << 52UL ) ) {                               
    z.u64i = u64 + magic0.u64i;                              
    z.f   -= magic0.f;                                       
  } else {                                                   
    z.u64i = ( u64 >> 12 ) + magic1.u64i;                    
    z.f   -= magic1.f;                                       
  }                                                          
  __asm__( "#annot2" );                                      

  z.u64i = magic2.u64i - z.u64i;                             

  return z.f;                                                
}                                                            

Compiling this on an Intel core 7 gives a number of instructions (and a branch), but, of course, no multiplies or divides at all. If the casts between int and double are fast this should run pretty quickly.

I suspect float (with only 23 bits of precision) will require more than 1 or 2 Newton-Raphson iterations to get the accuracy you want, but I haven't done the math...

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!