Fast method to multiply integer by proper fraction without floats or overflow

悲哀的现实 2021-02-19 12:56

My program frequently requires the following calculation to be performed:

Given:

  • N is a 32-bit integer
  • D is a 32-bit integer
  • abs(N) <=
3 Answers
  •  执念已碎
    2021-02-19 13:32

    I've now benchmarked several possible solutions, including weird/clever ones from other sources like combining 32-bit div & mod & add or using peasant math, and here are my conclusions:

    First, if you are only targeting Windows and compiling with Visual C++ (MSVC), just use MulDiv(). It was quite fast (faster than using 64-bit variables directly in my tests), just as accurate, and it rounds the result for you. I could not find a better method for this on Windows with Visual C++, even allowing restrictions such as unsigned-only inputs and N <= D.

    However, in my case having a function with deterministic results even across platforms is even more important than speed. On another platform I was using as a test, the 64-bit divide is much, much slower than the 32-bit one when using the 32-bit libraries, and there is no MulDiv() to use. The 64-bit divide on this platform takes ~26x as long as a 32-bit divide (yet the 64-bit multiply is just as fast as the 32-bit version...).
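
For readers on a platform without MulDiv(), the 64-bit approach that the timings in this answer are measured against can be sketched as follows. This is a minimal sketch, not the Windows implementation: the name mulDivPortable is hypothetical, and the choice to round halves away from zero and to return -1 on error is an assumption modeled on the documented behavior of MulDiv().

```cpp
#include <cstdint>

// Hypothetical helper (not from the original answer): computes x * n / d
// with a 64-bit intermediate product and rounds to the nearest integer.
// Returns -1 when d is zero or the result does not fit in 32 bits.
inline std::int32_t mulDivPortable(std::int32_t x, std::int32_t n, std::int32_t d)
{
    if (d == 0) return -1;

    // |x * n| is at most 2^62, so the product always fits in 64 bits.
    const std::int64_t prod = static_cast<std::int64_t>(x) * n;

    // Work on magnitudes so that halves round away from zero.
    const std::int64_t a = prod < 0 ? -prod : prod;
    const std::int64_t b = d < 0 ? -static_cast<std::int64_t>(d) : d;
    std::int64_t q = (a + b / 2) / b;

    // Restore the sign of the quotient.
    if ((prod < 0) != (d < 0)) q = -q;

    if (q > INT32_MAX || q < INT32_MIN) return -1;
    return static_cast<std::int32_t>(q);
}
```

This is deterministic across platforms, but it still pays for a 64-bit divide, which is exactly the cost the functions below try to avoid.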

    So if you have a case like me, I will share the best results I got, which turned out to be just optimizations of chux's answer.

    Both of the methods I will share below make use of the following function (though the compiler-specific intrinsics only actually helped in speed with MSVC in Windows):

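    // u32 is assumed to be a 32-bit unsigned type, e.g. typedef uint32_t u32;
    // Returns the number of bits needed to represent val (1 for val == 0).
    inline u32 bitsRequired(u32 val)
    {
        #ifdef _MSC_VER
            // Requires <intrin.h>. val | 1 avoids the undefined result for 0.
            unsigned long r = 0;
            _BitScanReverse(&r, val | 1);
            return r+1;
        #elif defined(__GNUC__) || defined(__clang__)
            // __builtin_clz(0) is undefined, hence val | 1.
            return 32 - __builtin_clz(val | 1);
        #else
            // Portable fallback: count shifts until val is exhausted.
            int r = 1;
            while (val >>= 1) ++r;
            return r;
        #endif
    }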
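
To make the contract of bitsRequired() concrete, here is a small sketch that uses only the portable fallback branch. The u32 alias and the helper maxNumeratorBits() are assumptions added for illustration; the latter just names the expression 32 - bitsRequired(x) that the first function below expects as nMaxBits.

```cpp
#include <cstdint>

typedef std::uint32_t u32;   // assumed alias; the answer does not show its definition

// Portable fallback only: number of bits needed to hold val, never less than 1.
inline u32 bitsRequired(u32 val)
{
    u32 r = 1;
    while (val >>= 1) ++r;
    return r;
}

// Hypothetical helper: how many bits n may occupy so that x * n
// still fits in 32 bits.
inline u32 maxNumeratorBits(u32 x)
{
    return 32 - bitsRequired(x);
}

// Sample values:
//   bitsRequired(0)           -> 1
//   bitsRequired(1)           -> 1
//   bitsRequired(255)         -> 8
//   bitsRequired(256)         -> 9
//   bitsRequired(0xFFFFFFFFu) -> 32
//   maxNumeratorBits(1000)    -> 22   (1000 needs 10 bits)
```

Note that zero is reported as needing one bit, which keeps later shift amounts well defined.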
    

    Now, if x is a constant that's 16-bit in size or smaller and you can pre-compute the bits required, I found the best results in speed and accuracy from this function:

    u32 multConstByPropFrac(u32 x, u32 nMaxBits, u32 n, u32 d)
    {
        //assert(nMaxBits == 32 - bitsRequired(x));
        //assert(n <= d);
        const int bitShift = bitsRequired(n) - nMaxBits;
        if( bitShift > 0 )
        {
            n >>= bitShift;
            d >>= bitShift;
        }
    
        // Remove the + d/2 part if you don't need rounding
        return (x * n + d/2) / d;
    }
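
A worked example may help show where the shift kicks in. The sketch below restates the function with a portable bitsRequired() and explicit signed casts (both are my assumptions, as are the u32/u64 aliases and the sample inputs), and adds the 64-bit expression as exact64() for comparison.

```cpp
#include <cstdint>

typedef std::uint32_t u32;   // assumed aliases; not shown in the answer
typedef std::uint64_t u64;

// Portable stand-in for bitsRequired(): bits needed to hold val, minimum 1.
inline u32 bitsRequired(u32 val)
{
    u32 r = 1;
    while (val >>= 1) ++r;
    return r;
}

// Restated from the answer, with casts so the subtraction is done in int.
inline u32 multConstByPropFrac(u32 x, u32 nMaxBits, u32 n, u32 d)
{
    const int bitShift = static_cast<int>(bitsRequired(n)) - static_cast<int>(nMaxBits);
    if (bitShift > 0)
    {
        // Drop the same number of low bits from both so the ratio n/d is
        // approximately preserved while x * n fits in 32 bits.
        n >>= bitShift;
        d >>= bitShift;
    }
    return (x * n + d / 2) / d;
}

// 64-bit baseline used for comparison.
inline u32 exact64(u32 x, u32 n, u32 d)
{
    return static_cast<u32>((static_cast<u64>(x) * n + d / 2) / d);
}

// Walkthrough with x = 1000 (10 bits), so nMaxBits = 32 - 10 = 22:
//
//   n = 3000000, d = 4000000: n needs 22 bits, bitShift = 0,
//     (1000 * 3000000 + 2000000) / 4000000 = 750
//
//   n = 3000000000, d = 4000000000: n needs 32 bits, bitShift = 10,
//     n becomes 2929687, d becomes 3906250,
//     (1000 * 2929687 + 1953125) / 3906250 = 750
```

In both cases the 32-bit result equals the 64-bit baseline; with less convenient inputs the discarded low bits can move the result by one.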
    

    On the platform with the slow 64-bit divide, the above function ran ~16.75x as fast as return ((u64)x * n + d/2) / d; with an average accuracy of 99.999981%. I measured accuracy as the difference between the returned and expected values relative to the range of x, so returning +/-1 from expected when x is 2048 counts as 100 - (1/2048 * 100) = 99.95% accurate. The test used a million or so randomized inputs, roughly half of which would normally have overflowed. Worst-case accuracy was 99.951172%.
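
The accuracy figure described above can be written down as a small function. This is my reading of the metric, not code from the benchmark: the name accuracyPercent is hypothetical, and x is assumed to be non-zero.

```cpp
#include <cstdint>

typedef std::uint32_t u32;   // assumed alias; not shown in the answer

// Hypothetical helper: accuracy of a result relative to the range of x,
// as a percentage. Assumes x > 0.
inline double accuracyPercent(u32 got, u32 expected, u32 x)
{
    // Absolute error, computed without unsigned wrap-around.
    const double diff = got > expected ? got - expected : expected - got;
    return 100.0 - diff / x * 100.0;
}

// Being off by one when x is 2048:
//   accuracyPercent(1025, 1024, 2048) -> 99.951171875
```

An off-by-one result at x = 2048 gives the same value as the reported worst case of 99.951172%.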

    For the general use case, I found the best results from the following (and without needing to restrict N <= D to boot!):

    u32 scaleToFraction(u32 x, u32 n, u32 d)
    {
        u32 bits = bitsRequired(x);
        int bitShift = bits - 16;
        if( bitShift < 0 ) bitShift = 0;
        int sh = bitShift;
        x >>= bitShift;
    
        bits = bitsRequired(n);
        bitShift = bits - 16;
        if( bitShift < 0 ) bitShift = 0;
        sh += bitShift;
        n >>= bitShift;
    
        bits = bitsRequired(d);
        bitShift = bits - 16;
        if( bitShift < 0 ) bitShift = 0;
        sh -= bitShift;
        d >>= bitShift;
    
        // Remove the + d/2 part if you don't need rounding
        u32 r = (x * n + d/2) / d;
        if( sh < 0 )
            r >>= (-sh);
        else //if( sh > 0 )
            r <<= sh;
    
        return r;
    }
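
To see how much precision the 16-bit reduction costs, the sketch below restates the function compactly and traces two sample inputs. The helper excessBits(), the u32 alias, the portable bitsRequired() and the inputs are my assumptions; the arithmetic is meant to follow the function above step by step, and it assumes the true result fits in 32 bits.

```cpp
#include <cstdint>

typedef std::uint32_t u32;   // assumed alias; not shown in the answer

// Portable stand-in for bitsRequired(): bits needed to hold val, minimum 1.
inline u32 bitsRequired(u32 val)
{
    u32 r = 1;
    while (val >>= 1) ++r;
    return r;
}

// Hypothetical helper: how many bits v has beyond 16 (0 if it already fits).
inline int excessBits(u32 v)
{
    const int extra = static_cast<int>(bitsRequired(v)) - 16;
    return extra > 0 ? extra : 0;
}

// Compact restatement of scaleToFraction() from the answer.
inline u32 scaleToFraction(u32 x, u32 n, u32 d)
{
    const int sx = excessBits(x);
    const int sn = excessBits(n);
    const int sd = excessBits(d);
    x >>= sx;
    n >>= sn;
    d >>= sd;

    // All three operands now fit in 16 bits, so x * n + d/2 fits in 32.
    u32 r = (x * n + d / 2) / d;

    // Undo the net scaling that the shifts introduced.
    const int sh = sx + sn - sd;
    if (sh < 0) r >>= -sh;
    else        r <<= sh;
    return r;
}

// Trace 1: x = 1000000, n = 3, d = 4
//   x needs 20 bits -> shifted by 4 to 62500, sh = 4
//   r = (62500 * 3 + 2) / 4 = 46875, then r << 4 = 750000 (exact)
//
// Trace 2: x = 4000000000, n = 1000000, d = 3000000
//   x -> 61035 (shift 16), n -> 62500 (shift 4), d -> 46875 (shift 6), sh = 14
//   r = (61035 * 62500 + 23437) / 46875 = 81380, then r << 14 = 1333329920
//   The 64-bit result is 1333333333, so the error is 3413.
```

The second trace shows the typical trade-off: the result is off by a few thousand in absolute terms, which is a few parts per million of the true value.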
    

    On the platform with the slow 64-bit divide, the above function ran ~18.5x as fast as using 64-bit variables and with 99.999426% average and 99.947479% worst-case accuracy.

    I was able to get more speed or more accuracy by messing with the shifting, such as trying to not shift all the way down to 16-bit if it wasn't strictly necessary, but any increase in speed came at a high cost in accuracy and vice versa.

    None of the other methods I tested came even close to the same speed or accuracy, most being slower than just using the 64-bit method or having huge loss in precision, so not worth going into.

    Obviously, no guarantee that anyone else will get similar results on other platforms!

    EDIT: Replaced some bit-twiddling hacks with plain code that actually ran faster anyway by letting the compiler do its job.
