How can I multiply and divide 64-bit ints accurately?

孤街浪徒 提交于 2019-12-01 23:20:43

Write a = q*d + r with |r| < |d| (I'm assuming d != 0, otherwise the computation is meaningless anyway). Then (a*b*c)/d = q*b*c + (r*b*c)/d. If q*b*c overflows, the entire computation would overflow anyway, so either you don't care, or you have to check for overflow. r*b*c might still overflow, so we again use the same method to avoid overflow,

int64_t q = a/d, r = a%d;
int64_t part1 = q*b*c;
int64_t q1 = (r*b)/d, r1 = (r*b)%d;
return part1 + q1*c + (r1*c)/d;

I would suggest finding the greatest common divisor of d with each of a, b, and c, dividing out the common factors as you go:

common = gcd(a,d) // You can implement GCD using Euclid's algorithm

a=a/common
d=d/common

common = gcd(b,d)
b=b/common
d=d/common

common = gcd(c,d)
c=c/common
d=d/common

Then calculate a*b*c/d with all the common factors removed. Euclid's GCD algorithm runs in logarithmic time, so this should be fairly efficient.

It's easy to see that some inputs would produce outputs that cannot be represented by the return type int64_t. For example, fn(INT64_MAX, 2, 1, 1). However, the following approach will should allow you to return a correct answer for any combination of inputs that does in fact fit within the range of int64_t.

int64_t fn(int64_t a, int32_t b, int32_t c, int32_t d)
{
    /* find the integer and remainder portions of a/d */
    int64_t leftI = a / d;
    int64_t leftR = a % d;

    /* multiply the integer portion of the result by b and c */
    int64_t resultI = leftI * b * c;

    /* multiply the remainder portion by b */
    int64_t resultR = leftR * b;
    resultI = resultI + (resultR / d) * c;

    /* multiply the remainder portion by c */
    resultR = (resultR % d) * c;

    return resultI + (resultR / d);
}

If you are working on x86_64 then the asm supports 128 bit integers:

int64_t fn(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {

    asm (
        "mulq %1\n"          // a *= b
        "movq %%rbx, %%rdx\n"// rbx = upper 64 bit of the multiplication
        "mulq %2\n"          // multiply the lower 64 bits by c
        "push %%rax\n"       // temporarily save the lowest 64 bits on the stack
        "mov %%rcx, %%rdx\n" // rcx = upper 64 bits of the multiplication
        "movq %%rax, %%rbx\n"// 
        "mulq %2\n"          // multiply the upper 64 bits by c
        "addq %%rax, %%rcx\n"// combine the middle 64 bits
        "addcq %%rdx, $0\n"  // transfer carry tp the higest 64 bits if present
        "divq %3\n"          // divide the upper 128 (of 192) bits by d
        "mov %%rbx, %%rax\n" // rbx = result
        "pop %%rax\n"
        "divq %3\n"          // divide remainder:lower 64 bits by d
        : "+a" (a)           // assigns a to rax register as in/out
        , "+b" (b)           // assigns b to rbx register
        : "g" (c)            // assigns c to random register
        , "g" (d)            // assigns d to random register
        : "edx", "rdx"       // tells the compiler that edx/rdx will be used internally, but does not need any input
    );

    // b now holds the upper 64 bit if (a * b * c / d) > UINT64_MAX
    return a;
}

Please note that all input integers have to be the same length. Working length will be double the input. Works with unsigned only.

The native div and mul instructions on x86 work on double-length exactly to allow for overflows. Sadly I am unaware of a compiler intrinsic to make use of them.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!