Compute (a*b)%n FAST for 64-bit unsigned arguments in C(++) on x86-64 platforms?

走了就别回头了 2021-01-15 12:35

I'm looking for a fast way to compute (ab) modulo n (in the mathematical sense) for

5 answers
  •  庸人自扰
    2021-01-15 12:58

    You could do it the old-fashioned way with shift/add/subtract. The code below assumes a < n and
    n < 2^63 (so that nothing overflows):

    uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n) {
        uint64_t rv = 0;
        while (b) {
            if (b&1)
                if ((rv += a) >= n) rv -= n;
            if ((a += a) >= n) a -= n;
            b >>= 1; }
        return rv;
    }
    

    You could use while (a && b) as the loop condition instead, to short-circuit once a reaches 0. That only happens when some doubling of a is a multiple of n (for example, when n is a power of two). Otherwise it will be slightly slower: there are more comparisons, although the extra branches are likely to be predicted correctly.
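
    As a rough sketch of the while (a && b) variant mentioned above, under the same assumptions as the first version (a < n and n < 2^63). The name mulmod_sc is hypothetical and is used only to keep it apart from the other versions:

```c
#include <stdint.h>

/* Short-circuit variant of the shift/add loop.
   Assumes a < n and n < 2^63, so rv + a and a + a cannot overflow.
   mulmod_sc is an illustrative name, not from the original answer. */
uint64_t mulmod_sc(uint64_t a, uint64_t b, uint64_t n) {
    uint64_t rv = 0;
    while (a && b) {                      /* stop early once a becomes 0 */
        if (b & 1)
            if ((rv += a) >= n) rv -= n;  /* rv = (rv + a) mod n */
        if ((a += a) >= n) a -= n;        /* a = 2a mod n */
        b >>= 1;
    }
    return rv;
}
```

    For instance, with a = 2, b = 7, n = 8, a reaches 0 after two doublings, so the loop exits with one bit of b still unprocessed.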

    If you really, absolutely, need that last bit (allowing n up to 2^64-1), you can use:

    uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n) {
        uint64_t rv = 0;
        while (b) {
            if (b&1) {
                rv += a;
                if (rv < a || rv >= n) rv -= n; }
            uint64_t t = a;
            a += a;
            if (a < t || a >= n) a -= n;
            b >>= 1; }
        return rv;
    }
    

    Alternatively, just use GCC inline assembly to access the underlying x86-64 instructions:

    inline uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n) {
        uint64_t rv;
        /* mul: rdx:rax = rax * b (full 128-bit product) */
        asm ("mul %3" : "=d"(rv), "=a"(a) : "1"(a), "r"(b));
        /* div: rax = rdx:rax / n, rdx = rdx:rax % n */
        asm ("div %4" : "=d"(rv), "=a"(a) : "0"(rv), "1"(a), "r"(n));
        return rv;
    }
    

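    Note that div raises a divide error if the quotient does not fit in 64 bits, so this version needs a < n or b < n. The 64-bit div instruction is also really slow, so the loop might actually be faster. You'd need to profile to be sure.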
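
    When profiling, it may help to have a reference version to compare results and timings against. One possible sketch uses the unsigned __int128 extension that GCC and Clang provide on x86-64. This is a different technique from the inline assembly above, and the name mulmod_u128 is hypothetical. The compiler typically turns the 128-bit % into a library call, so this is not necessarily faster:

```c
#include <stdint.h>

/* Reference version using the GCC/Clang unsigned __int128 extension.
   Works for the full 64-bit range of a, b and n (n must be nonzero).
   mulmod_u128 is an illustrative name, not from the original answer. */
uint64_t mulmod_u128(uint64_t a, uint64_t b, uint64_t n) {
    unsigned __int128 product = (unsigned __int128)a * b;  /* exact 128-bit product */
    return (uint64_t)(product % n);                        /* remainder is < n, fits in 64 bits */
}
```

    Since it places no restriction on a or b relative to n, it can serve as a correctness check for the loop and assembly versions over random inputs.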
