I have a C program which uses GCC's __uint128_t which is great, but now my needs have grown beyond it.
What are my options for fast arithmetic with 192 or 256 bits?
The only operation I need is addition (and I don't need the carry bit, i.e., I will be working mod 2^192 or 2^256).
Speed is important, so I don't want to move to a general multiprecision library if at all possible. (In fact my code does use multiprecision in some places, but this is in the critical loop and will run tens of billions of times. So far the multiprecision code needs to run only tens of thousands of times.)
Maybe this is simple enough to code directly, or maybe I need to find some appropriate library.
What is your advice, Oh great Stack Overflow?
Clarification: GMP is too slow for my needs. Although I actually use multiprecision in my code it's not in the inner loop and runs less than 10^5 times. The hot loop runs more like 10^12 times. When I changed my code (increasing a size parameter) so that the multiprecision part ran more often vs. the single-precision, I had a 100-fold slowdown (mostly due to memory management issues, I think, rather than extra µops). I'd like to get that down to a 4-fold slowdown or better.
256-bit version
__uint128_t a[2], b[2], c[2]; // c = a + b
c[0] = a[0] + b[0];
c[1] = a[1] + b[1] + (c[0] < a[0]);
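One caveat if the snippet above is used as an accumulator (c and a being the same array): c[0] is overwritten before the comparison, so `c[0] < a[0]` compares the value with itself and the carry is lost. Below is a minimal sketch of an in-place variant that avoids this. The names `u128`, `u256` and `u256_accumulate` are made up for illustration, and it assumes a GCC or Clang target that provides `unsigned __int128`.

```c
#include <assert.h>   /* only needed for the example check at the bottom */
#include <stdint.h>

typedef unsigned __int128 u128;            /* GCC/Clang extension */

/* Hypothetical 256-bit type: w[0] is the low half, w[1] the high half. */
typedef struct { u128 w[2]; } u256;

/* acc += x (mod 2^256). Safe even when acc and x point to the same object,
 * because nothing is written until the carry has been computed. */
static inline void u256_accumulate(u256 *acc, const u256 *x)
{
    u128 lo    = acc->w[0] + x->w[0];      /* low half, wraps mod 2^128 */
    u128 carry = lo < x->w[0];             /* 1 if the low half wrapped */
    acc->w[1] += x->w[1] + carry;          /* carry out of the top is dropped */
    acc->w[0]  = lo;
}

/* Example:
 *   u256 acc = { { ~(u128)0, 5 } }, one = { { 1, 0 } };
 *   u256_accumulate(&acc, &one);
 *   assert(acc.w[0] == 0 && acc.w[1] == 6);
 */
```

Comparing against either addend detects the wrap, so the function compares against `x->w[0]`, which has not been modified yet at that point.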
If you use it many times in a loop, you should consider parallelizing it with SIMD and multithreading.
Edit: 192-bit version. This way you can eliminate the 128-bit comparison, as @harold suggested:
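struct __uint192_t {
    __uint128_t H;  // high 128 bits
    uint64_t L;     // low 64 bits (uint64_t from <stdint.h>)
} a, b, c; // c = a + b
c.L = a.L + b.L;                    // low word, wraps mod 2^64
c.H = a.H + b.H + (c.L < a.L);      // add carry from the low word; result is mod 2^192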
You could test whether the "add (low < oldlow) to simulate carry" technique from this answer is fast enough. It's slightly complicated by the fact that low is an __uint128_t here, which could hurt code generation. You might also try it with four uint64_t limbs; I don't know whether that will be better or worse.
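For reference, here is a rough sketch of what the four-limb variant might look like, using only 64-bit comparisons to simulate the carry. The type `u256` and the function `u256_add` are hypothetical names, not from any library, and whether this beats the `__uint128_t` version would have to be measured with your compiler.

```c
#include <assert.h>   /* only needed for the example check at the bottom */
#include <stdint.h>

/* Hypothetical 256-bit type made of four 64-bit limbs; w[0] is least significant. */
typedef struct { uint64_t w[4]; } u256;

/* Returns a + b (mod 2^256). The carry out of the top limb is discarded. */
static inline u256 u256_add(u256 a, u256 b)
{
    u256 r;
    uint64_t carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t s = a.w[i] + b.w[i];      /* may wrap mod 2^64 */
        uint64_t t = s + carry;            /* add the incoming carry, may also wrap */
        /* At most one of the two additions can wrap, so OR is enough. */
        carry = (uint64_t)(s < a.w[i]) | (uint64_t)(t < s);
        r.w[i] = t;
    }
    return r;
}

/* Example:
 *   u256 a = { { UINT64_MAX, UINT64_MAX, 0, 0 } }, b = { { 1, 0, 0, 0 } };
 *   u256 r = u256_add(a, b);
 *   assert(r.w[0] == 0 && r.w[1] == 0 && r.w[2] == 1 && r.w[3] == 0);
 */
```

Dropping the last loop iteration (three limbs) gives the 192-bit case.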
If that's not good enough, drop to inline assembly, and directly use the carry flag - it doesn't get any better than that, but you'd have the usual downsides of using inline assembly.
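As an illustration of the carry-flag approach, here is a sketch for x86-64 with GCC or Clang extended inline assembly: one `add` followed by three `adc`. The function name `add256_carry` is made up, the asm assumes the default AT&T operand order, and other targets fall back to the comparison-based carry so the code still builds. Treat it as a starting point to benchmark, not as a tuned implementation.

```c
#include <assert.h>   /* only needed for the example check at the bottom */
#include <stdint.h>

/* r = a + b (mod 2^256); limb 0 is least significant. r may alias a or b. */
static inline void add256_carry(uint64_t r[4], const uint64_t a[4],
                                const uint64_t b[4])
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t r0 = a[0], r1 = a[1], r2 = a[2], r3 = a[3];
    /* "+&r": read-write, early-clobber, so no input shares these registers. */
    __asm__("addq %4, %0\n\t"              /* low limb, sets the carry flag */
            "adcq %5, %1\n\t"              /* each adc consumes and updates CF */
            "adcq %6, %2\n\t"
            "adcq %7, %3"                  /* final carry out is ignored */
            : "+&r"(r0), "+&r"(r1), "+&r"(r2), "+&r"(r3)
            : "rm"(b[0]), "rm"(b[1]), "rm"(b[2]), "rm"(b[3])
            : "cc");
    r[0] = r0; r[1] = r1; r[2] = r2; r[3] = r3;
#else
    /* Portable fallback: simulate the carry with comparisons. */
    uint64_t carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t ai = a[i], bi = b[i];     /* read before writing: r may alias */
        uint64_t s = ai + bi;
        uint64_t t = s + carry;
        carry = (uint64_t)(s < ai) | (uint64_t)(t < s);
        r[i] = t;
    }
#endif
}

/* Example:
 *   uint64_t a[4] = { UINT64_MAX, 0, UINT64_MAX, 7 };
 *   uint64_t b[4] = { 1, UINT64_MAX, 0, 1 };
 *   add256_carry(a, a, b);   // in place
 *   assert(a[0] == 0 && a[1] == 0 && a[2] == 0 && a[3] == 9);
 */
```

The usual downsides apply: the asm path is tied to one architecture and one compiler family, and the compiler cannot schedule or combine instructions inside the asm block.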
Source: https://stackoverflow.com/questions/22126073/multiword-addition-in-c