Multiword addition in C

Submitted by 一笑奈何 on 2019-12-18 07:00:32

Question


I have a C program that uses GCC's __uint128_t, which is great, but my needs have now grown beyond it.

What are my options for fast arithmetic with 192 or 256 bits?

The only operation I need is addition (and I don't need the carry bit, i.e., I will be working mod 2^192 or 2^256).

Speed is important, so I don't want to move to a general multi-precision library if at all possible. (In fact my code does use multi-precision in some places, but this is in the critical loop and will run tens of billions of times. So far the multi-precision code needs to run only tens of thousands of times.)

Maybe this is simple enough to code directly, or maybe I need to find some appropriate library.

What is your advice, Oh great Stack Overflow?

Clarification: GMP is too slow for my needs. Although I actually use multi-precision in my code it's not in the inner loop and runs less than 10^5 times. The hot loop runs more like 10^12 times. When I changed my code (increasing a size parameter) so that the multi-precision part ran more often vs. the single-precision, I had a 100-fold slowdown (mostly due to memory management issues, I think, rather than extra µops). I'd like to get that down to a 4-fold slowdown or better.


Answer 1:


256-bit version

__uint128_t a[2], b[2], c[2];  // c = a + b; index 0 holds the low 128 bits
c[0] = a[0] + b[0];
c[1] = a[1] + b[1] + (c[0] < a[0]);

If you do this many times in a loop, you should consider parallelizing it with SIMD and multithreading.

Edit: 192-bit version. This way you can eliminate the 128-bit comparison, as @harold pointed out:

struct __uint192_t {
    __uint128_t H;  // high 128 bits
    __uint64_t L;   // low 64 bits
} a, b, c;  // c = a + b
c.L = a.L + b.L;
c.H = a.H + b.H + (c.L < a.L);



Answer 2:


You could test whether the "add (low < oldlow) to simulate carry" technique from this answer is fast enough. It's slightly complicated by the fact that low is an __uint128_t here, which could hurt code generation. You might try it with four uint64_t's as well; I don't know whether that will be better or worse.
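As a rough sketch of the four-limb variant suggested above: the type name u256, the helper u256_add, and the demo function are hypothetical names chosen for illustration, not part of any library. Each limb addition detects its carry with an unsigned comparison, and the carry out of the top limb is simply dropped, which gives arithmetic mod 2^256.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 256-bit type: four 64-bit limbs, least significant first. */
typedef struct { uint64_t w[4]; } u256;

/* r = a + b (mod 2^256). The carry out of the top limb is discarded. */
static inline u256 u256_add(u256 a, u256 b)
{
    u256 r;
    uint64_t carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t s  = a.w[i] + b.w[i];  /* wraps mod 2^64 */
        uint64_t c1 = s < a.w[i];       /* 1 if a + b wrapped */
        r.w[i] = s + carry;
        uint64_t c2 = r.w[i] < s;       /* 1 if adding the carry-in wrapped */
        carry = c1 | c2;                /* c1 and c2 are never both 1 */
    }
    return r;
}

/* Usage example: a carry ripples through the two low limbs. */
static void u256_add_demo(void)
{
    u256 a = {{ UINT64_MAX, UINT64_MAX, 0, 0 }};
    u256 b = {{ 1, 0, 0, 0 }};
    u256 r = u256_add(a, b);
    assert(r.w[0] == 0 && r.w[1] == 0 && r.w[2] == 1 && r.w[3] == 0);
}
```

Whether this compiles to better code than the __uint128_t version depends on the compiler and target, so it is worth benchmarking both in the actual hot loop.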

If that's not good enough, drop to inline assembly and use the carry flag directly. It doesn't get any better than that, but you'd have the usual downsides of using inline assembly.
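A minimal sketch of the carry-flag approach, assuming an x86-64 target and the GCC/Clang extended asm syntax. The function name add256_inplace is hypothetical. One add followed by three adc instructions propagates the carry through the flag register; on any other target this sketch falls back to the comparison-based carry, so it is an illustration rather than a tuned implementation.

```c
#include <assert.h>
#include <stdint.h>

/* a += b (mod 2^256); limbs are stored least significant first. */
static inline void add256_inplace(uint64_t a[4], const uint64_t b[4])
{
#if defined(__x86_64__) && defined(__GNUC__)
    /* "+&r": each a[i] is read and written, and is written before all
       inputs have been consumed, so it must not share a register with
       any input operand (early clobber). */
    __asm__ ("addq %[b0], %[a0]\n\t"
             "adcq %[b1], %[a1]\n\t"
             "adcq %[b2], %[a2]\n\t"
             "adcq %[b3], %[a3]"
             : [a0] "+&r" (a[0]), [a1] "+&r" (a[1]),
               [a2] "+&r" (a[2]), [a3] "+&r" (a[3])
             : [b0] "rm" (b[0]), [b1] "rm" (b[1]),
               [b2] "rm" (b[2]), [b3] "rm" (b[3])
             : "cc");
#else
    /* Portable fallback: detect each carry with an unsigned comparison. */
    uint64_t carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t s = a[i] + b[i];
        uint64_t c = s < b[i];
        a[i] = s + carry;
        carry = c | (a[i] < s);
    }
#endif
}

/* Usage example: a carry ripples through the two low limbs. */
static void add256_inplace_demo(void)
{
    uint64_t a[4] = { UINT64_MAX, UINT64_MAX, 0, 0 };
    const uint64_t b[4] = { 1, 0, 0, 0 };
    add256_inplace(a, b);
    assert(a[0] == 0 && a[1] == 0 && a[2] == 1 && a[3] == 0);
}
```

The usual inline-assembly downsides apply: the asm path is tied to one architecture and one compiler family, and the compiler cannot schedule or optimize across the asm statement as freely as it can with plain C.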



Source: https://stackoverflow.com/questions/22126073/multiword-addition-in-c
