gcc intrinsic for extended division/multiplication


北海茫月 2021-01-04 05:16

Modern CPUs can perform extended multiplication between two native-size words and store the low and high halves of the result in separate registers. Similarly, when performing division,

2 answers

予麋鹿 (original poster)

2021-01-04 06:22
Since version 4.6, gcc provides __int128, which works on most 64-bit hardware.

For instance, to get the 128-bit result of a 64x64-bit multiplication, just use
```
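#include <stddef.h>

/* Multiply two 64-bit words; store the low and high halves of the 128-bit product. */
void extmul(size_t a, size_t b, size_t *lo, size_t *hi) {
    /* unsigned __int128 holds the full product; the signed type can overflow. */
    unsigned __int128 result = (unsigned __int128)a * b;
    *lo = (size_t)result;          /* low 64 bits */
    *hi = (size_t)(result >> 64);  /* high 64 bits */
}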
```
On x86_64 gcc is smart enough to compile this to
```
   0:   48 89 f8                mov    %rdi,%rax
   3:   49 89 d0                mov    %rdx,%r8
   6:   48 f7 e6                mul    %rsi
   9:   49 89 00                mov    %rax,(%r8)
   c:   48 89 11                mov    %rdx,(%rcx)
   f:   c3                      retq   
```
No native 128-bit support or similar is required, and after inlining only the mul instruction remains.
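To illustrate the remark about inlining, here is a minimal sketch of how such a helper might be wrapped by a caller that only needs the high word. The names extmul_u64 and mulhi_u64 are hypothetical, chosen for this example. The sketch uses uint64_t and unsigned __int128 so that the full 128-bit product is always representable, and it assumes a 64-bit gcc target.

```c
#include <stdint.h>

/* Same idea as extmul above, written with fixed-width types.
   extmul_u64 and mulhi_u64 are illustrative names, not library functions. */
static inline void extmul_u64(uint64_t a, uint64_t b,
                              uint64_t *lo, uint64_t *hi) {
    unsigned __int128 result = (unsigned __int128)a * b;
    *lo = (uint64_t)result;          /* low 64 bits of the product */
    *hi = (uint64_t)(result >> 64);  /* high 64 bits of the product */
}

/* A caller that only wants the high word. Once extmul_u64 is inlined,
   the optimizer is free to drop the unused store of the low word. */
uint64_t mulhi_u64(uint64_t a, uint64_t b) {
    uint64_t lo, hi;
    extmul_u64(a, b, &lo, &hi);
    return hi;
}
```

Whether the generated code really reduces to a single mul depends on the compiler version and optimization level, so it is worth checking the disassembly for your own build.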

Edit: On a 32-bit architecture this works in a similar way: replace __int128 with uint64_t and change the shift width to 32. The optimization works even on older gcc versions.
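The 32-bit variant described in the edit could look roughly like the following sketch. The name extmul32 is hypothetical, and fixed-width types are used here instead of size_t so that the example behaves the same on any target.

```c
#include <stdint.h>

/* 32-bit analogue of extmul: widen to uint64_t and shift by 32.
   extmul32 is an illustrative name, not part of any library. */
void extmul32(uint32_t a, uint32_t b, uint32_t *lo, uint32_t *hi) {
    uint64_t result = (uint64_t)a * b;  /* full 64-bit product */
    *lo = (uint32_t)result;             /* low 32 bits */
    *hi = (uint32_t)(result >> 32);     /* high 32 bits */
}
```

For example, multiplying 0xFFFFFFFF by itself gives 0xFFFFFFFE00000001, so hi receives 0xFFFFFFFE and lo receives 1.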