gcc intrinsic for extended division/multiplication

北海茫月 2021-01-04 05:16

Modern CPUs can perform extended multiplication between two native-size words and store the low and high halves of the result in separate registers. Similarly, when performing division,

2 Answers
  •  予麋鹿 (original poster)
    2021-01-04 06:22

    Since version 4.6, gcc provides __int128, which is available on most 64-bit targets.

    For instance, to get the 128-bit result of a 64x64-bit multiplication, just use

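    #include <stddef.h>

    /* Full 64x64 -> 128 bit unsigned product, split into low and high words. */
    void extmul(size_t a, size_t b, size_t *lo, size_t *hi) {
        /* unsigned __int128 avoids signed overflow when the product exceeds 2^127 - 1 */
        unsigned __int128 result = (unsigned __int128)a * (unsigned __int128)b;
        *lo = (size_t)result;
        *hi = (size_t)(result >> 64);
    }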
    

    On x86_64 gcc is smart enough to compile this to

       0:   48 89 f8                mov    %rdi,%rax
       3:   49 89 d0                mov    %rdx,%r8
       6:   48 f7 e6                mul    %rsi
       9:   49 89 00                mov    %rax,(%r8)
       c:   48 89 11                mov    %rdx,(%rcx)
       f:   c3                      retq   
    

    No native 128-bit support or similar is required, and after inlining only the mul instruction remains.
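
    To illustrate the inlining point, here is a minimal sketch of a helper that returns only the high word of the product. The name mulhi64 is made up for this example, and the guard relies on the __SIZEOF_INT128__ macro that gcc predefines on targets where the 128-bit type is available. With optimization enabled, I would expect a call to this to collapse to a single mul on x86_64, but check the disassembly for your own compiler and flags.

```c
#include <stdint.h>

/* Refuse to build where the 128-bit type is missing (e.g. most 32-bit targets). */
#ifndef __SIZEOF_INT128__
#error "this sketch assumes a compiler and target that provide __int128"
#endif

/* Hypothetical helper: high 64 bits of the unsigned 64x64 -> 128 bit product. */
static inline uint64_t mulhi64(uint64_t a, uint64_t b) {
    /* Widen one operand first so the multiplication is done in 128 bits. */
    unsigned __int128 product = (unsigned __int128)a * b;
    return (uint64_t)(product >> 64);
}
```

    The low word needs no helper at all, since a plain a * b on uint64_t already yields it.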

    Edit: On a 32-bit arch the same approach works: replace the 128-bit type by uint64_t and the shift width by 32. The optimization works even on older gcc versions.
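
    The 32-bit variant described in the edit could look roughly like the sketch below. The name extmul32 and the use of fixed-width uint32_t instead of size_t are my choices for the example, not something gcc requires.

```c
#include <stdint.h>

/* 32x32 -> 64 bit unsigned product, split into low and high words. */
void extmul32(uint32_t a, uint32_t b, uint32_t *lo, uint32_t *hi) {
    /* Widen one operand first so the multiplication is done in 64 bits. */
    uint64_t result = (uint64_t)a * b;
    *lo = (uint32_t)result;          /* low 32 bits */
    *hi = (uint32_t)(result >> 32);  /* high 32 bits */
}
```

    For example, extmul32(0xFFFFFFFF, 0xFFFFFFFF, &lo, &hi) leaves lo == 0x00000001 and hi == 0xFFFFFFFE, since the full product is 0xFFFFFFFE00000001.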
