GCC couldn't vectorize 64-bit multiplication. Can 64-bit x 64-bit -> 128-bit widening multiplication be vectorized on AVX2?

I try to vectorize a CBRNG which uses 64bit widening multiplication.

static __inline__ uint64_t mulhilo64(uint64_t a, uint64_t b, uint64_t* hip) {
    __uint128_t product = ((__uint128_t)a)*((__uint128_t)b);
    *hip = product>>64;
    return (uint64_t)product;
}

Is such a multiplication exists in a vectorized form in AVX2?

No. There's no 64 x 64 -> 128 bit arithmetic as a vector instruction. Nor is there a vector mulhi type instruction (high word result of multiply).

[V]PMULUDQ can do 32 x 32 -> 64 bit by only considering every second 32 bit unsigned element, or unsigned doubleword, as a source, and expanding each 64 bit result into two result elements combined as an unsigned quadword.

The best you can probably hope for right now is Haswell's MULX instruction, which has more flexible register use, and does not affect the flags register - eliminating some stalls.

来源：https://stackoverflow.com/questions/24569654/gcc-couldnt-vectorize-64-bit-multiplication-can-64-bit-x-64-bit-128-bit-wid

标签

c++

computer-science

vectorization

simd

avx2

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!