How are the gather instructions in AVX2 implemented?

故里飘歌 2020-12-08 10:36

Suppose I'm using AVX2's VGATHERDPS, which should load 8 single-precision floats using 8 DWORD indices.
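For concreteness, here is a minimal sketch of the operation being asked about, written in C with the `_mm256_i32gather_ps` intrinsic (which compiles to VGATHERDPS) next to its scalar equivalent. The helper names `gather8`, `gather8_avx2` and `gather8_scalar` are made up for illustration and are not part of the question. The sketch assumes GCC or Clang on x86, since it relies on `__attribute__((target("avx2")))` and `__builtin_cpu_supports`.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: one VGATHERDPS loading table[idx[0]] .. table[idx[7]]. */
__attribute__((target("avx2")))
static void gather8_avx2(const float *table, const int *idx, float *out)
{
    /* Load the 8 DWORD indices into a 256-bit register. */
    __m256i vindex = _mm256_loadu_si256((const __m256i *)idx);
    /* scale = 4: each index is multiplied by sizeof(float) to form a byte offset. */
    __m256 v = _mm256_i32gather_ps(table, vindex, 4);
    _mm256_storeu_ps(out, v);
}

/* Scalar code that produces the same result, one load per element. */
static void gather8_scalar(const float *table, const int *idx, float *out)
{
    for (int i = 0; i < 8; i++)
        out[i] = table[idx[i]];
}

/* Uses the AVX2 path when the CPU reports support, the scalar path otherwise. */
static void gather8(const float *table, const int *idx, float *out)
{
    assert(table != NULL && idx != NULL && out != NULL);
    if (__builtin_cpu_supports("avx2"))
        gather8_avx2(table, idx, out);
    else
        gather8_scalar(table, idx, out);
}
```

Calling `gather8(table, idx, out)` with an arbitrary index array such as `{15, 0, 7, 3, 3, 12, 1, 8}` fills `out[i]` with `table[idx[i]]`. The indices may repeat and may point at elements that are far apart in memory.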

What happens when the data to be loaded exists in different

2 Answers
  •  攒了一身酷
    2020-12-08 11:00

    I did some benchmarking of the AVX2 gather instructions (on a Haswell CPU), and the implementation appears to be fairly simple brute force: even when the elements to be loaded are contiguous, there still seems to be one read cycle per element, so performance is really no better than just doing scalar loads.

    NB: this answer is now obsolete as things have changed considerably since Haswell. See the accepted answer for full details (unless you happen to be targeting Haswell CPUs).
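The contiguous case mentioned in this answer can be sketched as follows. With indices 0..7 and a scale of 4, a gather reads exactly the same 32 bytes as a single unaligned vector load, so on a CPU where the gather costs one read per element, the plain load is the obvious replacement. The function names here are hypothetical, the sketch assumes GCC or Clang on x86, and it only checks that the two forms return the same data; it does not measure timing.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: loads src[0..7] through VGATHERDPS with indices 0..7. */
__attribute__((target("avx2")))
static void load8_via_gather(const float *src, float *out)
{
    const __m256i vindex = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    /* scale = 4 turns each index into a byte offset of index * sizeof(float). */
    _mm256_storeu_ps(out, _mm256_i32gather_ps(src, vindex, 4));
}

/* The same 8 floats fetched with one unaligned 256-bit load (VMOVUPS). */
__attribute__((target("avx2")))
static void load8_via_vmovups(const float *src, float *out)
{
    _mm256_storeu_ps(out, _mm256_loadu_ps(src));
}

/* Returns 1 if both forms yield identical bytes, 0 if they differ,
   and -1 if the CPU does not report AVX2 support. */
static int contiguous_gather_matches_load(const float *src)
{
    float a[8], b[8];
    assert(src != NULL);
    if (!__builtin_cpu_supports("avx2"))
        return -1;
    load8_via_gather(src, a);
    load8_via_vmovups(src, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

When the index pattern is known to be contiguous at compile time, writing the plain load directly avoids depending on how a given microarchitecture implements gathers.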
