Loading data for GCC's vector extensions

て烟熏妆下的殇ゞ submitted on 2019-11-30 13:44:08

You could use an initializer to load the values, i.e. do

const vec16qi e = { buf[0], buf[1], ... , buf[15] };

and hope that GCC turns this into an SSE load instruction. I'd verify that with a disassembler, though ;-). Also, for better performance, try to make buf 16-byte aligned and inform the compiler via an aligned attribute. If you can't guarantee that the input buffer will be aligned, process it bytewise until you've reached a 16-byte boundary.
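A rough sketch of the "bytewise until a 16-byte boundary" idea, in case it helps. The helper name xor_bytes and the XOR operation are made up for illustration, and the may_alias attribute is my own addition so the cast from char* is safe. It assumes GCC (or a compatible compiler) with vector subscripting support:

```c
#include <stddef.h>
#include <stdint.h>

/* 16 x char vector. aligned (16) promises GCC that accesses through this
   type are 16-byte aligned; may_alias is an assumption added for safety. */
typedef char v16qi __attribute__ ((vector_size (16), aligned (16), may_alias));

/* Hypothetical helper: XORs every byte of buf[0..n) with key. */
void xor_bytes(char *buf, size_t n, char key)
{
    size_t i = 0;
    v16qi keyv;

    /* Fill all 16 lanes with the key (vector subscripting is a GCC extension). */
    for (int k = 0; k < 16; k++)
        keyv[k] = key;

    /* Scalar prologue: go bytewise until buf + i is on a 16-byte boundary. */
    while (i < n && ((uintptr_t)(buf + i) % 16) != 0) {
        buf[i] ^= key;
        i++;
    }

    /* Aligned body: one whole vector per iteration. */
    for (; i + 16 <= n; i += 16) {
        v16qi *p = (v16qi *)(buf + i);
        *p = *p ^ keyv;
    }

    /* Scalar tail: whatever is left after the last full vector. */
    for (; i < n; i++)
        buf[i] ^= key;
}
```

Calling xor_bytes(raw + 3, 70, 0x5A) on a plain char array should work whatever the array's alignment happens to be, since the prologue absorbs the misaligned bytes. As above, I'd still check the generated code in a disassembler.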

ZachB

Edit (thanks Peter Cordes): You can cast pointers:

typedef char v16qi __attribute__ ((vector_size (16), aligned (16)));

v16qi vec = *(v16qi*)&buf[i]; // load
*(v16qi*)(buf + i) = vec; // store whole vector

This compiles to vmovdqa to load and vmovups to store. If the data isn't known to be aligned, set aligned (1) to generate vmovdqu. (godbolt)
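For the unaligned case, something along these lines might do. This is only a sketch: the type name v16qi_u and the helper and16 are hypothetical, and may_alias is an extra attribute I've added (it is not in the typedef above) so that the pointer casts don't run into strict-aliasing trouble:

```c
#include <stddef.h>

/* Unaligned variant of the typedef: aligned (1) tells GCC to assume nothing
   about the address. may_alias is an assumption added here for safety. */
typedef char v16qi_u __attribute__ ((vector_size (16), aligned (1), may_alias));

/* Hypothetical helper: dst[0..16) = a[0..16) & b[0..16), at any alignment. */
void and16(char *dst, const char *a, const char *b)
{
    v16qi_u va = *(const v16qi_u *)a;   /* unaligned load */
    v16qi_u vb = *(const v16qi_u *)b;   /* unaligned load */
    *(v16qi_u *)dst = va & vb;          /* unaligned store of the whole vector */
}
```

The pointers can point anywhere inside a char buffer, e.g. and16(out + 5, a + 1, b + 3). Which instructions come out depends on the GCC version and target flags, so it's worth looking at the assembly.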

Note that there are also several special-purpose intrinsics for loading and storing these registers (Edit 2):

#include <emmintrin.h> // SSE2 intrinsics
v16qi vec = _mm_loadu_si128((__m128i*)&buf[i]); // _mm_load_si128 for aligned
_mm_storeu_si128((__m128i*)&buf[i], vec); // _mm_store_si128 for aligned

These intrinsics take and return __m128i rather than v16qi, so it seems to be necessary to use -flax-vector-conversions to let GCC convert implicitly between the two vector types.
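An alternative that, as far as I know, avoids the flag is to cast explicitly between the two vector types, which GCC allows when they are the same size. A minimal sketch, assuming an x86 target with SSE2; the helper invert16 is made up for illustration:

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_storeu_si128 */

typedef char v16qi __attribute__ ((vector_size (16)));

/* Hypothetical helper: flips every bit of the 16 bytes at buf (any alignment). */
void invert16(char *buf)
{
    /* __m128i -> v16qi: explicit cast between same-size vector types. */
    v16qi vec = (v16qi)_mm_loadu_si128((const __m128i *)buf);

    vec = ~vec;  /* element-wise bitwise NOT via GCC's vector extensions */

    /* v16qi -> __m128i: cast back before handing it to the store intrinsic. */
    _mm_storeu_si128((__m128i *)buf, (__m128i)vec);
}
```

The casts only reinterpret the 128 bits, they don't convert element values, so this should cost nothing at run time.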

See also: C - How to access elements of vector using GCC SSE vector extension
See also: SSE loading ints into __m128

(Tip: The best phrase to google is something like "gcc loading __m128i".)
