I'm trying to figure out an efficient way to load compile-time constant floats into SSE(2/3) registers. I've tried simple code like this,
const __m128 x
If you want to force it to a single load, you could try (gcc):
__attribute__((aligned(16))) float vec[4] = { 1.0f, 1.1f, 1.2f, 1.3f };
__m128 v = _mm_load_ps(vec); // no "&" needed: vec already decays to a pointer to its first element
If you have Visual C++, use __declspec(align(16)) to request the same 16-byte alignment.
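A minimal sketch of how the two spellings could be wrapped so the same source builds with either compiler. `ALIGN16` and `load_constants` are hypothetical names chosen for illustration, not part of any header, and the block assumes an x86 target with SSE enabled.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_load_ps */

/* ALIGN16 is a hypothetical helper macro, not a standard one. */
#if defined(_MSC_VER)
#  define ALIGN16 __declspec(align(16))
#else
#  define ALIGN16 __attribute__((aligned(16)))
#endif

/* Loads four floats into a vector with a single aligned load. */
static __m128 load_constants(void)
{
    ALIGN16 float vec[4] = { 1.0f, 1.1f, 1.2f, 1.3f };

    /* _mm_load_ps requires a 16-byte aligned address. */
    assert(((uintptr_t)vec & 15u) == 0);
    return _mm_load_ps(vec);
}
```

The assert is only a debugging aid; checking the disassembly remains the reliable way to confirm what the compiler actually emitted.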
On my system, this (compiled with gcc -m32 -msse -O2; with no optimization at all the code is more cluttered but still ends in a single movaps) produces the following assembly code (gcc / AT&T syntax):
andl $-16, %esp
subl $16, %esp
movl $0x3f800000, (%esp)
movl $0x3f8ccccd, 4(%esp)
movl $0x3f99999a, 8(%esp)
movl $0x3fa66666, 12(%esp)
movaps (%esp), %xmm0
Note that it aligns the stack pointer before allocating stack space and storing the constants there. Leaving out the __attribute__((aligned)) may, depending on your compiler, produce incorrect code that skips this alignment (a movaps from a misaligned address faults at run time), so beware, and check the disassembly.
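The immediates in the movl instructions above are the IEEE-754 single-precision bit patterns of the four constants. A small sketch to check that, assuming a platform where float is a 32-bit IEEE-754 type; `float_bits` is a hypothetical helper name.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Returns the raw bit pattern of a float. */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* copy bytes; avoids pointer-cast aliasing issues */
    return bits;
}
```

For example, `float_bits(1.0f)` yields `0x3f800000`, matching the first movl.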
Additionally:
Since you asked how to put the constants into the code itself, simply try the above with a static qualifier on the float array. That produces the following assembly:
movaps vec.7330, %xmm0
...
vec.7330:
.long 1065353216
.long 1066192077
.long 1067030938
.long 1067869798
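For reference, a sketch of what the static variant might look like in C, using the GCC attribute syntax. Adding const is my assumption (the answer only mentions static); `load_static_constants` is a hypothetical name, and the exact assembly will vary by compiler and flags.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_load_ps */

/* Loads four constants from static storage with one aligned load. */
static __m128 load_static_constants(void)
{
    /* static: the array lives in the data section instead of being
       rebuilt on the stack on every call.
       const is an added assumption; it lets the compiler place the
       data in a read-only section. */
    static const float vec[4] __attribute__((aligned(16))) =
        { 1.0f, 1.1f, 1.2f, 1.3f };

    /* _mm_load_ps requires a 16-byte aligned address. */
    assert(((uintptr_t)vec & 15u) == 0);
    return _mm_load_ps(vec);
}
```

The .long values in the listing are the same bit patterns as the hexadecimal immediates in the stack version, printed in decimal (1065353216 is 0x3f800000, and so on).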