Question
I am trying to do a bitwise & between elements of two arrays of uint64_t integers and then store the result in another array. This is my program:
#include <emmintrin.h>
#include <nmmintrin.h>
#include <chrono>
int main()
{
uint64_t data[200];
uint64_t data2[200];
uint64_t data3[200];
__m128i* ptr = (__m128i*) data;
__m128i* ptr2 = (__m128i*) data2;
uint64_t* ptr3 = data3;
for (int i = 0; i < 100; ++i, ++ptr, ++ptr2, ptr3 += 2)
_mm_store_ps(ptr3, _mm_and_si128(*ptr, *ptr2));
}
However, I get this error:
test.cpp:17:50: error: cannot convert ‘uint64_t* {aka long unsigned int*}’ to ‘float*’ for argument ‘1’ to ‘void _mm_store_ps(float*, __m128)’
_mm_store_ps(ptr3, _mm_and_si128(*ptr, *ptr2));
For some reason, the compiler thinks I'm copying to an array of floats. Is it possible to do what I am trying to do with arrays of uint64_t?
Answer 1:
You can use _mm_store_si128.
First, change the pointer ptr3 to
__m128i* ptr3 = (__m128i*) data3;
and then change the loop to
for (int i = 0; i < 100; ++i, ++ptr, ++ptr2, ++ptr3)
_mm_store_si128(ptr3, _mm_and_si128(*ptr, *ptr2));
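If the arrays are not guaranteed to be 16-byte aligned, the same idea can be sketched with the unaligned SSE2 intrinsics _mm_loadu_si128 and _mm_storeu_si128, casting the uint64_t pointers at the call site. The helper name and_u64 and its signature are hypothetical and not part of the original post; this is a minimal sketch that assumes an x86 target with SSE2 available (on 32-bit GCC builds that may mean passing -msse2).

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_loadu_si128, _mm_and_si128, _mm_storeu_si128
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the original post): out[i] = a[i] & b[i] for i in [0, n).
// Unaligned loads/stores are used, so the arrays need no special alignment.
void and_u64(const uint64_t* a, const uint64_t* b, uint64_t* out, std::size_t n)
{
    assert(a != nullptr && b != nullptr && out != nullptr);
    std::size_t i = 0;
    // Each 128-bit register holds two 64-bit lanes, so step by 2 elements.
    for (; i + 2 <= n; i += 2) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), _mm_and_si128(va, vb));
    }
    // Scalar tail: handles the last element when n is odd.
    for (; i < n; ++i)
        out[i] = a[i] & b[i];
}
```

With the arrays from the question, the call would be and_u64(data, data2, data3, 200).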
Answer 2:
You are using the floating-point store _mm_store_ps, although you actually want to store integers. So either use _mm_store_si128, or cast the destination pointer to __m128i* and assign the result through it.
You should also make sure the arrays are aligned to 16 bytes. Dereferencing a __m128i* and calling _mm_store_si128 both assume 16-byte alignment, and aligned load/store operations can also be faster.
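#include <emmintrin.h>  // SSE2: __m128i, _mm_and_si128
#include <cstdint>      // uint64_t
int main()
{
    // alignas(16) is the portable C++11 spelling; on MSVC, __declspec(align(16)) does the same.
    // The arrays are zero-initialized here so the loop never reads indeterminate values.
    alignas(16) uint64_t data[200] = {};
    alignas(16) uint64_t data2[200] = {};
    alignas(16) uint64_t data3[200] = {};
    __m128i* ptr = (__m128i*) data;
    __m128i* ptr2 = (__m128i*) data2;
    __m128i* ptr3 = (__m128i*) data3;
    // 200 uint64_t values = 100 vectors of two 64-bit lanes each.
    for (int i = 0; i < 100; ++i, ++ptr, ++ptr2, ++ptr3)
        *ptr3 = _mm_and_si128(*ptr, *ptr2);  // aligned load, AND, aligned store
}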
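The alignment requirement can also be made explicit instead of being left implicit in the pointer casts. The following is a minimal sketch, assuming SSE2 and an even element count; the names is_aligned16 and and_u64_aligned are hypothetical and not part of the original answers. It uses the aligned intrinsics _mm_load_si128 and _mm_store_si128 and asserts their preconditions in debug builds.

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_load_si128, _mm_and_si128, _mm_store_si128
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p lies on a 16-byte boundary.
inline bool is_aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: out[i] = a[i] & b[i] using aligned loads and stores.
// Preconditions (checked only in debug builds): all three pointers are
// 16-byte aligned and n is even, so every access covers a full vector.
void and_u64_aligned(const uint64_t* a, const uint64_t* b, uint64_t* out, std::size_t n)
{
    assert(is_aligned16(a) && is_aligned16(b) && is_aligned16(out));
    assert(n % 2 == 0);
    for (std::size_t i = 0; i < n; i += 2) {
        __m128i va = _mm_load_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_load_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_store_si128(reinterpret_cast<__m128i*>(out + i), _mm_and_si128(va, vb));
    }
}
```

Arrays declared with alignas(16), as in the program above, satisfy the alignment precondition; for buffers of unknown alignment, the unaligned intrinsics _mm_loadu_si128 and _mm_storeu_si128 are the safer choice.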
Source: https://stackoverflow.com/questions/42463284/c-simd-store-uint64-t-value-after-bitwise-and-operation