sse | 易学教程

On x86-64, is the “movnti” or “movntdq” instruction atomic when the system crashes?

Question: When using persistent memory such as Intel Optane DCPMM, is it possible to see a partial result after reboot if the system crashes (power outage) during execution of a movnt instruction? For:
- 4- or 8-byte movnti, which x86 guarantees atomic for other purposes?
- 16-byte SSE movntdq / movntps, which aren't guaranteed atomic but which in practice probably are on CPUs supporting persistent memory.
- 32-byte AVX vmovntdq / vmovntps
- 64-byte AVX512 vmovntdq / vmovntps full-line stores
- bonus question: MOVDIR64B which has

Find min/max value from a __m128i

Question: I want to find the minimum/maximum value in an array of bytes using SIMD operations. So far I have been able to go through the array and store the minimum/maximum values in a __m128i variable, but this means that the value I am looking for is mixed in with others (15 others, to be exact). I've found these discussions here and here for integers, and this page for floats, but I don't understand how _mm_shuffle* works. So my questions are: What SIMD operations do I have to perform in order to extract the
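
The reduction step asked about here, going from 16 candidate bytes in a __m128i down to a single value, can be sketched without any shuffle at all. The following is a minimal sketch, assuming the bytes are unsigned and only SSE2 is available; the function names hmax_epu8 and hmin_epu8 are hypothetical and do not come from the question. Each step compares the vector with a copy of itself shifted right by half the remaining width, so the number of candidates halves until the result sits in byte 0.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Horizontal maximum of the 16 unsigned bytes in v (hypothetical helper).
inline std::uint8_t hmax_epu8(__m128i v) {
    v = _mm_max_epu8(v, _mm_srli_si128(v, 8));  // low 8 bytes hold 8 candidates
    v = _mm_max_epu8(v, _mm_srli_si128(v, 4));  // low 4 bytes hold 4 candidates
    v = _mm_max_epu8(v, _mm_srli_si128(v, 2));  // low 2 bytes hold 2 candidates
    v = _mm_max_epu8(v, _mm_srli_si128(v, 1));  // byte 0 holds the maximum
    return static_cast<std::uint8_t>(_mm_cvtsi128_si32(v) & 0xFF);
}

// Horizontal minimum of the 16 unsigned bytes in v (hypothetical helper).
// The zeros shifted in by _mm_srli_si128 only pollute the upper bytes,
// which are never read back, so byte 0 is still the true minimum.
inline std::uint8_t hmin_epu8(__m128i v) {
    v = _mm_min_epu8(v, _mm_srli_si128(v, 8));
    v = _mm_min_epu8(v, _mm_srli_si128(v, 4));
    v = _mm_min_epu8(v, _mm_srli_si128(v, 2));
    v = _mm_min_epu8(v, _mm_srli_si128(v, 1));
    return static_cast<std::uint8_t>(_mm_cvtsi128_si32(v) & 0xFF);
}
```

For signed bytes the same pattern would need a signed comparison, which SSE2 does not provide for 8-bit lanes; that case is not covered by this sketch.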

Writing a portable SSE/AVX version of std::copysign

Question: I am currently writing a vectorized version of the QR decomposition (linear system solver) using SSE and AVX intrinsics. One of the substeps requires selecting the sign of a value to be opposite/equal to that of another value. In the serial version, I used std::copysign for this. Now I want to create a similar function for SSE/AVX registers. Unfortunately, the STL uses a built-in function for that, so I can't just copy the code and turn it into SSE/AVX instructions. I have not tried it yet (so I have no
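
One common way to get copysign semantics on a vector register is to split each lane into its sign bit and its magnitude bits with a mask. The following is a minimal sketch for four packed floats, assuming plain SSE; the name copysign_ps is hypothetical, and this is not necessarily the approach the question goes on to describe. An AVX version would follow the same pattern with the corresponding _mm256_* intrinsics.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Per lane: magnitude of `mag` combined with the sign of `sgn`
// (hypothetical helper mirroring std::copysign(mag, sgn)).
inline __m128 copysign_ps(__m128 mag, __m128 sgn) {
    // -0.0f has only the sign bit (bit 31) set, so it serves as the mask.
    const __m128 sign_mask = _mm_set1_ps(-0.0f);
    const __m128 sign_bits = _mm_and_ps(sign_mask, sgn);     // keep sign of sgn
    const __m128 magnitude = _mm_andnot_ps(sign_mask, mag);  // clear sign of mag
    return _mm_or_ps(sign_bits, magnitude);
}
```

Because the operation works purely on bits, it also copies the sign of negative zero, matching the behaviour of std::copysign.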

SSE Error - Using m128i_i32 to define fields of a __m128i variable

Question: Upon defining a __m128i variable in this manner: __m128i a; a.m128i_i32[0] = 65000; I get the following error: error: request for member ‘m128i_i32’ in ‘a’, which is of non-class type ‘__m128i {aka __vector(2) long long int}’ a.m128i_i32[0] = 65000; I have included the following header files: #include <x86intrin.h> #include <emmintrin.h> #include <smmintrin.h>

Answer 1: Your code will work under Visual Studio, where __m128i is defined as typedef union __declspec(intrin_type) __declspec(align(16)) __m128i {
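
Since the m128i_i32 member only exists in the MSVC definition of the type, code that must also build with GCC or Clang can set and read lanes through intrinsics instead. The following is a minimal sketch, assuming SSE2; the helper names make_lane0 and get_lane are hypothetical and are not taken from the answer above. Note that, unlike the snippet in the question, it zeroes the three upper lanes rather than leaving them uninitialized.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Build a vector whose 32-bit lane 0 is x and whose other lanes are zero
// (hypothetical helper replacing `a.m128i_i32[0] = x`).
inline __m128i make_lane0(std::int32_t x) {
    return _mm_cvtsi32_si128(x);
}

// Read 32-bit lane i (0..3) by storing the vector to a temporary array
// (hypothetical helper replacing `a.m128i_i32[i]`).
inline std::int32_t get_lane(__m128i v, int i) {
    std::int32_t tmp[4];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(tmp), v);
    return tmp[i];
}
```

When all four lanes are known up front, _mm_setr_epi32(a0, a1, a2, a3) builds the whole vector in one call, with a0 landing in lane 0.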

Deinterleave vector of nibbles using SIMD

Question: I have an input vector of 16384 signed four-bit integers. They are packed into 8192 bytes. I need to deinterleave the values and unpack them into signed 8-bit integers in two separate arrays. a, b, c, d are 4-bit values. A, B, C, D are 8-bit values. Input = [ab,cd,...] Out_1 = [A,C, ...] Out_2 = [B,D, ...] I can do this quite easily in C++. constexpr size_t size = 32768; int8_t input[size]; // raw packed 4bit integers int8_t out_1[size]; int8_t out_2[size]; for (int i = 0; i < size; i++) { out_1[i] =
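
The unpacking described here can be sketched with SSE2 alone. The following is a minimal sketch under two assumptions that the truncated excerpt does not confirm: the first value of each pair (a, c, ...) sits in the high nibble of its byte, and the function name deinterleave_nibbles and its parameters are hypothetical. Each nibble is isolated with a shift and a mask, then sign-extended from 4 to 8 bits with the identity (x ^ 8) - 8.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Splits n packed bytes into n high-nibble values (out_hi) and n low-nibble
// values (out_lo), each sign-extended to int8_t. Hypothetical helper;
// assumes the first value of every pair is stored in the high nibble.
inline void deinterleave_nibbles(const std::uint8_t* in, std::int8_t* out_hi,
                                 std::int8_t* out_lo, std::size_t n) {
    const __m128i low_mask = _mm_set1_epi8(0x0F);
    const __m128i sign_bit = _mm_set1_epi8(0x08);
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + i));
        // There is no 8-bit shift in SSE2: shift 16-bit lanes and mask away
        // the bits that leak in from the neighbouring byte.
        __m128i hi = _mm_and_si128(_mm_srli_epi16(v, 4), low_mask);
        __m128i lo = _mm_and_si128(v, low_mask);
        // Sign-extend 4-bit values: (x ^ 8) - 8 maps 8..15 to -8..-1.
        hi = _mm_sub_epi8(_mm_xor_si128(hi, sign_bit), sign_bit);
        lo = _mm_sub_epi8(_mm_xor_si128(lo, sign_bit), sign_bit);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out_hi + i), hi);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out_lo + i), lo);
    }
    for (; i < n; ++i) {  // scalar tail for the last n % 16 bytes
        const int h = in[i] >> 4;
        const int l = in[i] & 0x0F;
        out_hi[i] = static_cast<std::int8_t>((h ^ 8) - 8);
        out_lo[i] = static_cast<std::int8_t>((l ^ 8) - 8);
    }
}
```

If the packing order is the other way round, swapping the two output pointers at the call site is enough.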
