simd | 易学教程

Comparing 2 vectors in AVX/AVX2 (c)

Question: I have two __m256i vectors (each containing chars), and I want to find out whether they are completely identical. All I need is true if all bits are equal, and 0 otherwise. What's the most efficient way of doing that? Here's the code loading the arrays:

    char * a1 = "abcdefhgabcdefhgabcdefhgabcdefhg";
    __m256i r1 = _mm256_load_si256((__m256i *) a1);
    char * a2 = "abcdefhgabcdefhgabcdefhgabcdefhg";
    __m256i r2 = _mm256_load_si256((__m256i *) a2);

Answer 1: The most efficient way on current Intel and ...

Writing a portable SSE/AVX version of std::copysign

Question: I am currently writing a vectorized version of the QR decomposition (linear system solver) using SSE and AVX intrinsics. One of the substeps requires giving a value the same sign as, or the opposite sign to, another value. In the serial version, I used std::copysign for this. Now I want to create a similar function for SSE/AVX registers. Unfortunately, the STL uses a built-in function for that, so I can't just copy the code and turn it into SSE/AVX instructions. I have not tried it yet (so I have no ...
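The excerpt stops before any answer, so the following is only a minimal sketch of one common approach, not the asker's or an answerer's code. It models a mask-based copysign on a single IEEE-754 binary32 value in plain C; the function name `copysign_bits` is hypothetical. The same three bitwise steps would map onto `_mm_andnot_ps`, `_mm_and_ps` and `_mm_or_ps` (or their `_mm256_*` counterparts) with a sign mask such as `_mm_set1_ps(-0.0f)`, assuming binary32 lanes.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a mask-based copysign (hypothetical helper):
 * returns the magnitude of `mag` combined with the sign of `sgn`.
 * Assumes float is IEEE-754 binary32. */
static float copysign_bits(float mag, float sgn)
{
    const uint32_t sign_mask = 0x80000000u; /* only the sign bit set */
    uint32_t m, s, r;
    float out;

    memcpy(&m, &mag, sizeof m);             /* reinterpret bits without UB */
    memcpy(&s, &sgn, sizeof s);
    r = (m & ~sign_mask) | (s & sign_mask); /* andnot, and, or */
    memcpy(&out, &r, sizeof out);
    return out;
}
```

For the "opposite sign" case mentioned in the question, one option under the same assumptions is to flip the sign bit of `sgn` with an XOR against the mask before combining.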

Fastest precise way to convert a vector of integers into floats between 0 and 1

Question: Consider a randomly generated __m256i vector. Is there a faster precise way to convert its elements into a __m256 vector of floats between 0 (inclusive) and 1 (exclusive) than division by float(1ull<<32)? Here's what I have tried so far, where iRand is the input and ans is the output:

    const __m256 fRand = _mm256_cvtepi32_ps(iRand);
    const __m256 normalized = _mm256_div_ps(fRand, _mm256_set1_ps(float(1ull<<32)));
    const __m256 ans = _mm256_add_ps(normalized, _mm256_set1_ps(0.5f));

Answer 1: The version ...
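To make the range question concrete, here is a scalar sketch of the formula from the question applied to one lane, next to one commonly used alternative. Both function names are hypothetical, and the alternative is an assumption for illustration, not the truncated answer's method. The first function mirrors the question's convert, divide, add sequence; because a 32-bit integer near INT32_MAX rounds up when converted to float, it can return exactly 1.0. The second keeps only the top 24 bits, which a float represents exactly, so its result stays strictly below 1 at the cost of discarding the low 8 bits.

```c
#include <stdint.h>

/* One lane of the question's formula: signed convert, divide by 2^32,
 * then shift the range by 0.5. */
static float unit_from_i32_div(int32_t x)
{
    return (float)x / 4294967296.0f + 0.5f;
}

/* Alternative sketch (assumption, not from the source): use the top
 * 24 bits only. x >> 8 is below 2^24, so the conversion and the
 * multiplication by 2^-24 are both exact. */
static float unit_from_u32_top24(uint32_t x)
{
    return (float)(x >> 8) * (1.0f / 16777216.0f);
}
```

A vector version of the second function could plausibly be built from `_mm256_srli_epi32`, `_mm256_cvtepi32_ps` and `_mm256_mul_ps`, since the shifted value is always non-negative when read as a signed 32-bit integer.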

Deinterleave vector of nibbles using SIMD

Question: I have an input vector of 16384 signed four-bit integers. They are packed into 8192 bytes. I need to deinterleave the values and unpack them into signed 8-bit integers in two separate arrays. a, b, c, d are 4-bit values. A, B, C, D are 8-bit values.

    Input = [ab,cd,...]
    Out_1 = [A,C, ...]
    Out_2 = [B,D, ...]

I can do this quite easily in C++.

    constexpr size_t size = 32768;
    int8_t input[size]; // raw packed 4bit integers
    int8_t out_1[size];
    int8_t out_2[size];
    for (int i = 0; i < size; i++) {
        out_1[i] = ...
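The scalar loop in the excerpt is cut off, so the following is a sketch of what such a reference version might look like, not the asker's code. The names `sext4` and `deinterleave_nibbles` are hypothetical, the input is taken as `uint8_t` to keep the shifts well defined, and it assumes the first value of each pair sits in the high nibble. Sign extension uses the XOR-and-subtract trick, which relies only on AND, XOR and subtraction and so has direct byte-wise SIMD counterparts.

```c
#include <stddef.h>
#include <stdint.h>

/* Sign-extend the low 4 bits of v to a signed 8-bit value:
 * 0x0..0x7 -> 0..7, 0x8..0xF -> -8..-1. */
static int8_t sext4(uint8_t v)
{
    return (int8_t)(((v & 0x0F) ^ 0x08) - 0x08);
}

/* Split each packed byte [ab] into out_1 = A and out_2 = B.
 * Assumption: `a` is the high nibble, `b` the low nibble. */
static void deinterleave_nibbles(const uint8_t *in, size_t n_bytes,
                                 int8_t *out_1, int8_t *out_2)
{
    for (size_t i = 0; i < n_bytes; i++) {
        out_1[i] = sext4((uint8_t)(in[i] >> 4)); /* high nibble */
        out_2[i] = sext4(in[i]);                 /* low nibble  */
    }
}
```

A scalar version like this is also useful as a correctness check for any SIMD implementation, since both can be run over the same input and compared byte by byte.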

AVX segmentation fault on linux [closed]

Question: I am trying to run this code, and it reports a segmentation fault when I run it. It compiles fine. Here is the code. (It works fine on Windows.)

    #include<iostream>
    #include<vector>
    #include<immintrin.h>
    const int size = 1000000;
    std::vector<float>A(size);
    std::vector<float>B(size);
    std ...