simd

Comparing 2 vectors in AVX/AVX2 (c)

和自甴很熟 posted on 2021-01-20 07:12:20
Question: I have two __m256i vectors (each containing chars), and I want to find out whether they are completely identical. All I need is true if all bits are equal, and 0 otherwise. What's the most efficient way of doing that? Here's the code loading the arrays:

    char * a1 = "abcdefhgabcdefhgabcdefhgabcdefhg";
    __m256i r1 = _mm256_load_si256((__m256i *) a1);
    char * a2 = "abcdefhgabcdefhgabcdefhgabcdefhg";
    __m256i r2 = _mm256_load_si256((__m256i *) a2);

Answer 1: The most efficient way on current Intel and
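The answer excerpt above is cut off, so the following is only a minimal sketch of one common way to do this comparison, not necessarily the one the answer recommends. It compares bytes with `_mm256_cmpeq_epi8` and checks that all 32 bits of the `_mm256_movemask_epi8` result are set. The helper names `equal32` and `equal32_avx2` are hypothetical. The `target` attribute and `__builtin_cpu_supports` are GCC/Clang extensions, used here as an assumption so the code builds without `-mavx2` and falls back to `memcmp` on CPUs without AVX2. Unaligned loads are used because a string literal is not guaranteed to be 32-byte aligned.

```c
#include <immintrin.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: compares 32 bytes at p and q using AVX2.
   The target attribute (GCC/Clang) enables AVX2 for this function only. */
__attribute__((target("avx2")))
static bool equal32_avx2(const void *p, const void *q)
{
    /* Unaligned loads: the inputs need not be 32-byte aligned. */
    __m256i a = _mm256_loadu_si256((const __m256i *)p);
    __m256i b = _mm256_loadu_si256((const __m256i *)q);
    __m256i eq = _mm256_cmpeq_epi8(a, b);   /* 0xFF in every equal byte */
    /* movemask gathers the top bit of each byte; -1 means all 32 are set. */
    return _mm256_movemask_epi8(eq) == -1;
}

/* Hypothetical wrapper: uses AVX2 when the CPU has it, memcmp otherwise. */
static bool equal32(const void *p, const void *q)
{
    if (__builtin_cpu_supports("avx2"))
        return equal32_avx2(p, q);
    return memcmp(p, q, 32) == 0;
}
```

Calling `equal32(a1, a2)` with the two 32-character strings from the question would be expected to return true, since both literals hold the same bytes.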

Writing a portable SSE/AVX version of std::copysign

蹲街弑〆低调 posted on 2021-01-18 12:07:07
Question: I am currently writing a vectorized version of the QR decomposition (linear system solver) using SSE and AVX intrinsics. One of the substeps requires setting the sign of a value to be opposite or equal to the sign of another value. In the serial version, I used std::copysign for this. Now I want to create a similar function for SSE/AVX registers. Unfortunately, the STL uses a built-in function for that, so I can't just copy the code and turn it into SSE/AVX instructions. I have not tried it yet (so I have no
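As a hedged illustration of what such a function could look like, the sketch below builds a per-lane copysign for four floats from SSE bitwise operations: it keeps the magnitude bits of one operand and the sign bit of the other. The names `copysign_ps` and `copysign4` are hypothetical and do not come from the question. The same pattern should carry over to AVX with `_mm256_and_ps`, `_mm256_andnot_ps` and `_mm256_or_ps`.

```c
#include <xmmintrin.h>

/* Hypothetical helper: per-lane copysign for 4 floats.
   Result has the magnitude of mag and the sign bit of sgn. */
static __m128 copysign_ps(__m128 mag, __m128 sgn)
{
    const __m128 sign_mask = _mm_set1_ps(-0.0f);   /* only the sign bit set */
    __m128 sign = _mm_and_ps(sign_mask, sgn);      /* sign bit of sgn */
    __m128 absv = _mm_andnot_ps(sign_mask, mag);   /* mag with sign cleared */
    return _mm_or_ps(absv, sign);
}

/* Hypothetical array wrapper so the result can be inspected from plain C. */
static void copysign4(const float *mag, const float *sgn, float *out)
{
    __m128 r = copysign_ps(_mm_loadu_ps(mag), _mm_loadu_ps(sgn));
    _mm_storeu_ps(out, r);
}
```

Because only the sign bit is consulted, a sign operand of -0.0f yields a negative result and +0.0f a positive one, which matches the behavior of std::copysign.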

Fastest precise way to convert a vector of integers into floats between 0 and 1

跟風遠走 posted on 2021-01-02 05:45:38
Question: Consider a randomly generated __m256i vector. Is there a faster precise way to convert it into a __m256 vector of floats between 0 (inclusive) and 1 (exclusive) than division by float(1ull<<32)? Here's what I have tried so far, where iRand is the input and ans is the output:

    const __m256 fRand = _mm256_cvtepi32_ps(iRand);
    const __m256 normalized = _mm256_div_ps(fRand, _mm256_set1_ps(float(1ull<<32)));
    const __m256 ans = _mm256_add_ps(normalized, _mm256_set1_ps(0.5f));

Answer 1: The version
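The answer excerpt is truncated, so the sketch below is not a reconstruction of it. It only illustrates one observation about the code in the question: because 2^32 is a power of two, multiplying by the exactly representable constant 2^-32 gives the same floats as dividing by 2^32, and a multiply is typically cheaper than a divide. The sketch uses 4-lane SSE2 instead of AVX so it builds without extra flags, and the name `ints_to_unit_floats` is hypothetical. Note that with this formula the largest inputs round up to exactly 1.0f, so the upper bound is not strictly exclusive.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical helper: maps 4 signed 32-bit ints to floats via
   x * 2^-32 + 0.5, mirroring the division-based code in the question. */
static void ints_to_unit_floats(const int32_t in[4], float out[4])
{
    /* 1/2^32 is a power of two, so this constant is exact in float. */
    const __m128 scale = _mm_set1_ps(1.0f / 4294967296.0f);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    __m128 f = _mm_cvtepi32_ps(v);            /* int32 -> float, rounded */
    __m128 r = _mm_add_ps(_mm_mul_ps(f, scale), _mm_set1_ps(0.5f));
    _mm_storeu_ps(out, r);
}
```

With this mapping, an input of 0 lands on 0.5f and INT32_MIN lands on 0.0f.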

Deinterleave vector of nibbles using SIMD

陌路散爱 posted on 2020-12-31 10:54:54
Question: I have an input vector of 16384 signed four-bit integers. They are packed into 8192 bytes. I need to deinterleave the values and unpack them into signed 8-bit integers in two separate arrays. a, b, c, d are 4-bit values. A, B, C, D are 8-bit values.

    Input = [ab,cd,...]
    Out_1 = [A,C, ...]
    Out_2 = [B,D, ...]

I can do this quite easily in C++.

    constexpr size_t size = 32768;
    int8_t input[size]; // raw packed 4bit integers
    int8_t out_1[size];
    int8_t out_2[size];
    for (int i = 0; i < size; i++) {
        out_1[i] =
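The question's own loop is cut off, so the following is a minimal sketch under stated assumptions, not the asker's code or any posted answer. It assumes that in each packed byte "ab" the value a is the high nibble and b the low nibble, and it uses only SSE2. Each nibble is isolated with a shift and mask, then sign-extended from 4 to 8 bits with the identity (x ^ 8) - 8. The function name `unpack_nibbles` is hypothetical.

```c
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper. Assumption: high nibble goes to out_1, low to out_2.
   Each output array must hold n_bytes elements. */
static void unpack_nibbles(const uint8_t *in, int8_t *out_1, int8_t *out_2,
                           size_t n_bytes)
{
    const __m128i low_mask = _mm_set1_epi8(0x0F);
    const __m128i eight = _mm_set1_epi8(8);
    size_t i = 0;
    for (; i + 16 <= n_bytes; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(in + i));
        /* The 16-bit shift drags bits across bytes; the mask removes them. */
        __m128i hi = _mm_and_si128(_mm_srli_epi16(v, 4), low_mask);
        __m128i lo = _mm_and_si128(v, low_mask);
        /* (x ^ 8) - 8 sign-extends a 4-bit two's-complement value. */
        hi = _mm_sub_epi8(_mm_xor_si128(hi, eight), eight);
        lo = _mm_sub_epi8(_mm_xor_si128(lo, eight), eight);
        _mm_storeu_si128((__m128i *)(out_1 + i), hi);
        _mm_storeu_si128((__m128i *)(out_2 + i), lo);
    }
    for (; i < n_bytes; i++) {   /* scalar tail for the last few bytes */
        out_1[i] = (int8_t)(((in[i] >> 4) ^ 8) - 8);
        out_2[i] = (int8_t)(((in[i] & 0x0F) ^ 8) - 8);
    }
}
```

For the sizes in the question this would be called with n_bytes equal to 8192. If the nibble order is the other way round, swapping the two output pointers at the call site is enough.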

AVX segmentation fault on Linux [closed]

烈酒焚心 posted on 2020-12-25 04:18:10
Question: (Closed as needing debugging details.) I am trying to run this code, and it reports a segmentation fault when I run it. It compiles fine. Here is the code. (It works fine on Windows.)

    #include<iostream>
    #include<vector>
    #include<immintrin.h>
    const int size = 1000000;
    std::vector<float>A(size);
    std::vector<float>B(size);
    std