avx2 | 易学教程

Fastest precise way to convert a vector of integers into floats between 0 and 1


Question: Consider a randomly generated __m256i vector. Is there a faster precise way to convert it into a __m256 vector of floats between 0 (inclusive) and 1 (exclusive) than division by float(1ull<<32)? Here's what I have tried so far, where iRand is the input and ans is the output:

    const __m256 fRand = _mm256_cvtepi32_ps(iRand);
    const __m256 normalized = _mm256_div_ps(fRand, _mm256_set1_ps(float(1ull<<32)));
    const __m256 ans = _mm256_add_ps(normalized, _mm256_set1_ps(0.5f));

Answer 1: The version
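For illustration, one commonly used alternative to the division is to keep only the top 24 bits of each integer and scale by 2^-24. This is a scalar sketch under that assumption, not necessarily what the truncated answer above proposes. The names to_unit_float and to_unit_floats are hypothetical, and the comments name the AVX2 intrinsic that would play the same role for each lane.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar sketch: map a 32-bit random integer to a float in [0, 1).
// Only the top 24 bits are kept because a float has a 24-bit significand,
// so the conversion and the scaling are both exact and the result
// cannot round up to 1.0f.
inline float to_unit_float(std::uint32_t r) {
    const std::uint32_t top24 = r >> 8;          // AVX2: _mm256_srli_epi32(v, 8)
    const float f = static_cast<float>(top24);   // AVX2: _mm256_cvtepi32_ps (value < 2^24)
    return f * (1.0f / 16777216.0f);             // AVX2: _mm256_mul_ps by a 2^-24 constant
}

// Hypothetical helper: convert a whole buffer, one element per "lane".
inline void to_unit_floats(const std::uint32_t* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = to_unit_float(in[i]);
    }
}
```

The trade-off is that the low 8 bits of each input are discarded, which leaves 2^24 distinct, evenly spaced outputs; whether that counts as "precise" enough depends on the use case.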

Deinterleave vector of nibbles using SIMD


Question: I have an input vector of 16384 signed four-bit integers. They are packed into 8192 bytes. I need to deinterleave the values and unpack them into signed 8-bit integers in two separate arrays. a, b, c, d are 4-bit values. A, B, C, D are 8-bit values.

    Input = [ab,cd,...]
    Out_1 = [A,C, ...]
    Out_2 = [B,D, ...]

I can do this quite easily in C++.

    constexpr size_t size = 32768;
    int8_t input[size]; // raw packed 4bit integers
    int8_t out_1[size];
    int8_t out_2[size];
    for (int i = 0; i < size; i++) { out_1[i] =
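Since the excerpt cuts off before the loop body, here is a minimal scalar sketch of the unpacking described above, useful as a reference when checking a SIMD version. It assumes the first value of each pair (a) sits in the high nibble and the second (b) in the low nibble, and that the nibbles are two's-complement. The names sign_extend_nibble and deinterleave_nibbles are hypothetical, and the input is taken as unsigned bytes to keep the shifts well defined.

```cpp
#include <cstddef>
#include <cstdint>

// Sign-extend a 4-bit two's-complement value (0..15) to int8_t (-8..7).
// XOR with 8 flips the sign bit; subtracting 8 then restores the value
// with the sign carried into the upper bits.
inline std::int8_t sign_extend_nibble(unsigned nibble) {
    return static_cast<std::int8_t>(static_cast<int>(nibble ^ 8u) - 8);
}

// Split each packed byte "ab" into A (from the high nibble) and
// B (from the low nibble). Assumption: the high nibble comes first.
inline void deinterleave_nibbles(const std::uint8_t* input,
                                 std::int8_t* out_1,
                                 std::int8_t* out_2,
                                 std::size_t n_bytes) {
    for (std::size_t i = 0; i < n_bytes; ++i) {
        out_1[i] = sign_extend_nibble(input[i] >> 4);
        out_2[i] = sign_extend_nibble(input[i] & 0x0Fu);
    }
}
```

If the packing order is the other way around, swapping the two assignments in the loop is the only change needed.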