intrinsics | 易学教程

How to convert 32-bit float to 8-bit signed char?

Question: What I want to do is: multiply the input floating-point numbers by a fixed factor, then convert them to 8-bit signed char. Note that most of the inputs have a small absolute range of values, like [-6, 6], so that the fixed factor can map them to [-127, 127]. I work with the AVX2 instruction set only, so intrinsic functions like _mm256_cvtepi32_epi8 can't be used. I would like to use _mm256_packs_epi16, but it mixes two inputs together. :( I also wrote some code that converts 32-bit float to 16-bit int,
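One possible AVX2-only approach is sketched below; it is not taken from the question or its answers. It converts four vectors of floats to 32-bit ints, narrows them with the saturating pack instructions, and then undoes the per-lane interleaving that the packs introduce with a single dword permute. The function name `scale_to_int8` is hypothetical. The sketch assumes finite inputs whose scaled values fit in a 32-bit int, and it truncates toward zero in both the vector loop and the scalar tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper: dst[i] = saturate_to_int8(truncate(src[i] * scale)).
 * Assumes finite inputs whose scaled values fit in a 32-bit int. */
void scale_to_int8(const float *src, int8_t *dst, size_t n, float scale)
{
    size_t i = 0;
    assert(src != NULL && dst != NULL);

#if defined(__AVX2__)
    const __m256 vscale = _mm256_set1_ps(scale);
    /* Dword order that undoes the per-128-bit-lane interleaving of the packs. */
    const __m256i fix = _mm256_setr_epi32(0, 4, 1, 5, 2, 6, 3, 7);

    for (; i + 32 <= n; i += 32) {
        /* Scale 4 x 8 floats and convert to 32-bit ints (truncating). */
        __m256i a = _mm256_cvttps_epi32(_mm256_mul_ps(_mm256_loadu_ps(src + i), vscale));
        __m256i b = _mm256_cvttps_epi32(_mm256_mul_ps(_mm256_loadu_ps(src + i + 8), vscale));
        __m256i c = _mm256_cvttps_epi32(_mm256_mul_ps(_mm256_loadu_ps(src + i + 16), vscale));
        __m256i d = _mm256_cvttps_epi32(_mm256_mul_ps(_mm256_loadu_ps(src + i + 24), vscale));

        /* Saturating narrow: int32 -> int16 -> int8, each done per lane. */
        __m256i ab = _mm256_packs_epi32(a, b);
        __m256i cd = _mm256_packs_epi32(c, d);
        __m256i bytes = _mm256_packs_epi16(ab, cd);

        /* Restore the original element order across the two lanes. */
        bytes = _mm256_permutevar8x32_epi32(bytes, fix);
        _mm256_storeu_si256((__m256i *)(dst + i), bytes);
    }
#endif

    /* Scalar tail (and portable fallback when AVX2 is not enabled). */
    for (; i < n; ++i) {
        float v = src[i] * scale;
        if (v > 127.0f) v = 127.0f;
        if (v < -128.0f) v = -128.0f;
        dst[i] = (int8_t)v; /* truncates toward zero */
    }
}
```

With inputs in [-6, 6], a call such as `scale_to_int8(in, out, n, 127.0f / 6.0f)` would keep results within [-127, 127]; values outside that range saturate to -128 or 127. If round-to-nearest is preferred over truncation, `_mm256_cvtps_epi32` could replace `_mm256_cvttps_epi32` in the vector loop, with the scalar tail adjusted to match.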

How to swap two __m128i variables in C++03 given it's an opaque type and an array?

Question: What is the best practice for swapping __m128i variables? The background is a compile error under Sun Studio 12.2, which is a C++03 compiler. __m128i is an opaque type used with MMX and SSE instructions, and it's usually an unsigned long long[2]. C++03 does not provide support for swapping arrays, and std::swap(__m128i a, __m128i b) fails under the compiler. Here are some related questions that don't quite hit the mark. They don't apply because std::vector is not available. How can we
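One way to sidestep the array-assignment problem is to swap the raw bytes through a temporary buffer. The sketch below is an illustration only, not the accepted answer to the question: `swap_16_bytes` is a hypothetical helper, and it assumes the object being swapped is exactly 16 bytes, as a 128-bit vector type would be. It is written in the common subset of C and C++03.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: swap two 16-byte objects through a byte buffer.
 * Works whether the vector type is a struct, a builtin, or an array,
 * because it never assigns the objects directly. */
void swap_16_bytes(void *a, void *b)
{
    unsigned char tmp[16];
    assert(a != NULL && b != NULL);
    if (a == b) {
        return; /* nothing to do, and memcpy must not see overlapping ranges */
    }
    memcpy(tmp, a, sizeof tmp);
    memcpy(a, b, sizeof tmp);
    memcpy(b, tmp, sizeof tmp);
}
```

For two `__m128i` variables `x` and `y`, the call would be `swap_16_bytes(&x, &y);`.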

What are intrinsics?

Question: Can anyone explain what they are and why I would need them? What kind of applications am I building if I need to use intrinsics? Answer 1: Normally, "intrinsics" refers to functions that are built in, i.e. most standard library functions that the compiler can/will generate inline instead of calling an actual function in the library. For example, a call like: memset(array1, 0, 10) could be compiled for an x86 as something like: mov ecx, 10 xor eax, eax mov edi, offset FLAT:array1 rep stosb
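For reference, memset takes its arguments in the order destination, fill byte, byte count, so the instruction sequence quoted above (a count of 10 in ecx and zero in eax) corresponds to zeroing ten bytes. The wrapper below is a minimal sketch with a hypothetical name, `zero_bytes`; whether a given compiler expands the call inline depends on the compiler and its optimization settings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero the first n bytes of buf. */
void zero_bytes(unsigned char *buf, size_t n)
{
    assert(buf != NULL);
    memset(buf, 0, n); /* destination, fill byte, byte count */
}
```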

RDRAND and RDSEED intrinsics GCC and Intel C++

Question: Do the Intel C++ compiler and/or GCC support the following intrinsics, as MSVC has since 2012 / 2013? int _rdrand16_step(uint16_t*); int _rdrand32_step(uint32_t*); int _rdrand64_step(uint64_t*); int _rdseed16_step(uint16_t*); int _rdseed32_step(uint32_t*); int _rdseed64_step(uint64_t*); And if these intrinsics are supported, since which version are they supported (with a compile-time constant, please)? Answer 1: Both GCC and the Intel compiler support them. GCC support was introduced at the end of 2010.

What is __m128d?

Question: I really can't work out what a "keyword" like __m128d is in C++. I'm using MSVC, and it says: The __m128d data type, for use with the Streaming SIMD Extensions 2 instructions intrinsics, is defined in <emmintrin.h>. So, is it a data type? A typedef? If I do: #include <emmintrin.h> int main() { __m128d x; } I can't see the definition in <emmintrin.h>. Is it a compiler keyword? Does the compiler automatically convert that keyword to something like "move register xmm0" etc.? Or which kind of operation

SSE intrinsics compiling MSDN code with GCC error?

Question: I'm wondering if Microsoft's SSE intrinsics are a little different from the norm, because I tried compiling this code with GCC with the flags -msse -msse2 -msse3 -msse4: #include <stdio.h> #include <smmintrin.h> int main () { __m128i a, b; a.m128i_u64[0] = 0x000000000000000; b.m128i_u64[0] = 0xFFFFFFFFFFFFFFF; a.m128i_u64[1] = 0x000000000000000; b.m128i_u64[1] = 0x000000000000000; int res1 = _mm_testnzc_si128(a, b); a.m128i_u64[0] = 0x000000000000001; int res2 = _mm_testnzc_si128(a, b); printf_s(

What is the method of storing contents of __m128i into an int array?

Question: We have the intrinsic _mm_storeu_ps to store an __m128 into a float array. However, I don't see any equivalent for integers. I was expecting something like _mm_storeu_epi32, but that doesn't exist. So, what is the way of storing an __m128i into an int array? Answer 1: Its name is _mm_storeu_si128(). Source: https://stackoverflow.com/questions/43018299/what-is-the-method-of-storing-contents-of-m128i-into-an-int-array
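A minimal sketch of how the store named in the answer might be used: the destination int array is cast to `__m128i *`, and `_mm_storeu_si128` writes all 128 bits without requiring alignment. The helper `add_four_ints` is hypothetical. The scalar branch exists only so the example still builds where SSE2 is not enabled, and it assumes the sums do not overflow.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for four 32-bit ints. */
void add_four_ints(const int32_t *a, const int32_t *b, int32_t *out)
{
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i sum = _mm_add_epi32(va, vb);
    /* Unaligned 128-bit store into the plain int array. */
    _mm_storeu_si128((__m128i *)out, sum);
#else
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```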

Error: Cannot use vector unsigned long long[2] to initialize vector unsigned long long[2]

Question: We are testing under Sun Studio 12.3. We are catching a compiler error that's not present under 12.4 and later. It's not present under 12.1 and earlier, but that's because the compiler has trouble with AES instructions. It's also not present under other compilers, like Clang, GCC, ICPC and VC++. The error is: /opt/solarisstudio12.3/bin/CC -DDEBUG -g3 -xO0 -D__SSE2__ -D__SSE3__ -D__SSSE3__ \ -D__SSE4_1__ -D__SSE4_2__ -D__AES__ -D__PCLMUL__ -D__RDRND__ -D__RDSEED__ -D__AVX__ \ -D__AVX2__ -D__BMI_

Insight into the first argument mask in __shfl__sync()

Question: Here is the test code for broadcasting a variable: #include <stdio.h> #include <cuda_runtime.h> __global__ void broadcast(){ int lane_id = threadIdx.x & 0x1f; int value = 31 - lane_id; /* let every lane in the warp receive the value of the lane whose lane ID is 2 less than its own */ int broadcasted_value = __shfl_up_sync(0xffffffff, value, 2); value = broadcasted_value; printf("thread %d final value = %d\n", threadIdx.x, value); } int main() { broadcast<<<1,32>>>(); cudaDeviceSynchronize(); return 0; }