sse2

How to load two packed 64-bit quadwords into a 128-bit xmm register

倾然丶 夕夏残阳落幕, submitted on 2020-07-09 05:43:07
Question: I have two UInt64 (i.e. 64-bit quadword) integers. They are aligned to an 8-byte (sizeof(UInt64)) boundary (I could also align them to 16 bytes if that's useful for anything), and they are packed together so they are side by side in memory. How do I load them into an xmm register, e.g. xmm0? I've found: movq xmm0, v[0] but that only moves v[0] and sets the upper 64 bits of xmm0 to zero: xmm0 0000000000000000 24FC18D93B2C9D8F Bonus questions: How do I get them back out? What if they're not side
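For readers skimming this excerpt, here is a minimal sketch in C intrinsics of one way to do what the question describes. It assumes the two quadwords are adjacent in memory, as stated. The helper names (load_two_u64, store_two_u64, combine_two_u64) are hypothetical and are not taken from the original thread. The third helper guesses at the truncated bonus question and covers values that are not adjacent.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Load v[0] into the low half and v[1] into the high half of an XMM
   register. The unaligned load (MOVDQU) works for any alignment; with a
   16-byte-aligned pointer, _mm_load_si128 (MOVDQA) could be used instead. */
static inline __m128i load_two_u64(const uint64_t *v)
{
    return _mm_loadu_si128((const __m128i *)v);
}

/* Store both quadwords back to memory: out[0] = low, out[1] = high. */
static inline void store_two_u64(uint64_t *out, __m128i x)
{
    _mm_storeu_si128((__m128i *)out, x);
}

/* Build the register from two values that are not adjacent in memory.
   Note that _mm_set_epi64x takes the HIGH element first. */
static inline __m128i combine_two_u64(uint64_t lo, uint64_t hi)
{
    return _mm_set_epi64x((long long)hi, (long long)lo);
}
```

Calling store_two_u64(out, load_two_u64(v)) should round-trip both values, which is a quick way to confirm that neither half was zeroed the way movq zeroes the upper half.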

Left-shift (of float32 array) with AVX2 and filling up with a zero

我是研究僧i, submitted on 2020-06-28 03:59:52
Question: I have been using the following "trick" in C code with SSE2 for single-precision floats for a while now: static inline __m128 SSEI_m128shift(__m128 data) { return (__m128)_mm_srli_si128(_mm_castps_si128(data), 4); } For data like [1.0, 2.0, 3.0, 4.0], it results in [2.0, 3.0, 4.0, 0.0], i.e. it does a left shift by one position and fills the data structure with a zero. If I remember correctly, the above inline function compiles down to a single instruction (with gcc at least). I am somehow
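The SSE2 trick quoted above can be written as a small self-contained sketch. It uses _mm_castsi128_ps in place of the C-style cast, which should behave the same but also compiles on toolchains that reject vector casts. The function name shift_floats_down is hypothetical. The sketch covers only the 128-bit case. The 256-bit _mm256_srli_si256 shifts each 128-bit lane separately, so the same one-liner does not carry over to AVX2 unchanged.

```c
#include <emmintrin.h>  /* SSE2; also pulls in the SSE float intrinsics */

/* Move every float one element toward index 0 and zero the top element:
   [a0, a1, a2, a3] -> [a1, a2, a3, 0.0].
   PSRLDQ shifts the whole 128-bit register right by 4 bytes, which is
   exactly one float; the casts only reinterpret bits. */
static inline __m128 shift_floats_down(__m128 data)
{
    return _mm_castsi128_ps(_mm_srli_si128(_mm_castps_si128(data), 4));
}
```

Loading {1.0f, 2.0f, 3.0f, 4.0f} with _mm_loadu_ps, applying the function, and storing with _mm_storeu_ps should give {2.0f, 3.0f, 4.0f, 0.0f}, matching the example in the question.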

Shift a __m128i of n bits

巧了我就是萌, submitted on 2020-06-24 22:10:50
Question: I have a __m128i variable and I need to shift its 128-bit value by n bits, i.e. the way _mm_srli_si128 and _mm_slli_si128 work, but on bits instead of bytes. What is the most efficient way of doing this? Answer 1: This is the best that I could come up with for left/right immediate shifts with SSE2: #include <stdio.h> #include <emmintrin.h> #define SHL128(v, n) \ ({ \ __m128i v1, v2; \ \ if ((n) >= 64) \ { \ v1 = _mm_slli_si128(v, 8); \ v1 = _mm_slli_epi64(v1, (n) - 64); \ } \ else \ { \ v1 = _mm_slli
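Since the answer's macro is cut off, here is an independent sketch of the same idea rather than a reconstruction of it. It is written as plain functions with a run-time shift count instead of GCC statement-expression macros with an immediate. The names shl128 and shr128 are hypothetical. The approach is to shift both 64-bit halves and then OR in the bits that cross the 64-bit boundary. It relies on the documented behaviour that PSLLQ/PSRLQ produce zero for counts above 63.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Logical left shift of the full 128-bit value by n bits (0 <= n <= 127). */
static inline __m128i shl128(__m128i v, unsigned n)
{
    __m128i low_to_high = _mm_slli_si128(v, 8);  /* low qword moved up, low half zeroed */
    if (n >= 64) {
        /* Only bits from the low qword survive, shifted by n - 64. */
        return _mm_sll_epi64(low_to_high, _mm_cvtsi32_si128((int)(n - 64)));
    }
    __m128i shifted = _mm_sll_epi64(v, _mm_cvtsi32_si128((int)n));
    /* Top n bits of the low qword, moved to the bottom of the high qword.
       For n == 0 the count is 64, which yields zero as required. */
    __m128i carry = _mm_srl_epi64(low_to_high, _mm_cvtsi32_si128((int)(64 - n)));
    return _mm_or_si128(shifted, carry);
}

/* Logical right shift of the full 128-bit value by n bits (0 <= n <= 127). */
static inline __m128i shr128(__m128i v, unsigned n)
{
    __m128i high_to_low = _mm_srli_si128(v, 8);  /* high qword moved down, high half zeroed */
    if (n >= 64) {
        return _mm_srl_epi64(high_to_low, _mm_cvtsi32_si128((int)(n - 64)));
    }
    __m128i shifted = _mm_srl_epi64(v, _mm_cvtsi32_si128((int)n));
    /* Bottom n bits of the high qword, moved to the top of the low qword. */
    __m128i carry = _mm_sll_epi64(high_to_low, _mm_cvtsi32_si128((int)(64 - n)));
    return _mm_or_si128(shifted, carry);
}
```

When n is a compile-time constant, the immediate forms (_mm_slli_epi64, _mm_srli_epi64) used in the quoted answer avoid building the count registers. The run-time form is used here only so that one function covers every shift amount.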

Shift a __m128i of n bits

Deadly, submitted on 2020-06-24 22:05:18
Question: I have a __m128i variable and I need to shift its 128-bit value by n bits, i.e. the way _mm_srli_si128 and _mm_slli_si128 work, but on bits instead of bytes. What is the most efficient way of doing this? Answer 1: This is the best that I could come up with for left/right immediate shifts with SSE2: #include <stdio.h> #include <emmintrin.h> #define SHL128(v, n) \ ({ \ __m128i v1, v2; \ \ if ((n) >= 64) \ { \ v1 = _mm_slli_si128(v, 8); \ v1 = _mm_slli_epi64(v1, (n) - 64); \ } \ else \ { \ v1 = _mm_slli

SSE2 integer overflow checking

扶醉桌前, submitted on 2020-05-10 03:51:30
Question: When using SSE2 instructions such as PADDD (i.e., the _mm_add_epi32 intrinsic), is there a way to check whether any of the operations overflowed? I thought that maybe a flag in the MXCSR control register might get set after an overflow, but I don't see that happening. For example, _mm_getcsr() prints the same value in both cases below (8064): #include <iostream> #include <emmintrin.h> using namespace std; void main() { __m128i a = _mm_set_epi32(1, 0, 0, 0); __m128i b = _mm_add_epi32(a, a); cout
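MXCSR records floating-point exceptions only. Packed integer adds such as PADDD wrap silently and set no flag, so any check has to be computed from the operands and the result. The following is a minimal sketch of that approach. The function names are hypothetical and nothing here comes from the original thread's answers. Signed overflow is detected with the usual sign-bit test (both inputs differ in sign from the sum). Unsigned carry is detected by comparing the sum with one input after flipping sign bits, since SSE2 has no unsigned 32-bit compare.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <limits.h>     /* INT_MIN */

/* Returns a 4-bit mask; bit i is set if signed 32-bit lane i of a + b
   overflowed. Overflow happens exactly when a and b share a sign and the
   wrapped sum has the opposite sign, i.e. the sign bit of
   (a ^ sum) & (b ^ sum) is set. */
static inline int add_epi32_signed_overflow_mask(__m128i a, __m128i b)
{
    __m128i sum = _mm_add_epi32(a, b);
    __m128i ovf = _mm_and_si128(_mm_xor_si128(a, sum), _mm_xor_si128(b, sum));
    /* MOVMSKPS collects the sign bit of each 32-bit lane. */
    return _mm_movemask_ps(_mm_castsi128_ps(ovf));
}

/* Returns a 4-bit mask; bit i is set if unsigned 32-bit lane i of a + b
   produced a carry out. A carry occurred exactly when sum < a (unsigned).
   XOR with 0x80000000 maps unsigned order onto signed order so the signed
   compare PCMPGTD can be used. */
static inline int add_epi32_unsigned_carry_mask(__m128i a, __m128i b)
{
    const __m128i bias = _mm_set1_epi32(INT_MIN);
    __m128i sum = _mm_add_epi32(a, b);
    __m128i carry = _mm_cmpgt_epi32(_mm_xor_si128(a, bias),
                                    _mm_xor_si128(sum, bias));
    return _mm_movemask_ps(_mm_castsi128_ps(carry));
}
```

A nonzero return value means at least one lane overflowed. If saturating results are acceptable, it may be simpler to use the saturating adds, which SSE2 provides for 8-bit and 16-bit lanes but not for 32-bit lanes.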

SSE2 integer overflow checking

我怕爱的太早我们不能终老, submitted on 2020-05-10 03:50:31
Question: When using SSE2 instructions such as PADDD (i.e., the _mm_add_epi32 intrinsic), is there a way to check whether any of the operations overflowed? I thought that maybe a flag in the MXCSR control register might get set after an overflow, but I don't see that happening. For example, _mm_getcsr() prints the same value in both cases below (8064): #include <iostream> #include <emmintrin.h> using namespace std; void main() { __m128i a = _mm_set_epi32(1, 0, 0, 0); __m128i b = _mm_add_epi32(a, a); cout

Using % with SSE2?

烂漫一生, submitted on 2020-01-23 11:03:12
Question: Here's the code I'm trying to convert to SSE2: double *pA = a; double *pB = b[voiceIndex]; double *pC = c[voiceIndex]; double *left = audioLeft; double *right = audioRight; double phase = 0.0; double bp0 = mNoteFrequency * mHostPitch; for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex++) { // some other code (that will use phase) phase += std::clamp(mRadiansPerSample * (bp0 * pB[sampleIndex] + pC[sampleIndex]), 0.0, PI); while (phase >= TWOPI) { phase -= TWOPI; } } Here's what I
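The asker's own SSE2 attempt is cut off, so the following is only a sketch of the two building blocks the scalar loop needs, not a conversion of the whole loop. In the quoted code each increment is clamped to [0, PI], so phase can exceed TWOPI by less than TWOPI and the while loop runs at most once per sample. That lets the wrap be expressed as a branchless compare-and-mask subtraction. The function names and the numeric values given for PI and TWOPI are assumptions. The running sum over samples is still a serial dependency that this sketch does not address.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Assumed values for the PI and TWOPI constants named in the question. */
static const double PI = 3.141592653589793;
static const double TWOPI = 6.283185307179586;

/* Per-lane std::clamp(x, 0.0, PI) for two doubles (NaN handling aside). */
static inline __m128d clamp_0_pi_pd(__m128d x)
{
    return _mm_min_pd(_mm_max_pd(x, _mm_setzero_pd()), _mm_set1_pd(PI));
}

/* Per-lane "if (x >= TWOPI) x -= TWOPI" for two doubles.
   Valid as a replacement for the while loop only when x < 2 * TWOPI. */
static inline __m128d wrap_phase_pd(__m128d x)
{
    const __m128d twopi = _mm_set1_pd(TWOPI);
    __m128d mask = _mm_cmpge_pd(x, twopi);          /* all-ones where x >= TWOPI */
    return _mm_sub_pd(x, _mm_and_pd(mask, twopi));  /* subtract TWOPI or 0.0 */
}
```

The mask-and-subtract pattern stands in for the modulo operation in the title. SSE2 has no packed remainder instruction, so a general fmod would need a different approach.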

Using % with SSE2?

眉间皱痕, submitted on 2020-01-23 11:03:04
Question: Here's the code I'm trying to convert to SSE2: double *pA = a; double *pB = b[voiceIndex]; double *pC = c[voiceIndex]; double *left = audioLeft; double *right = audioRight; double phase = 0.0; double bp0 = mNoteFrequency * mHostPitch; for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex++) { // some other code (that will use phase) phase += std::clamp(mRadiansPerSample * (bp0 * pB[sampleIndex] + pC[sampleIndex]), 0.0, PI); while (phase >= TWOPI) { phase -= TWOPI; } } Here's what I

Using % with SSE2?

左心房为你撑大大i, submitted on 2020-01-23 11:02:35
Question: Here's the code I'm trying to convert to SSE2: double *pA = a; double *pB = b[voiceIndex]; double *pC = c[voiceIndex]; double *left = audioLeft; double *right = audioRight; double phase = 0.0; double bp0 = mNoteFrequency * mHostPitch; for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex++) { // some other code (that will use phase) phase += std::clamp(mRadiansPerSample * (bp0 * pB[sampleIndex] + pC[sampleIndex]), 0.0, PI); while (phase >= TWOPI) { phase -= TWOPI; } } Here's what I

Unpacking a bitfield (Inverse of movmskb)

谁说胖子不能爱, submitted on 2020-01-23 03:36:09
Question: MOVMSKB does a really nice job of packing byte fields into bits. However, I want to do the reverse. I have a bit field of 16 bits that I want to put into an XMM register, one byte field per bit. Preferably a set bit should set the MSB (0x80) of each byte field, but I can live with a set bit resulting in 0xFF in the byte field. I've seen the following option on https://software.intel.com/en-us/forums/intel-isa-extensions/topic/298374: movd mm0, eax punpcklbw mm0, mm0 pshufw mm0, mm0, 0x00
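The quoted MMX sequence is truncated, so the sketch below is not a completion of it. It is one possible SSE2-only way to expand a 16-bit mask into 16 bytes, using the 0xFF-per-set-bit output that the question says is acceptable. The function name expand_mask16 is hypothetical. The idea is to broadcast the low mask byte to bytes 0-7 and the high mask byte to bytes 8-15, then test one distinct bit in each byte.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Inverse of PMOVMSKB: byte i of the result is 0xFF if bit i of mask is
   set and 0x00 otherwise (only the low 16 bits of mask are used). */
static inline __m128i expand_mask16(unsigned mask)
{
    /* Bytes 0 and 1 hold the low and high mask byte; the rest is zero. */
    __m128i v = _mm_cvtsi32_si128((int)(mask & 0xFFFFu));
    /* Duplicate bytes: word 0 = (lo, lo), word 1 = (hi, hi). */
    v = _mm_unpacklo_epi8(v, v);
    /* Words 0..3 become w0, w0, w1, w1: dword 0 = 4 x lo, dword 1 = 4 x hi. */
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(1, 1, 0, 0));
    /* Dwords become d0, d0, d1, d1: bytes 0-7 = lo, bytes 8-15 = hi. */
    v = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 1, 0, 0));
    /* Byte j of each qword selects bit j (0x01 for byte 0 up to 0x80 for byte 7). */
    const __m128i bit = _mm_set1_epi64x((long long)0x8040201008040201ULL);
    /* A byte compares equal to its selector exactly when that bit is set. */
    return _mm_cmpeq_epi8(_mm_and_si128(v, bit), bit);
}
```

Feeding the result to _mm_movemask_epi8 should return the original 16-bit value. If only the MSB of each byte is wanted, the result can be ANDed with a vector of 0x80 bytes.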