sse2

#error "SSE2 instruction set not enabled" when including <emmintrin.h>

醉酒当歌 posted on 2019-12-04 13:46:14
Question: I'm trying to compile some C++ code with cmake and make that includes <emmintrin.h>, and I get the following make error: #error "SSE2 instruction set not enabled" I have an Intel Celeron Dual Core processor with a Linux (Mint) system (Kernel 3.5). According to Wikipedia, the Celeron Dual Core is capable of executing SSE2 instructions, and the sse2 flag is set according to /proc/cpuinfo. But the author of this question mentions limited SSE support on the Intel Celeron. I've already tried
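For context, this error usually comes from the compiler target rather than the CPU: older GCC versions of <emmintrin.h> emit it whenever the `__SSE2__` macro is not defined, which is the case on 32-bit x86 unless a flag such as `-msse2` is passed. The sketch below is a minimal test file for checking that the flag reaches the compiler; the function name `add4_i32` is made up for illustration and is not from the question.

```c
#include <assert.h>
#include <emmintrin.h>  /* needs SSE2 enabled at compile time, e.g. gcc -msse2 */
#include <stdint.h>

/* Hypothetical helper: adds four 32-bit integers with one SSE2 instruction. */
static inline void add4_i32(const int32_t *a, const int32_t *b, int32_t *out)
{
    assert(a && b && out);
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* unaligned loads */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```

If this file compiles when invoked by hand with `-msse2` but not through the generated Makefile, the flag is probably not being forwarded by the build system.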

The best way to shift a __m128i?

柔情痞子 posted on 2019-12-04 08:39:38
I need to shift a __m128i variable (say v) by m bits, in such a way that the bits move through the whole variable (so the resulting variable represents v*2^m). What is the best way to do this? Note that _mm_slli_epi64 shifts v0 and v1 separately: r0 := v0 << count r1 := v1 << count so the top bits of v0 are lost, but I want to move those bits into r1. Edit: I'm looking for code faster than this (m<64): r0 = v0 << m; r1 = v0 >> (64-m); r1 ^= v1 << m; r2 = v1 >> (64-m); For compile-time constant shift counts, you can get fairly good results. Otherwise not really. This is just an SSE
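The scalar recipe in the question (for m < 64) can be sketched with SSE2 intrinsics as follows. This is only one possible formulation, not necessarily the one the truncated answer goes on to give; the name `shl128_lt64` is made up here. It relies on `_mm_srl_epi64` producing zero when the count is 64, which makes m = 0 work without a special case.

```c
#include <assert.h>
#include <emmintrin.h>

/* Hypothetical helper: shifts the whole 128-bit value left by m bits, 0 <= m < 64. */
static inline __m128i shl128_lt64(__m128i v, int m)
{
    assert(m >= 0 && m < 64);
    /* Each 64-bit lane shifted left by m; bits leaving the low lane are lost here. */
    __m128i shifted = _mm_sll_epi64(v, _mm_cvtsi32_si128(m));
    /* Move the low lane into the high lane (byte shift), then keep its top m bits. */
    __m128i carry = _mm_srl_epi64(_mm_slli_si128(v, 8), _mm_cvtsi32_si128(64 - m));
    /* Combine: high lane = (v1 << m) | (v0 >> (64 - m)), low lane = v0 << m. */
    return _mm_or_si128(shifted, carry);
}
```

When m is a compile-time constant, the immediate forms `_mm_slli_epi64` and `_mm_srli_epi64` can replace the two variable-count shifts.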

How can I set __m128i without using of any SSE instruction?

二次信任 posted on 2019-12-04 04:13:54
I have many functions which use the same constant __m128i values. For example: const __m128i K8 = _mm_setr_epi8(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16); const __m128i K16 = _mm_setr_epi16(1, 2, 3, 4, 5, 6, 7, 8); const __m128i K32 = _mm_setr_epi32(1, 2, 3, 4); So I want to store all these constants in one place. But there is a problem: I check which CPU extensions are available at run time. If the CPU doesn't support, for example, SSE (or AVX), then the program will crash during constant initialization. So is it possible to initialize these constants without using SSE?
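One common approach (offered here as a sketch, since the answers are not part of this excerpt) is to keep the constants as plain 16-byte-aligned integer arrays, which are initialised as ordinary static data, and to load them into a `__m128i` only inside code paths that run after the CPU check has passed. The names `K8_DATA`, `K16_DATA`, `K32_DATA` and `load_const128` are made up for illustration, and `_Alignas` assumes a C11 compiler.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stdint.h>

/* Plain static data: initialising these arrays executes no SSE instruction. */
static const _Alignas(16) int8_t  K8_DATA[16] = {1, 2, 3, 4, 5, 6, 7, 8,
                                                 9, 10, 11, 12, 13, 14, 15, 16};
static const _Alignas(16) int16_t K16_DATA[8] = {1, 2, 3, 4, 5, 6, 7, 8};
static const _Alignas(16) int32_t K32_DATA[4] = {1, 2, 3, 4};

/* Call this only from code that already knows SSE2 is available. */
static inline __m128i load_const128(const void *p)
{
    assert(((uintptr_t)p & 15u) == 0);  /* aligned load needs 16-byte alignment */
    return _mm_load_si128((const __m128i *)p);
}
```

Because the arrays list their elements in memory order, `load_const128(K32_DATA)` yields the same value as `_mm_setr_epi32(1, 2, 3, 4)`.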

Detect the availability of SSE/SSE2 instruction set in Visual Studio

六眼飞鱼酱① posted on 2019-12-03 21:36:26
Question: How can I check in code whether SSE/SSE2 is enabled or not by the Visual Studio compiler? I have tried #ifdef __SSE__ but it didn't work. Answer 1: From the documentation: _M_IX86_FP Expands to a value indicating which /arch compiler option was used: 0 if /arch:IA32 was used. 1 if /arch:SSE was used. 2 if /arch:SSE2 was used. This value is the default if /arch was not specified. I don't see any mention of _SSE_ . Answer 2: Some additional information on _M_IX86_FP . _M_IX86_FP is only defined for 32
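A build-time check along the lines of the answers above might look like the sketch below. The macro name `BUILD_SSE_LEVEL` and both helper functions are made up for illustration; `_M_IX86_FP`, `_M_X64` and `_M_AMD64` are MSVC's predefined macros, while `__SSE__` and `__SSE2__` are the GCC/Clang equivalents, which is why the `#ifdef __SSE__` test in the question does not fire under Visual Studio.

```c
#include <assert.h>

/* BUILD_SSE_LEVEL (hypothetical name): 0 = no SSE, 1 = SSE, 2 = SSE2 or later. */
#if defined(_M_X64) || defined(_M_AMD64)
#  define BUILD_SSE_LEVEL 2            /* MSVC x64: SSE2 is part of the baseline */
#elif defined(_M_IX86_FP)
#  define BUILD_SSE_LEVEL _M_IX86_FP   /* MSVC x86: value selected by /arch */
#elif defined(__SSE2__)
#  define BUILD_SSE_LEVEL 2            /* GCC/Clang targeting SSE2 */
#elif defined(__SSE__)
#  define BUILD_SSE_LEVEL 1            /* GCC/Clang targeting SSE only */
#else
#  define BUILD_SSE_LEVEL 0
#endif

/* Reports what the compiler was told to target, not what the CPU supports. */
static inline int build_sse_level(void) { return BUILD_SSE_LEVEL; }

/* Debug-build guard for entry points that were written assuming SSE2. */
static inline void require_sse2_build(void) { assert(build_sse_level() >= 2); }
```

This only describes the compile-time target; whether the machine running the binary supports SSE2 is a separate, run-time question.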

How to test if your Linux Support SSE2

风流意气都作罢 posted on 2019-12-03 09:02:51
Question: Actually I have 2 questions: Is SSE2 compatibility a CPU issue or a compiler issue? How do I check whether my CPU or compiler supports SSE2? I am using GCC Version: gcc (GCC) 4.5.1 When I tried to compile some code it gave me this error: $ gcc -O3 -msse2 -fno-strict-aliasing -DHAVE_SSE2=1 -DMEXP=19937 -o test-sse2-M19937 test.c cc1: error: unrecognized command line option "-msse2" And cpuinfo showed this: processor : 0 vendor : GenuineIntel arch : IA-64 family : 32 model : 1 model name : Dual-Core Intel
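Both sides matter: the compiler must target an x86 architecture for `-msse2` to exist at all (the `arch : IA-64` line above indicates an Itanium system, which is not x86), and the CPU must implement the instructions. As a hedged sketch of the run-time half, GCC 4.8 and later (and Clang) provide `__builtin_cpu_supports` on x86 targets; the wrapper names below are made up for illustration, and on non-x86 builds the check simply reports 0.

```c
#include <assert.h>

/* Returns 1 when this translation unit is being built for x86-64. */
static inline int is_x86_64_build(void)
{
#if defined(__x86_64__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the running CPU reports SSE2, 0 otherwise (or if unknown). */
static inline int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* GCC/Clang builtin; no explicit init call is needed in ordinary functions. */
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return 0;  /* not an x86 build: SSE2 does not apply */
#endif
}

/* SSE2 is part of the x86-64 baseline, so a 0 there would indicate a bug. */
static inline void check_sse2_consistency(void)
{
    assert(!is_x86_64_build() || cpu_has_sse2());
}
```

On Linux, the `flags` line of /proc/cpuinfo gives the same information for x86 CPUs without compiling anything.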

Speeding up some SSE2 Intrinsics for color conversion

久未见 posted on 2019-12-03 07:56:32
Question: I'm trying to perform image colour conversion from YCbCr to BGRA (Don't ask about the A bit, such a headache). Anyway, this needs to perform as fast as possible, so I've written it using compiler intrinsics to take advantage of SSE2. This is my first venture into SIMD land, I'm basically a beginner and so I'm sure there's plenty I'm doing inefficiently. My arithmetic code for doing the actual colour conversion turns out to be particularly slow, and Intel's VTune is showing it up as a

How to test if your Linux Support SSE2

时光怂恿深爱的人放手 posted on 2019-12-03 00:37:32
Actually I have 2 questions: Is SSE2 compatibility a CPU issue or a compiler issue? How do I check whether my CPU or compiler supports SSE2? I am using GCC Version: gcc (GCC) 4.5.1 When I tried to compile some code it gave me this error: $ gcc -O3 -msse2 -fno-strict-aliasing -DHAVE_SSE2=1 -DMEXP=19937 -o test-sse2-M19937 test.c cc1: error: unrecognized command line option "-msse2" And cpuinfo showed this: processor : 0 vendor : GenuineIntel arch : IA-64 family : 32 model : 1 model name : Dual-Core Intel(R) Itanium(R) Processor 9140M revision : 1 archrev : 0 features : branchlong, 16-byte atomic ops cpu

SSE2 Compiler Error

…衆ロ難τιáo~ posted on 2019-12-02 06:18:57
Question: I'm trying to break into SSE2 and tried the following example program: #include "stdafx.h" #include <emmintrin.h> int main(int argc, char* argv[]) { __declspec(align(16)) long mul; // multiply variable __declspec(align(16)) int t1[100000]; // temporary variable __declspec(align(16)) int t2[100000]; // temporary variable __m128i mul1, mul2; for (int j = 0; j < 100000; j++) { t1[j] = j; t2[j] = j+1; } // set temporary variables to random values _asm { mov eax, 0 label: movdqa xmm0, xmmword ptr

SSE2 Compiler Error

夙愿已清 posted on 2019-12-02 01:45:49
I'm trying to break into SSE2 and tried the following example program: #include "stdafx.h" #include <emmintrin.h> int main(int argc, char* argv[]) { __declspec(align(16)) long mul; // multiply variable __declspec(align(16)) int t1[100000]; // temporary variable __declspec(align(16)) int t2[100000]; // temporary variable __m128i mul1, mul2; for (int j = 0; j < 100000; j++) { t1[j] = j; t2[j] = j+1; } // set temporary variables to random values _asm { mov eax, 0 label: movdqa xmm0, xmmword ptr [t1+eax] movdqa xmm1, xmmword ptr [t2+eax] pmuludq xmm0, xmm1 movdqa mul1, xmm0 movdqa xmm0, xmmword
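For comparison, the load / `pmuludq` / store sequence in the inline assembly can be written with intrinsics, which avoids hand-managed registers and labels. This is a sketch of that one step only, not a fix for the (truncated) compiler error in the question; the function name `mul_even_u32` is made up. Note that `pmuludq` (`_mm_mul_epu32`) multiplies only elements 0 and 2 of each operand, treating them as unsigned and producing two 64-bit products.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical helper: out[0] = a[0] * b[0], out[1] = a[2] * b[2], as 64-bit products. */
static inline void mul_even_u32(const uint32_t *a, const uint32_t *b, uint64_t *out)
{
    assert(a && b && out);
    __m128i va = _mm_loadu_si128((const __m128i *)a);   /* four 32-bit values */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i prod = _mm_mul_epu32(va, vb);               /* compiles to pmuludq */
    _mm_storeu_si128((__m128i *)out, prod);             /* two 64-bit results */
}
```

With 16-byte-aligned arrays like the ones in the question, `_mm_load_si128` and `_mm_store_si128` correspond to the `movdqa` instructions used there.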

How to vectorize a distance calculation using SSE2

我只是一个虾纸丫 posted on 2019-11-30 15:42:09
A and B are vectors of length N, where N could be in the range 20 to 200, say. I want to calculate the square of the distance between these vectors, i.e. d^2 = ||A-B||^2. So far I have: float* a = ...; float* b = ...; float d2 = 0; for(int k = 0; k < N; ++k) { float d = a[k] - b[k]; d2 += d * d; } That seems to work fine, except that I have profiled my code and this is the bottleneck (more than 50% of time is spent just doing this). I am using Visual Studio 2012, on Win 7, with these optimization options: /O2 /Oi /Ot /Oy- . My understanding is that VS2012 should auto-vectorize that loop (using
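A hand-vectorized version of the loop above could look like the following sketch. It is one straightforward formulation under the assumption that the inputs may be unaligned, not the answer from the original thread; the name `dist2_sse` is made up. It processes four floats per iteration and finishes the remainder with the scalar loop from the question. Because the partial sums are accumulated in a different order, the result can differ from the scalar version in the last few bits.

```c
#include <assert.h>
#include <emmintrin.h>  /* also provides the SSE float intrinsics */

/* Hypothetical helper: squared Euclidean distance between a[0..n) and b[0..n). */
static inline float dist2_sse(const float *a, const float *b, int n)
{
    assert(n >= 0);
    __m128 acc = _mm_setzero_ps();
    int k = 0;
    for (; k + 4 <= n; k += 4) {
        /* d = a[k..k+3] - b[k..k+3], then acc += d * d, four lanes at once */
        __m128 d = _mm_sub_ps(_mm_loadu_ps(a + k), _mm_loadu_ps(b + k));
        acc = _mm_add_ps(acc, _mm_mul_ps(d, d));
    }
    /* Horizontal sum of the four partial sums. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float d2 = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    /* Scalar tail for the last n % 4 elements. */
    for (; k < n; ++k) {
        float d = a[k] - b[k];
        d2 += d * d;
    }
    return d2;
}
```

For N in the 20 to 200 range the tail loop and the horizontal sum are a noticeable share of the work, so it is worth profiling this against the compiler's auto-vectorized output before adopting it.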