simd

SSE slower than FPU?

China☆狼群 submitted on 2019-12-20 20:18:32
Question: I have a large piece of code, part of whose body contains this statement: result = (nx * m_Lx + ny * m_Ly + m_Lz) / sqrt(nx * nx + ny * ny + 1); which I have vectorized as follows (everything is already a float): __m128 r = _mm_mul_ps(_mm_set_ps(ny, nx, ny, nx), _mm_set_ps(ny, nx, m_Ly, m_Lx)); __declspec(align(16)) int asInt[4] = { _mm_extract_ps(r,0), _mm_extract_ps(r,1), _mm_extract_ps(r,2), _mm_extract_ps(r,3) }; float (&res)[4] = reinterpret_cast<float (&)[4]>(asInt); result = (res

How to check if compiled code uses SSE and AVX instructions?

泪湿孤枕 submitted on 2019-12-20 12:56:11
Question: I wrote some code to do a bunch of math, and it needs to go fast, so I need it to use SSE and AVX instructions. I'm compiling it using g++ with the flags -O3 and -march=native, so I think it's using SSE and AVX instructions, but I'm not sure. Most of my code looks something like the following: for(int i = 0;i<size;i++){ a[i] = b[i] * c[i]; } Is there any way I can tell if my code (after compilation) uses SSE and AVX instructions? I think I could look at the assembly to see, but I don't know
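
One way to approach the question above, as a minimal sketch: isolate the loop in a small function and ask the compiler for its assembly output. The function name multiply_arrays and the file name mul.cpp are hypothetical. With g++, the -S flag writes assembly instead of an object file, and objdump -d disassembles a binary that is already built. Packed instructions such as mulps (SSE, xmm registers) or vmulps (AVX, ymm registers) would suggest the loop was vectorized, while mulss alone would suggest scalar code.

```cpp
#include <cstddef>

// Hypothetical wrapper around the loop from the question.
// Build with, for example:
//   g++ -O3 -march=native -S mul.cpp -o mul.s
// then search mul.s for packed multiplies such as mulps or vmulps.
void multiply_arrays(float* a, const float* b, const float* c, std::size_t size) {
    for (std::size_t i = 0; i < size; ++i) {
        a[i] = b[i] * c[i];
    }
}
```

Keeping the loop in its own function makes the relevant part of the assembly listing easier to find, since it appears under the function's (mangled) name.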

does rewriting memcpy/memcmp/… with SIMD instructions make sense

懵懂的女人 submitted on 2019-12-20 12:15:33
Question: Does rewriting memcpy/memcmp/... with SIMD instructions make sense in large-scale software? If so, why doesn't gcc generate SIMD instructions for these library functions by default? Also, are there any other functions that could be improved by SIMD? Answer 1: Yes, these functions are much faster with SSE instructions. It would be nice if your runtime library/compiler intrinsics included optimized versions, but that doesn't seem to be pervasive. I have a custom SIMD memchr which is a hell
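
To make the idea of a SIMD memchr concrete, here is a minimal sketch using SSE2. It is not the answerer's implementation, which is not shown; the function name simd_find_byte is hypothetical, and __builtin_ctz assumes GCC or Clang on an x86 target. It compares 16 bytes at a time and falls back to a scalar loop for the tail.

```cpp
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical SSE2 byte search, similar in spirit to memchr.
// Returns a pointer to the first byte equal to `value`, or nullptr if absent.
const unsigned char* simd_find_byte(const unsigned char* data, std::size_t len,
                                    unsigned char value) {
    const __m128i needle = _mm_set1_epi8(static_cast<char>(value));
    std::size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        // Unaligned load, so `data` needs no particular alignment.
        const __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        // One bit per byte: set where the byte equals `value`.
        const int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle));
        if (mask != 0) {
            // Lowest set bit is the first match (GCC/Clang builtin).
            return data + i + __builtin_ctz(static_cast<unsigned>(mask));
        }
    }
    for (; i < len; ++i) {  // scalar tail for the last len % 16 bytes
        if (data[i] == value) return data + i;
    }
    return nullptr;
}
```

Whether a routine like this beats the library version depends on the C library in use, so it is worth benchmarking against the existing memchr before adopting it.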

Checking if SSE is supported at runtime [duplicate]

感情迁移 submitted on 2019-12-20 10:21:15
Question: This question already has answers here: How to check if a CPU supports the SSE3 instruction set? (5 answers) cpu dispatcher for visual studio for AVX and SSE (3 answers) Closed 4 years ago. I would like to check if SSE4 or AVX is supported at runtime, so that my program may take advantage of processor-specific instructions without creating a binary for each processor. If I could determine it at runtime, I could use an interface and switch between different instruction sets. Answer 1: GCC has a
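
The answer above is cut off, so what it goes on to recommend is not recoverable. One GCC facility that fits the question is the pair of builtins __builtin_cpu_init and __builtin_cpu_supports, also available in Clang on x86. The sketch below assumes one of those compilers; the helper name best_simd_level and its return strings are hypothetical.

```cpp
// Hypothetical helper: reports the most capable instruction set, among a few
// candidates, that the running CPU supports. Uses GCC/Clang x86 builtins and
// returns "unknown" on other compilers or architectures.
const char* best_simd_level() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    // Harmless to call here; strictly needed only when the query runs
    // before static constructors have executed.
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))   return "avx2";
    if (__builtin_cpu_supports("avx"))    return "avx";
    if (__builtin_cpu_supports("sse4.2")) return "sse4.2";
    if (__builtin_cpu_supports("sse4.1")) return "sse4.1";
    return "baseline";
#else
    return "unknown";
#endif
}
```

A program could call this once at startup and store a function pointer or interface implementation chosen from the result, which matches the "switch between different instruction sets" design described in the question.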

c++ SSE SIMD framework [closed]

浪子不回头ぞ submitted on 2019-12-20 08:40:57
Question: Closed. This question is off-topic. It is not currently accepting answers. Want to improve this question? Update the question so it's on-topic for Stack Overflow. Closed 5 years ago. Does anyone know an open-source C++ x86 SIMD intrinsics library? Intel supplies exactly what I need in their Integrated Performance Primitives library, but I can't use that because of the copyrights all over the place. EDIT: I already know the intrinsics provided by the compilers. What I need is a convenient

Ineffective remainder loop in my code

人盡茶涼 submitted on 2019-12-20 07:29:00
Question: I have this function: bool interpolate(const Mat &im, float ofsx, float ofsy, float a11, float a12, float a21, float a22, Mat &res) { bool ret = false; // input size (-1 for the safe bilinear interpolation) const int width = im.cols-1; const int height = im.rows-1; // output size const int halfWidth = res.cols >> 1; const int halfHeight = res.rows >> 1; float *out = res.ptr<float>(0); const float *imptr = im.ptr<float>(0); for (int j=-halfHeight; j<=halfHeight; ++j) { const float rx = ofsx +

why is strchr twice as fast as my simd code

吃可爱长大的小学妹 submitted on 2019-12-20 06:49:15
Question: I am learning SIMD and was curious to see whether it was possible to beat strchr at finding a character. It appears that strchr uses the same intrinsics, but I assume that it checks for a null, whereas I know the character is in the array and plan on avoiding a null check. My code is: size_t N = 1e9; bool found = false; //Not really used ... size_t char_index1 = 0; size_t char_index2 = 0; char * str = malloc(N); memset(str,'a',N); __m256i char_match; __m256i str_simd; __m256i result; __m256i*

C++ SSE filter implementation

核能气质少年 submitted on 2019-12-20 06:19:47
Question: I tried to use SSE to operate on 4 pixels at a time. I have a problem loading the image data into an __m128. My image data is a char buffer. Let's say my image is 1024 x 1024 and my filter is 16 x 16. __m128 IMG_VALUES, FIL_VALUES, NEW_VALUES; //ok: IMG_VALUES=_mm_load_ps(&pInput[0]); //hang below: IMG_VALUES=_mm_load_ps(&pInput[1]); I don't know how to handle index 1, 2, 3... Thanks. Answer 1: If you really need to do this with floating point rather than integer/fixed point, then you will need to load your 8-bit data,
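
For context, _mm_load_ps expects a 16-byte-aligned address that points at float data, and &pInput[1] on a char buffer satisfies neither condition. The answer above is truncated, so the following is only a sketch of one way to widen 8-bit pixels to floats with SSE2, not necessarily the steps the answerer lists. The helper name load4_u8_as_float is hypothetical, and the pixels are assumed to be unsigned.

```cpp
#include <cstring>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: widen four consecutive unsigned 8-bit pixels to floats.
// `src` may have any alignment, so pInput + 1, pInput + 2, ... all work.
__m128 load4_u8_as_float(const unsigned char* src) {
    int packed;
    std::memcpy(&packed, src, sizeof packed);  // read 4 bytes, alignment-safe
    __m128i v = _mm_cvtsi32_si128(packed);     // 4 bytes in the low 32 bits
    const __m128i zero = _mm_setzero_si128();
    v = _mm_unpacklo_epi8(v, zero);            // 8-bit  -> 16-bit, zero-extended
    v = _mm_unpacklo_epi16(v, zero);           // 16-bit -> 32-bit, zero-extended
    return _mm_cvtepi32_ps(v);                 // 32-bit int -> float
}
```

The result can be multiplied against filter coefficients held in another __m128. Converting back to 8-bit output would need the reverse steps with saturation, which this sketch leaves out.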

ternary operator for clang's extended vectors

北城余情 submitted on 2019-12-20 04:55:47
Question: I've tried playing with clang's extended vectors. The ternary operator is supposed to work, but it is not working for me. Example: int main() { using int4 = int __attribute__((ext_vector_type(4))); int4 a{0, 1, 3, 4}; int4 b{2, 1, 4, 5}; auto const r(a - b ? a : b); return 0; } Please provide examples of how I might make it work, as it does under OpenCL. I am using clang-3.4.2. Error: t.cpp:8:16: error: value of type 'int __attribute__((ext_vector_type(4)))' is not contextually
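
One possible workaround, sketched here under assumptions, is to avoid the ternary operator and build the per-lane selection from a comparison mask and bitwise operators. The helper name select_nonzero is hypothetical, the vector_size branch is added only so the sketch also compiles with GCC, and none of this has been checked against clang-3.4.2 specifically. The mask tests each lane for a nonzero value, which is one reading of the intent; OpenCL's own selection rule for vector conditions may differ.

```cpp
#if defined(__clang__)
using int4 = int __attribute__((ext_vector_type(4)));
#else
// Assumption: GCC's generic vector type as a stand-in outside clang.
using int4 = int __attribute__((vector_size(16)));
#endif

// Hypothetical helper: per lane, pick a[i] where cond[i] != 0, else b[i].
int4 select_nonzero(int4 cond, int4 a, int4 b) {
    const int4 zero = {0, 0, 0, 0};
    const int4 mask = cond != zero;   // each lane becomes -1 (all bits) or 0
    return (a & mask) | (b & ~mask);  // blend the two inputs lane by lane
}
```

With the vectors from the question, the call would be select_nonzero(a - b, a, b).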