sse

How to add values from vector to each other

纵饮孤独 submitted on 2020-02-07 03:39:25
Question: In my code I compute the integral of y = x^2 - 4x + 6. I use SSE, which lets me operate on 4 values at a time. I wrote a program that evaluates this integral over the range 0 to 5, with the sample points divided into four 4-element vectors n1, n2, n3, n4.

    .data
    n1: .float 0.3125,0.625,0.9375,1.25
    n2: .float 1.5625,1.875,2.1875,2.5
    n3: .float 2.8125,3.12500,3.4375,3.75
    n4: .float 4.0625,4.37500,4.6875,5
    szostka: .float 6,6,6,6
    czworka: .float 4,4,4,4
    .text
    .global main
    main:
    movups (n1),%xmm0
    mulps %xmm0,%xmm0
    movups (szostka),%xmm2
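
The title asks how to add the lanes of one vector to each other. As a minimal sketch in C intrinsics rather than the question's AT&T assembly, the block below shows one common shuffle-and-add horizontal sum and uses it for a right-endpoint Riemann sum of the same polynomial. The names `hsum_ps` and `riemann_sse` are hypothetical, and the sketch assumes an x86 target with SSE and a sample count that is a multiple of 4.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Add the four float lanes of v to each other and return the total. */
static float hsum_ps(__m128 v) {
    /* Swap neighbours: shuf = [v1, v0, v3, v2] */
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    /* sums = [v0+v1, v0+v1, v2+v3, v2+v3] */
    __m128 sums = _mm_add_ps(v, shuf);
    /* Move the high pair of sums into the low lanes */
    shuf = _mm_movehl_ps(shuf, sums);
    /* Lane 0 now holds v0+v1+v2+v3 */
    sums = _mm_add_ss(sums, shuf);
    return _mm_cvtss_f32(sums);
}

/* Right-endpoint Riemann sum of f(x) = x^2 - 4x + 6 over n samples
 * spaced dx apart, starting at x = dx. Assumes n is a multiple of 4. */
static float riemann_sse(float dx, int n) {
    const __m128 four = _mm_set1_ps(4.0f);
    const __m128 six = _mm_set1_ps(6.0f);
    const __m128 step = _mm_set1_ps(4.0f * dx);
    __m128 x = _mm_setr_ps(dx, 2.0f * dx, 3.0f * dx, 4.0f * dx);
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < n; i += 4) {
        /* y = x*x - 4*x + 6, evaluated for four sample points at once */
        __m128 y = _mm_add_ps(_mm_sub_ps(_mm_mul_ps(x, x), _mm_mul_ps(four, x)), six);
        acc = _mm_add_ps(acc, y);
        x = _mm_add_ps(x, step);
    }
    return hsum_ps(acc) * dx;
}
```

With the question's 16 sample points, `riemann_sse(0.3125f, 16)` comes out near 22.53, while the exact integral over 0 to 5 is 65/3, about 21.67. The gap is the error of the coarse right-endpoint rule, not of the SSE arithmetic.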

SSE: not seeing a speedup by using _mm_add_epi32

China☆狼群 submitted on 2020-01-30 11:53:47
Question: I would expect SSE to be faster than not using SSE. Do I need to add some additional compiler flags? Could it be that I am not seeing a speedup because this is integer code and not floating point?

invocation/output

    $ make sum2
    clang -O3 -msse -msse2 -msse3 -msse4.1 sum2.c ; ./a.out 123
    n: 123
    SSE Time taken: 0 seconds 124 milliseconds
    vector+vector:begin int: 1 5 127 0
    vector+vector:end int: 0 64 66 68
    NOSSE Time taken: 0 seconds 115 milliseconds
    vector+vector:begin int: 1 5 127 0
    vector
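
The asker's `sum2.c` is not shown, so the following is only a sketch of the kind of comparison being described: a plain loop next to an explicit `_mm_add_epi32` loop. The function names are hypothetical. One plausible reason for equal timings, which the truncated post does not confirm, is that at `-O3` the compiler may already auto-vectorize the plain loop, so both versions end up running similar SIMD code.

```c
#include <emmintrin.h>  /* SSE2: _mm_add_epi32 */
#include <stddef.h>
#include <stdint.h>

/* Plain loop. At -O3 the compiler is free to vectorize this by itself. */
static void add_scalar(const int32_t *a, const int32_t *b, int32_t *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Explicit SSE2 version: four 32-bit additions per iteration. */
static void add_sse2(const int32_t *a, const int32_t *b, int32_t *out, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* Unaligned loads and stores, so the arrays need no special alignment */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
    }
    /* Scalar tail for the last n % 4 elements */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Comparing the generated assembly of the two functions (for example with `clang -O3 -S`) is a direct way to check whether the scalar version was vectorized.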

C# - How to convert byte array of image pixels data to grayscale using vector SSE operation

偶尔善良 submitted on 2020-01-24 22:12:07
Question: I have a problem converting image data stored in a byte[] array to grayscale. I want to use vector SIMD operations because in the future I need to write ASM and C++ DLL files to measure operation times. When I read about SIMD, I found that SSE instructions operate on 128-bit registers, so there is a problem: I need to convert my byte[] array into several Vector<T> values stored in a List<T>. The image is a four-channel RGBA JPEG, so I also need to know how to create vectors with R, G, B data based on
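
The question is about C#, but it also mentions a later native DLL, so here is a rough C sketch of the underlying SSE2 idea: load four RGBA pixels (16 bytes) at once, split out the R, G and B channels as 32-bit lanes, and combine them with luma weights. The function name is hypothetical, the 0.299/0.587/0.114 weights are an assumption (the post does not say which formula it wants), and alpha is ignored.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert `pixels` RGBA pixels (4 bytes each, R first) to 8-bit gray. */
static void rgba_to_gray_sse2(const uint8_t *rgba, uint8_t *gray, size_t pixels) {
    const __m128i low_byte = _mm_set1_epi32(0xFF);
    const __m128 wr = _mm_set1_ps(0.299f);  /* assumed luma weights */
    const __m128 wg = _mm_set1_ps(0.587f);
    const __m128 wb = _mm_set1_ps(0.114f);
    size_t i = 0;
    for (; i + 4 <= pixels; i += 4) {
        /* One 128-bit load holds four whole pixels, one per 32-bit lane */
        __m128i px = _mm_loadu_si128((const __m128i *)(rgba + 4 * i));
        /* On little-endian x86, R is the lowest byte of each lane */
        __m128 r = _mm_cvtepi32_ps(_mm_and_si128(px, low_byte));
        __m128 g = _mm_cvtepi32_ps(_mm_and_si128(_mm_srli_epi32(px, 8), low_byte));
        __m128 b = _mm_cvtepi32_ps(_mm_and_si128(_mm_srli_epi32(px, 16), low_byte));
        __m128 y = _mm_add_ps(_mm_add_ps(_mm_mul_ps(r, wr), _mm_mul_ps(g, wg)),
                              _mm_mul_ps(b, wb));
        /* Round to nearest integer, then narrow 32 -> 16 -> 8 bits with saturation */
        __m128i yi = _mm_cvtps_epi32(y);
        yi = _mm_packs_epi32(yi, yi);
        yi = _mm_packus_epi16(yi, yi);
        int32_t four_gray = _mm_cvtsi128_si32(yi);
        memcpy(gray + i, &four_gray, 4);
    }
    /* Scalar tail for the last pixels % 4 pixels (rounds half up) */
    for (; i < pixels; ++i) {
        const uint8_t *p = rgba + 4 * i;
        gray[i] = (uint8_t)(0.299f * p[0] + 0.587f * p[1] + 0.114f * p[2] + 0.5f);
    }
}
```

Working directly on the interleaved byte buffer avoids building a separate list of vectors per channel. Whether this beats a byte-wise integer formulation would need measuring.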

C# - How to convert byte array of image pixels data to grayscale using vector SSE operation

心不动则不痛 submitted on 2020-01-24 22:12:02
Question: I have a problem converting image data stored in a byte[] array to grayscale. I want to use vector SIMD operations because in the future I need to write ASM and C++ DLL files to measure operation times. When I read about SIMD, I found that SSE instructions operate on 128-bit registers, so there is a problem: I need to convert my byte[] array into several Vector<T> values stored in a List<T>. The image is a four-channel RGBA JPEG, so I also need to know how to create vectors with R, G, B data based on

C# - How to convert byte array of image pixels data to grayscale using vector SSE operation

删除回忆录丶 submitted on 2020-01-24 22:12:00
Question: I have a problem converting image data stored in a byte[] array to grayscale. I want to use vector SIMD operations because in the future I need to write ASM and C++ DLL files to measure operation times. When I read about SIMD, I found that SSE instructions operate on 128-bit registers, so there is a problem: I need to convert my byte[] array into several Vector<T> values stored in a List<T>. The image is a four-channel RGBA JPEG, so I also need to know how to create vectors with R, G, B data based on

Unpacking a bitfield (Inverse of movmskb)

谁说胖子不能爱 submitted on 2020-01-23 03:36:09
Question: MOVMSKB does a really nice job of packing byte fields into bits. However, I want to do the reverse. I have a bit field of 16 bits that I want to put into an XMM register, one byte field per bit. Preferably a set bit should set the MSB (0x80) of each byte field, but I can live with a set bit resulting in 0xFF in the byte field. I've seen the following option on https://software.intel.com/en-us/forums/intel-isa-extensions/topic/298374:

    movd mm0, eax
    punpcklbw mm0, mm0
    pshufw mm0, mm0, 0x00
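
The MMX sequence quoted in the question is cut off, so the sketch below does not try to complete it. Instead it shows a different, SSE2-only approach in C intrinsics: broadcast the low mask byte to bytes 0-7 and the high mask byte to bytes 8-15, then test one bit per byte. The function name `bits_to_bytes` is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Expand a 16-bit mask into 16 bytes: byte i is 0xFF if bit i is set, else 0x00. */
static __m128i bits_to_bytes(uint16_t mask) {
    __m128i v = _mm_cvtsi32_si128(mask);                /* bytes: lo, hi, 0, 0, ... */
    v = _mm_unpacklo_epi8(v, v);                        /* lo, lo, hi, hi, 0, ...   */
    v = _mm_unpacklo_epi16(v, v);                       /* lo x4, hi x4, 0, ...     */
    v = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 1, 0, 0));  /* lo x8, hi x8             */
    /* Byte i of this constant is 1 << (i % 8), so each byte tests its own bit */
    const __m128i bit = _mm_set1_epi64x((long long)0x8040201008040201ULL);
    return _mm_cmpeq_epi8(_mm_and_si128(v, bit), bit);
}
```

Because the result bytes are 0xFF or 0x00, `_mm_movemask_epi8(bits_to_bytes(m))` returns `m` again. If only the MSB should be set, the result can be ANDed with `_mm_set1_epi8((char)0x80)`.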

SIMD the following code

≯℡__Kan透↙ submitted on 2020-01-22 13:54:31
Question: How do I SIMDize the following code in C (using SIMD intrinsics, of course)? I am having trouble understanding SIMD intrinsics and this would help a lot:

    int sum_naive( int n, int *a )
    {
        int sum = 0;
        for( int i = 0; i < n; i++ )
            sum += a[i];
        return sum;
    }

Answer 1: Here's a fairly straightforward implementation (warning: untested code):

    int32_t sum_array(const int32_t a[], const int n)
    {
        __m128i vsum = _mm_set1_epi32(0); // initialise vector of four partial 32 bit sums
        int32_t sum;
        int i;
        for (i =
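
The answer in this listing is cut off mid-loop. The block below is not a reconstruction of it, only a self-contained sketch of the same general idea (four partial 32-bit sums in one register, reduced at the end), written with SSE2 intrinsics. The name `sum_array_sse2` is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Sum n 32-bit integers, four at a time. */
static int32_t sum_array_sse2(const int32_t *a, int n) {
    __m128i vsum = _mm_setzero_si128();  /* four partial sums, all zero */
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        /* Unaligned load, so `a` needs no 16-byte alignment */
        __m128i v = _mm_loadu_si128((const __m128i *)(a + i));
        vsum = _mm_add_epi32(vsum, v);
    }
    /* Horizontal reduction: add the upper half onto the lower half,
     * then add the two remaining neighbours */
    vsum = _mm_add_epi32(vsum, _mm_shuffle_epi32(vsum, _MM_SHUFFLE(1, 0, 3, 2)));
    vsum = _mm_add_epi32(vsum, _mm_shuffle_epi32(vsum, _MM_SHUFFLE(2, 3, 0, 1)));
    int32_t sum = _mm_cvtsi128_si32(vsum);
    /* Scalar tail for the last n % 4 elements */
    for (; i < n; ++i)
        sum += a[i];
    return sum;
}
```

The reduction happens once, after the loop, so the per-iteration cost is just one load and one add.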

Disable AVX2 functions on non-Haswell processors

狂风中的少年 submitted on 2020-01-22 12:59:50
Question: I have written some AVX2 code to run on a Haswell i7 processor. The same codebase is also used on non-Haswell processors, where that code should be replaced with its SSE equivalent. I was wondering whether there is a way for the compiler to ignore AVX2 instructions on non-Haswell processors. I need something like:

    public void useSSEorAVX(...){
        IF (compiler directive detected AVX2)
            AVX2 code (this part is ready)
        ELSE
            SSE code (this part is also ready)
        }
    }

Right now I am commenting out related
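
The post does not say which compiler or language is in use, so the following is only one possible shape of the IF/ELSE above, sketched in C for GCC or Clang on x86. It picks the path at run time with `__builtin_cpu_supports("avx2")` and compiles the AVX2 function with a per-function `target` attribute, so the rest of the file needs no `-mavx2`. The summing functions are hypothetical stand-ins for the asker's "ready" AVX2 and SSE code.

```c
#include <immintrin.h>  /* SSE2 and AVX2 intrinsics */
#include <stdint.h>

/* SSE2 path: stand-in for the asker's existing SSE code. */
static int32_t sum_sse2(const int32_t *a, int n) {
    __m128i acc = _mm_setzero_si128();
    int i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_epi32(acc, _mm_loadu_si128((const __m128i *)(a + i)));
    acc = _mm_add_epi32(acc, _mm_shuffle_epi32(acc, _MM_SHUFFLE(1, 0, 3, 2)));
    acc = _mm_add_epi32(acc, _mm_shuffle_epi32(acc, _MM_SHUFFLE(2, 3, 0, 1)));
    int32_t sum = _mm_cvtsi128_si32(acc);
    for (; i < n; ++i) sum += a[i];
    return sum;
}

/* AVX2 path: stand-in for the asker's existing AVX2 code.
 * The attribute enables AVX2 code generation for this function only. */
__attribute__((target("avx2")))
static int32_t sum_avx2(const int32_t *a, int n) {
    __m256i acc = _mm256_setzero_si256();
    int i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_epi32(acc, _mm256_loadu_si256((const __m256i *)(a + i)));
    /* Fold the two 128-bit halves together, then reduce as in the SSE2 path */
    __m128i s = _mm_add_epi32(_mm256_castsi256_si128(acc),
                              _mm256_extracti128_si256(acc, 1));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(1, 0, 3, 2)));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(2, 3, 0, 1)));
    int32_t sum = _mm_cvtsi128_si32(s);
    for (; i < n; ++i) sum += a[i];
    return sum;
}

/* Run-time dispatch: the AVX2 function is only called on CPUs that report AVX2. */
static int32_t sum_dispatch(const int32_t *a, int n) {
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(a, n);
    return sum_sse2(a, n);
}
```

If the choice should instead be fixed at build time, a compile-time alternative is to wrap the two paths in `#ifdef __AVX2__` / `#else`, since that macro is defined when the compiler is told to target AVX2. That produces one binary per target rather than one binary that adapts.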

Disable AVX2 functions on non-Haswell processors

瘦欲@ submitted on 2020-01-22 12:59:06
Question: I have written some AVX2 code to run on a Haswell i7 processor. The same codebase is also used on non-Haswell processors, where that code should be replaced with its SSE equivalent. I was wondering whether there is a way for the compiler to ignore AVX2 instructions on non-Haswell processors. I need something like:

    public void useSSEorAVX(...){
        IF (compiler directive detected AVX2)
            AVX2 code (this part is ready)
        ELSE
            SSE code (this part is also ready)
        }
    }

Right now I am commenting out related

Finding lists of prime numbers with SIMD - SSE/AVX

余生长醉 submitted on 2020-01-21 05:26:05
Question: I'm curious whether anyone has advice on how to use SIMD to find lists of prime numbers. In particular, I'm interested in how to do this with SSE/AVX. The two algorithms I have been looking at are trial division and the Sieve of Eratosthenes. I have managed to find a way to use SSE with trial division. I found a faster way to do division that works well for a vector/scalar: "Division by Invariant Integers Using Multiplication" http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556 Each time I
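
The asker's own SSE trial-division code is not shown, and the multiplication-based division from the cited paper is not reproduced here. As a much simpler stand-in, the sketch below tests four candidates against each divisor using single-precision division, which is exact enough only for candidates below 2^24. The function name `prime_mask4` is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Trial division on four candidates at once.
 * Returns a 4-bit mask: bit k is set if n[k] is prime.
 * Assumes 2 <= n[k] < 2^24 so every value is exact in a float. */
static int prime_mask4(const uint32_t n[4]) {
    __m128 vn = _mm_cvtepi32_ps(_mm_loadu_si128((const __m128i *)n));
    uint32_t max = n[0];
    for (int k = 1; k < 4; ++k)
        if (n[k] > max) max = n[k];

    int composite = 0;  /* bit k set once a proper divisor of n[k] is found */
    for (uint32_t d = 2; (uint64_t)d * d <= max; ++d) {
        __m128 vd = _mm_set1_ps((float)d);
        /* q = trunc(n / d) in every lane */
        __m128 q = _mm_cvtepi32_ps(_mm_cvttps_epi32(_mm_div_ps(vn, vd)));
        /* d divides n exactly when q * d == n */
        __m128 divisible = _mm_cmpeq_ps(_mm_mul_ps(q, vd), vn);
        /* Ignore the trivial case d == n (and d > n) */
        __m128 proper = _mm_cmpgt_ps(vn, vd);
        composite |= _mm_movemask_ps(_mm_and_ps(divisible, proper));
    }
    return ~composite & 0xF;
}
```

This version tries every integer divisor up to the square root of the largest candidate. Restricting the divisors to known primes, or switching to a sieve, would be the next refinements the question alludes to.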