avx | 易学教程

Optimising 2D rotation

Question: Given the classic formula for rotating a point in 2D space:

    cv::Point pt[NPOINTS];
    cv::Point rotated[NPOINTS];
    float angle = WHATEVER;
    float cosine = cos(angle);
    float sine = sin(angle);
    for (int i = 0; i < NPOINTS; i++) {
        rotated[i].x = pt[i].x * cosine - pt[i].y * sine;
        rotated[i].y = pt[i].x * sine + pt[i].y * cosine;
    }

Given that NPOINTS is 32 and the arrays are aligned, how would one go about optimising the code for SSE or AVX? Searching around here and elsewhere didn't turn up anything
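One possible approach to the rotation question, sketched here and not taken from the original thread, is to hold the coordinates as separate x and y float arrays so that four points can be rotated per SSE instruction. This layout is an assumption: cv::Point stores integer x and y interleaved, so real code would need a conversion or shuffle step first. The function name rotate_points_sse is hypothetical. SSE is used because it is baseline on x86-64; the same pattern should widen to AVX with __m256 and the _mm256_* intrinsics, eight floats per step.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics

// Rotate n points held as separate x and y arrays (structure of arrays).
// Assumption: n is a multiple of 4. Unaligned loads are used, so the
// arrays do not have to be 16-byte aligned.
void rotate_points_sse(const float* x, const float* y,
                       float* out_x, float* out_y,
                       std::size_t n, float angle) {
    assert(n % 4 == 0);
    const __m128 c = _mm_set1_ps(std::cos(angle));  // cosine in all 4 lanes
    const __m128 s = _mm_set1_ps(std::sin(angle));  // sine in all 4 lanes
    for (std::size_t i = 0; i < n; i += 4) {
        const __m128 px = _mm_loadu_ps(x + i);
        const __m128 py = _mm_loadu_ps(y + i);
        // x' = x * cos - y * sin
        _mm_storeu_ps(out_x + i,
                      _mm_sub_ps(_mm_mul_ps(px, c), _mm_mul_ps(py, s)));
        // y' = x * sin + y * cos
        _mm_storeu_ps(out_y + i,
                      _mm_add_ps(_mm_mul_ps(px, s), _mm_mul_ps(py, c)));
    }
}
```

With NPOINTS equal to 32 the loop runs eight times, and a compiler may unroll it fully.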

Substitute a byte with another one

Question: I am having difficulty writing code for this seemingly easy problem. Given a vector of packed 8-bit integers, substitute one byte value with another wherever it is present. For instance, I want to substitute 0x06 with 0x01, so I can do the following, with res as the input in which to find 0x06:

    // Bytes to be manipulated
    res = _mm_set_epi8(0x00, 0x03, 0x02, 0x06, 0x0F, 0x02, 0x02, 0x06,
                       0x0A, 0x03, 0x02, 0x06, 0x00, 0x00, 0x02, 0x06);
    // Target value and substitution
    val = _mm_set1_epi8(0x06);
    sub = _mm_set1_epi8(0x01)
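A common way to finish this kind of substitution, shown here as a hedged sketch and not as the thread's accepted answer, is to build a byte mask with _mm_cmpeq_epi8 and then blend the substitute into the matching lanes with AND, ANDNOT and OR. Only SSE2 is used. The helper names replace_byte_sse2 and replace_bytes are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Replace every byte of v that equals `from` with `to`; other bytes are kept.
__m128i replace_byte_sse2(__m128i v, std::uint8_t from, std::uint8_t to) {
    const __m128i val  = _mm_set1_epi8(static_cast<char>(from));
    const __m128i sub  = _mm_set1_epi8(static_cast<char>(to));
    const __m128i mask = _mm_cmpeq_epi8(v, val);     // 0xFF where v == from
    return _mm_or_si128(_mm_and_si128(mask, sub),    // `to` in matching lanes
                        _mm_andnot_si128(mask, v));  // original bytes elsewhere
}

// Apply the substitution to a buffer in place.
// Assumption: n is a multiple of 16.
void replace_bytes(std::uint8_t* data, std::size_t n,
                   std::uint8_t from, std::uint8_t to) {
    assert(n % 16 == 0);
    for (std::size_t i = 0; i < n; i += 16) {
        __m128i* p = reinterpret_cast<__m128i*>(data + i);
        _mm_storeu_si128(p, replace_byte_sse2(_mm_loadu_si128(p), from, to));
    }
}
```

On CPUs with SSE4.1 the three-instruction blend could be replaced by _mm_blendv_epi8, at the cost of requiring that extension.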

How to use _mm256_log_ps by leveraging Intel OpenCL SVML?

Question: I found that _mm256_log_ps can't be used with GCC 7. The most common suggestions on Stack Overflow are to use ICC or to leverage the OpenCL SDK. After downloading the SDK and extracting the RPM file, there are three .so files: __ocl_svml_l9.so, __ocl_svml_e9.so, __ocl_svml_h8.so. Can someone teach me how to call _mm256_log_ps with these .so files? Thank you.

Answer 1: You can use the log function from the Eigen library:

    #include <Eigen/Core>
    void foo(float* data, int size) {
        Eigen::Map<Eigen::ArrayXf> arr(data, size);

Is it safe to compile one source with SSE2 another with AVX architecture?

Question: I'm using AVX intrinsics, but since MSVC generates non-VEX instructions for everything other than _mm256-based intrinsics, I need to compile the whole source file with /arch:AVX. The rest of the project is compiled with /arch:SSE2 so that it works on older CPUs, and I'm manually checking whether AVX is available. The source file containing the AVX code (compiled for AVX) includes a huge library of templates and other stuff, just to have the definitions. Is there a possibility that the compiler/linker

can't find materials about SSE2, Altivec, VMX on apple developer

Question: As Paul R suggested, there are plenty of resources about SSE2 and AVX on Apple Developer, but I couldn't find them. Could anyone help me? BTW, I am also looking for the archive of the AltiVec mailing list. Thanks!

Intel SSE and AVX Examples and Tutorials

Source: https://stackoverflow.com/questions/22978362/cant-find-materials-about-sse2-altivec-vmx-on-apple-developer

Optimize 128x128 to 256-bit multiply for Intel AVX[SIMD] [duplicate]

Question: This question already has answers here:

    Why _umul128 works slower than scalar code for mul128x64x2 function? (1 answer)
    SIMD signed with unsigned multiplication for 64-bit * 64-bit to 128-bit (2 answers)
    Is there hardware support for 128bit integers in modern processors? (3 answers)
    Is there a 128 bit integer in gcc? (3 answers)

Closed 3 months ago.

I'm trying to implement multiplication of a 128-bit unsigned integer by two 64-bit unsigned integers using Intel AVX. The problem is that the non-vectorised version

Getting Illegal Instruction while running a basic Avx512 code

Question: I am trying to learn AVX instructions, and while running some basic code I receive:

    Illegal instruction (core dumped)

The code is shown below, and I am compiling it using g++ -mavx512f 1.cpp. What exactly is the problem and how do I overcome it? Thank you!

    #include <immintrin.h>
    #include <iostream>
    using namespace std;
    void add(const float a[], const float b[], float res[], int n) {
        int i = 0;
        for (; i < (n & (~0x31)); i += 32) {
            __m512 x = _mm512_loadu_ps(&a[i]);
            __m512 y = _mm512_loadu_ps(&b[i])
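A frequent cause of this error is that the CPU running the program does not implement AVX-512F: -mavx512f only tells the compiler it may emit those instructions, it does not check the hardware. The sketch below, which assumes GCC or Clang on x86-64, guards the AVX-512 path with __builtin_cpu_supports and falls back to scalar code otherwise. The function names add_floats, add_avx512 and add_scalar are hypothetical. Because a __m512 holds 16 floats, the sketch steps by 16. Separately, the mask ~0x31 in the excerpt does not round n down to a multiple of 32; that would be ~31, i.e. ~0x1F.

```cpp
#include <cassert>
#include <cstddef>
#include <immintrin.h>

// Built for AVX-512F regardless of the global compiler flags
// (GCC/Clang target attribute). Must only be called on CPUs that support it.
__attribute__((target("avx512f")))
static void add_avx512(const float* a, const float* b, float* res,
                       std::size_t n) {
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {  // one __m512 holds 16 floats
        const __m512 x = _mm512_loadu_ps(a + i);
        const __m512 y = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(res + i, _mm512_add_ps(x, y));
    }
    for (; i < n; ++i) res[i] = a[i] + b[i];  // scalar tail
}

// Portable fallback for CPUs without AVX-512F.
static void add_scalar(const float* a, const float* b, float* res,
                       std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) res[i] = a[i] + b[i];
}

// Pick the implementation at run time.
void add_floats(const float* a, const float* b, float* res, std::size_t n) {
    assert(a != nullptr && b != nullptr && res != nullptr);
    __builtin_cpu_init();  // GCC/Clang builtin; harmless to call repeatedly
    if (__builtin_cpu_supports("avx512f")) {
        add_avx512(a, b, res, n);
    } else {
        add_scalar(a, b, res, n);
    }
}
```

On Linux, the flags line in /proc/cpuinfo lists avx512f when the CPU and kernel support it, which is a quick way to confirm the diagnosis.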

How to use AVX/SIMD with nested loops and += format?

Question: I am writing a PageRank program and a method for updating the rankings. I have successfully got it working with nested for loops, and also with a threaded version. However, I would like to use SIMD/AVX instead. This is the code I would like to change into a SIMD/AVX implementation:

    #define IDX(a, b) ((a * npages) + b) // 2D matrix indexing
    for (size_t i = 0; i < npages; i++) {
        temp[i] = 0.0;
        for (size_t j = 0; j < npages; j++) {
            temp[i] += P[j] * matrix_cap[IDX(i,j)];
        }
    }

For this code P[]
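The loop above is a matrix-vector product, so one way to vectorise it is to keep a vector accumulator per row, multiply and add several elements of the row at a time, and reduce the accumulator to a scalar once the row is done. The sketch below assumes double-precision data in row-major order, which the excerpt does not confirm, and uses SSE2 (two doubles per vector) so that it runs on any x86-64 CPU. The name matvec_sse2 is hypothetical. An AVX version would follow the same shape with __m256d and the _mm256_*_pd intrinsics, four doubles per step.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics

// temp[i] = sum over j of P[j] * matrix[i * npages + j]
// Assumption: matrix is row-major, npages x npages, double precision.
void matvec_sse2(const double* matrix, const double* P, double* temp,
                 std::size_t npages) {
    assert(matrix != nullptr && P != nullptr && temp != nullptr);
    for (std::size_t i = 0; i < npages; ++i) {
        const double* row = matrix + i * npages;
        __m128d acc = _mm_setzero_pd();  // two partial sums
        std::size_t j = 0;
        for (; j + 2 <= npages; j += 2) {
            const __m128d p = _mm_loadu_pd(P + j);
            const __m128d m = _mm_loadu_pd(row + j);
            acc = _mm_add_pd(acc, _mm_mul_pd(p, m));
        }
        // Horizontal sum: low lane plus high lane.
        double sum = _mm_cvtsd_f64(acc) +
                     _mm_cvtsd_f64(_mm_unpackhi_pd(acc, acc));
        for (; j < npages; ++j) sum += P[j] * row[j];  // odd-length tail
        temp[i] = sum;
    }
}
```

Because the partial sums are added in a different order than in the scalar loop, results can differ in the last bits for general floating-point inputs.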

Dispatching SIMD instructions + SIMDPP + qmake

Question: I'm developing a Qt widget that makes use of SIMD instruction sets. I've compiled three versions: SSE3, AVX, and AVX2 (simdpp allows switching between them with a single #define). Now I want my widget to switch automatically between these implementations according to the best supported instruction set. The guide provided with simdpp makes use of some makefile magic:

    CXXFLAGS=""
    test: main.o test_sse2.o test_sse3.o test_sse4_1.o test_null.o
        g++ $^ -o test
    main.o: main.cc
        g++ main.cc $

Integer SIMD Instruction AVX in C

Question: I am trying to run SIMD instructions over the data types int, float and double. I need multiply, add and load operations. For float and double I successfully managed to make those instructions work: _mm256_add_ps, _mm256_mul_ps and _mm256_load_ps (ending in *pd for double). (A direct FMADD operation isn't supported.) But for integers I couldn't find a working instruction. All of those shown in the Intel AVX manual give a similar error with GCC 4.7, like "‘_mm256_mul_epu32’ was not declared in this scope". For
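A likely explanation for the error is that the 256-bit integer intrinsics, including _mm256_mul_epu32 and _mm256_add_epi32, belong to AVX2 and not to AVX, so the headers only expose them when AVX2 is enabled (for example with -mavx2). The sketch below is an illustration and not the thread's answer: it assumes GCC or Clang on x86-64, enables AVX2 for a single function through the target attribute, and checks for AVX2 at run time before using it. The names muladd_i32, muladd_avx2 and muladd_scalar are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <immintrin.h>

// out[i] = a[i] * b[i] + c[i] on 32-bit integers, 8 lanes per step.
// Built for AVX2 regardless of global flags (GCC/Clang target attribute).
__attribute__((target("avx2")))
static void muladd_avx2(const std::int32_t* a, const std::int32_t* b,
                        const std::int32_t* c, std::int32_t* out,
                        std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        const __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
        const __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
        const __m256i vc = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(c + i));
        const __m256i prod = _mm256_mullo_epi32(va, vb);  // low 32 bits of each product
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i),
                            _mm256_add_epi32(prod, vc));
    }
    // Scalar tail; assumes the values are small enough not to overflow.
    for (; i < n; ++i) out[i] = a[i] * b[i] + c[i];
}

// Portable fallback; same no-overflow assumption.
static void muladd_scalar(const std::int32_t* a, const std::int32_t* b,
                          const std::int32_t* c, std::int32_t* out,
                          std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] * b[i] + c[i];
}

void muladd_i32(const std::int32_t* a, const std::int32_t* b,
                const std::int32_t* c, std::int32_t* out, std::size_t n) {
    assert(a != nullptr && b != nullptr && c != nullptr && out != nullptr);
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        muladd_avx2(a, b, c, out, n);
    } else {
        muladd_scalar(a, b, c, out, n);
    }
}
```

Note that _mm256_mullo_epi32 keeps the low 32 bits of each product, whereas _mm256_mul_epu32 multiplies only the even-indexed unsigned 32-bit lanes and produces 64-bit results, so the two are not interchangeable.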