Using AVX intrinsics instead of SSE does not improve speed — why?
I've been using Intel's SSE intrinsics for quite some time with good performance gains, so I expected the AVX intrinsics to speed up my programs further. Unfortunately, that has not been the case so far. I am probably making a stupid mistake, so I would be very grateful if somebody could help me out.

I use Ubuntu 11.10 with g++ 4.6.1. I compiled my program (see below) with

    g++ simpleExample.cpp -O3 -march=native -o simpleExample

The test system has an Intel i7-2600 CPU.

Here is the code which exemplifies my problem.

On my system, I get the output

    98.715 ms, b[42] = 0.900038 // Naive
    24.457 ms