Let's assume that we have a function that multiplies two arrays of 1000000 doubles each. In C/C++ the function looks like this:
void mul_c(double* a, double
I want to add another point of view to the problem. SIMD instructions give a big performance boost when the code is not memory-bound. In this example, however, there are too many memory loads and stores and too few calculations, so the CPU keeps up with the incoming data even without SIMD. If you use a different data type (32-bit float, for example) or a more complex algorithm, memory throughput is less likely to restrict CPU performance, and SIMD will give more of an advantage.
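To make the memory-bound argument concrete, here is a minimal sketch that counts arithmetic per byte of memory traffic for an element-wise multiply. The names `mul_arrays` and `flops_per_byte` are hypothetical and chosen only for illustration; they are not the function from the question. The byte count is a lower bound that ignores cache effects.

```c
#include <stddef.h>

/* Hypothetical element-wise multiply (name and signature are assumptions
   for illustration). Each iteration does 1 multiply, 2 loads and 1 store. */
void mul_arrays(const double *a, const double *b, double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] * b[i];
}

/* Floating-point operations per byte moved to or from memory for one
   element, given the number of loads and stores and the element size.
   This is a rough estimate that ignores caches and write-allocate traffic. */
double flops_per_byte(double flops_per_elem, size_t loads, size_t stores,
                      size_t elem_size)
{
    return flops_per_elem / (double)((loads + stores) * elem_size);
}
```

With doubles, `flops_per_byte(1.0, 2, 1, sizeof(double))` is 1/24: one multiply for every 24 bytes moved. With 32-bit floats the same call with `sizeof(float)` gives 1/12, so the same memory bandwidth feeds twice as many multiplies, and a kernel that does more arithmetic per element raises the ratio further. Those are the cases where SIMD is more likely to pay off.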