Is there a way to do element-wise vector-vector multiplication with BLAS, GSL, or any other high-performance library?
There is always std::valarray [1], which defines element-wise operations that are frequently compiled into SIMD instructions if the target supports them (Intel C++ with /Quse-intel-optimized-headers, G++).
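As a minimal sketch of the std::valarray approach (the helper name elementwise_product is hypothetical, chosen only for illustration), the element-wise product is just operator* applied to two valarrays of the same size:

```cpp
#include <valarray>

// Hypothetical helper: element-wise product of two valarrays.
// Assumes a.size() == b.size(); the behavior is undefined otherwise.
std::valarray<float> elementwise_product(const std::valarray<float>& a,
                                         const std::valarray<float>& b) {
    // operator* on valarrays multiplies corresponding elements.
    return a * b;
}
```

For example, calling it on {1, 2, 3} and {4, 5, 6} yields {4, 10, 18}. Whether this actually compiles to SIMD instructions depends on the compiler, the optimization flags, and the target.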
Intel C++ and G++ will also auto-vectorize plain loops, in which case you can just write:
#define N 10000
float a[N], b[N], c[N];

void f1() {
    // Element-wise product; start at 0 so the first element is not skipped.
    for (int i = 0; i < N; i++)
        c[i] = a[i] * b[i];
}
and see it compile into vectorized code (using SSE4, for example).
[1] Yes, valarrays are archaic and often thought of as obsolete, but in practice they are both standard and fit the task very well.