BMI for generating masks with AVX512
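The idea in the title, using a BMI instruction to build the AVX-512 tail mask, can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the asker's code. It assumes GCC or Clang on x86, and the names `tail_mask` and `daxpy_masked` are hypothetical. The tail mask has one bit per element left after the 8-wide main loop. It is built with the BMI2 instruction BZHI (`_bzhi_u32`) when BMI2 is enabled, and with a shift otherwise. The last `n & 7` elements are then handled by masked loads and a masked store instead of a scalar cleanup loop. When the file is compiled without AVX-512 support, a scalar fallback applies the same mask lane by lane so the logic can still be checked.

```c
#include <assert.h>

#if defined(__AVX512F__) || defined(__BMI2__)
#include <immintrin.h>
#endif

/* Hypothetical helper: a mask with the low r bits set, one bit per
   element left over after the 8-wide main loop (0 <= r <= 7). */
static unsigned tail_mask(int r) {
    assert(r >= 0 && r < 8);
#if defined(__BMI2__)
    /* BZHI zeroes every bit at position >= r, leaving r low ones. */
    return _bzhi_u32(0xFFu, (unsigned)r);
#else
    /* Same result without BMI2. */
    return (1u << r) - 1u;
#endif
}

/* Hypothetical daxpy variant: y[i] = a * x[i] + y[i] for 0 <= i < n,
   with the remainder handled by a mask instead of a scalar loop. */
void daxpy_masked(int n, double a, const double *x, double *y) {
    int r = n & 7, n2 = n - r;
    unsigned k = tail_mask(r);
#if defined(__AVX512F__)
    __m512d av = _mm512_set1_pd(a);
    for (int i = 0; i < n2; i += 8) {
        __m512d yv = _mm512_loadu_pd(&y[i]);
        __m512d xv = _mm512_loadu_pd(&x[i]);
        _mm512_storeu_pd(&y[i], _mm512_fmadd_pd(av, xv, yv));
    }
    if (k) {
        /* Masked-off lanes are neither loaded nor stored (faults are
           suppressed), so nothing past x[n - 1] or y[n - 1] is touched. */
        __m512d yv = _mm512_maskz_loadu_pd((__mmask8)k, &y[n2]);
        __m512d xv = _mm512_maskz_loadu_pd((__mmask8)k, &x[n2]);
        _mm512_mask_storeu_pd(&y[n2], (__mmask8)k, _mm512_fmadd_pd(av, xv, yv));
    }
#else
    /* Scalar emulation of the same scheme (plain multiply-add, not FMA):
       full 8-lane blocks, then one block whose lanes are enabled by k. */
    for (int i = 0; i < n2; i += 8)
        for (int j = 0; j < 8; j++)
            y[i + j] = a * x[i + j] + y[i + j];
    for (int j = 0; j < 8; j++)
        if (k & (1u << j))
            y[n2 + j] = a * x[n2 + j] + y[n2 + j];
#endif
}
```

For example, with `n = 11` the main loop covers elements 0 to 7, `tail_mask(3)` is `0x7`, and only elements 8 to 10 are updated by the masked step. Compiling with flags such as `-O2 -mavx512f -mbmi2` selects the vector path and the BZHI-based mask.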
Question: I was inspired by this link https://www.sigarch.org/simd-instructions-considered-harmful/ to look into how AVX512 performs. My idea was that the cleanup loop after the main loop could be removed by using the AVX512 mask operations. Here is the code I am using:

void daxpy2(int n, double a, const double x[], double y[]) {
    __m512d av = _mm512_set1_pd(a);
    int r = n&7, n2 = n - r;
    for(int i=-n2; i<0; i+=8) {
        __m512d yv = _mm512_loadu_pd(&y[i+n2]);
        __m512d xv = _mm512_loadu_pd(&x[i+n2]);
        yv = _mm512_fmadd