NEON vectorize sum of products of unsigned bytes: (a[i]-int1) * (b[i]-int2)
Question

I need to speed up a loop because my application calls it thousands of times. I suppose I need to do it with NEON, but I don't know where to begin.

Assumptions / pre-conditions:

- w is always 320 (a multiple of 16/32).
- pa and pb are 16-byte aligned.
- ma and mb are positive.

    int whileInstruction(const unsigned char *pa, const unsigned char *pb, int ma, int mb, int w)
    {
        int sum = 0;
        do {
            sum += ((*pa++) - ma) * ((*pb++) - mb);
        } while (--w);
        return sum;
    }

This attempt at vectorizing it is not working well,