How to add an AVX2 vector horizontally 3 by 3?
Question

I have a __m256i vector containing sixteen 16-bit elements, and I want to add each group of three adjacent elements horizontally. In scalar mode I use the following code:

    unsigned short int temp[16];
    __m256i sum_v; // has some values: 16 elements of 16 bits each
                   // | 0 | x15 | x14 | x13 | ... | x3 | x2 | x1 |
    _mm256_store_si256((__m256i *)&temp[0], sum_v);
    output1 = (temp[0] + temp[1] + temp[2]);
    output2 = (temp[3] + temp[4] + temp[5]);
    output3 = (temp[6] + temp[7] + temp[8]);
    output4 = (temp[9] + temp[10] +
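
One possible way to vectorize the three-element grouping described in the question is sketched below; it is not necessarily the fastest approach. It assumes that the top element is zero (as in the layout shown), that exactly five groups are wanted, and that each sum may wrap to 16 bits. The names `sum3_scalar` and `sum3_avx2` are hypothetical. The idea is to build two copies of the vector shifted down by one and two elements across the 128-bit lane boundary, add all three, and then read every third element of the result.

```c
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Scalar reference (hypothetical helper): out[g] = in[3g] + in[3g+1] + in[3g+2].
 * Assumption: sums are truncated to 16 bits to match the vector version. */
static void sum3_scalar(const uint16_t in[16], uint16_t out[5])
{
    for (int g = 0; g < 5; ++g)
        out[g] = (uint16_t)(in[3 * g] + in[3 * g + 1] + in[3 * g + 2]);
}

#if defined(__AVX2__)   /* compiled only when AVX2 is enabled, e.g. with -mavx2 */
/* AVX2 sketch (hypothetical helper) producing the same five sums. */
static void sum3_avx2(__m256i v, uint16_t out[5])
{
    /* hi = [ zero | upper 128 bits of v ]: the upper lane moved into the lower one. */
    __m256i hi = _mm256_permute2x128_si256(v, v, 0x81);

    /* Per-lane byte shifts that pull data in from the next lane:
     * element i of s1 holds v[i+1], element i of s2 holds v[i+2] (zeros past the end). */
    __m256i s1 = _mm256_alignr_epi8(hi, v, 2);
    __m256i s2 = _mm256_alignr_epi8(hi, v, 4);

    /* Element i now holds v[i] + v[i+1] + v[i+2], modulo 2^16. */
    __m256i sum = _mm256_add_epi16(v, _mm256_add_epi16(s1, s2));

    /* Keep every third element: indices 0, 3, 6, 9, 12. */
    uint16_t tmp[16];
    _mm256_storeu_si256((__m256i *)tmp, sum);   /* unaligned store: tmp has no 32-byte alignment */
    for (int g = 0; g < 5; ++g)
        out[g] = tmp[3 * g];
}
#endif
```

Calling `sum3_avx2(sum_v, out)` would then fill `out[0]` through `out[4]` with the values the scalar code calls `output1`, `output2`, and so on. The final extraction still goes through memory here; whether a shuffle-based extraction pays off depends on how the five sums are consumed afterwards.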