SSE2: How To Load Data From Non-Contiguous Memory Locations?

Submitted by 六眼飞鱼酱① on 2019-12-05 07:48:10

For this specific case, take a look at the unpack-and-interleave instructions in your instruction reference manual. It would be something like

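movss    xmm0, <addr1>   ; xmm0 = [a, 0, 0, 0]  (a = float at addr1)
movss    xmm1, <addr2>   ; xmm1 = [b, 0, 0, 0]  (b = float at addr2)
unpcklps xmm0, xmm1      ; xmm0 = [a, b, 0, 0]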
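In C or C++ the same sequence can be written with SSE intrinsics. The sketch below is an illustration rather than part of the original answer: `load4_scattered` is a hypothetical helper name, and it extends the idea to four addresses by doing two movss/unpcklps pairs and then joining the two halves with `_mm_movelh_ps` (movlhps).

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: __m128, _mm_load_ss, _mm_unpacklo_ps, _mm_movelh_ps

// Hypothetical helper: gather four floats from four unrelated addresses
// into one register, lane 0 = *p0, ..., lane 3 = *p3.
static inline __m128 load4_scattered(const float* p0, const float* p1,
                                     const float* p2, const float* p3) {
    assert(p0 && p1 && p2 && p3);
    __m128 a = _mm_load_ss(p0);         // movss:    [*p0, 0, 0, 0]
    __m128 b = _mm_load_ss(p1);         // movss:    [*p1, 0, 0, 0]
    __m128 c = _mm_load_ss(p2);         // movss:    [*p2, 0, 0, 0]
    __m128 d = _mm_load_ss(p3);         // movss:    [*p3, 0, 0, 0]
    __m128 ab = _mm_unpacklo_ps(a, b);  // unpcklps: [*p0, *p1, 0, 0]
    __m128 cd = _mm_unpacklo_ps(c, d);  // unpcklps: [*p2, *p3, 0, 0]
    return _mm_movelh_ps(ab, cd);       // movlhps:  [*p0, *p1, *p2, *p3]
}
```

`_mm_load_ss` has no alignment requirement, so the four pointers can point anywhere, e.g. `load4_scattered(&data[6], &data[1], &data[4], &data[3])`.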

Also take a look at shufps, which is handy whenever you have the data you want in the wrong order.
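As a sketch of how shufps reorders lanes, here are a few SSE intrinsic wrappers. The function names are hypothetical; only `_mm_shuffle_ps`, `_MM_SHUFFLE` and the load/store intrinsics are real SSE API.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: __m128, _mm_shuffle_ps, _MM_SHUFFLE

// _MM_SHUFFLE(z, y, x, w) packs four 2-bit lane selectors, and
// _mm_shuffle_ps(a, b, imm) returns [a[w], a[x], b[y], b[z]]:
// the two low result lanes come from a, the two high ones from b.

// Reverse the lanes: [v0, v1, v2, v3] -> [v3, v2, v1, v0].
static inline __m128 reverse4(__m128 v) {
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3));
}

// Swap the middle lanes: [v0, v1, v2, v3] -> [v0, v2, v1, v3].
static inline __m128 swap_middle(__m128 v) {
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 1, 2, 0));
}

// Take the even lanes of two vectors: [a0, a2, b0, b2].
static inline __m128 even_lanes(__m128 a, __m128 b) {
    return _mm_shuffle_ps(a, b, _MM_SHUFFLE(2, 0, 2, 0));
}

// Debug helper: read lane i (0..3) of a vector.
static inline float lane(__m128 v, int i) {
    assert(i >= 0 && i < 4);
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    return tmp[i];
}
```

For example, unpcklps of [a, b, ?, ?] and [c, d, ?, ?] yields [a, c, b, d]; `swap_middle` turns that into [a, b, c, d].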

I think it would be interesting to see how this performs using the lookup functions from Agner Fog's Vector Class Library. It's not a library you need to compile and link in; it's just a collection of header files. If you drop the header files into your source code directory, the following code should compile. The code below loads 16 bytes at a time from each of the six byte arrays, extends them to 32-bit integers (because the lookup function requires 32-bit indices), and then gathers floats for each of the six accumulators. You could probably extend this to AVX as well. I don't know if this will be any better in performance (it could be worse). My guess is that it could help if the indices followed a regular pattern (in that case the gather functions would be a better fit), but in any case it's worth a try.

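#include <cstdio>
#include "vectorclass.h"

int main() {
    const int n = 16*10;
    float x[256];              // lookup table
    unsigned char b[6][n];     // six arrays of byte indices into x
    // Example data so the program does not read uninitialized memory.
    for(int k=0; k<256; k++) x[k] = (float)k;
    for(int j=0; j<6; j++)
        for(int i=0; i<n; i++) b[j][i] = (unsigned char)(i*7 + j);
    Vec4f sum[6];
    for(int i=0; i<6; i++) sum[i] = Vec4f(0.0f);
    for(int i=0; i<n; i+=16) {
        Vec4i in[6][4];
        for(int j=0; j<6; j++) {
            // Load 16 bytes and widen them to 32-bit indices.
            // Unsigned types keep byte values 128..255 positive.
            Vec16uc b16 = Vec16uc().load(&b[j][i]);
            Vec8us low  = extend_low(b16);
            Vec8us high = extend_high(b16);
            in[j][0] = extend_low(low);
            in[j][1] = extend_high(low);
            in[j][2] = extend_low(high);
            in[j][3] = extend_high(high);
        }
        // Gather four floats per index vector and accumulate.
        for(int j=0; j<4; j++) {
            sum[0] += lookup<256>(in[0][j], x);
            sum[1] += lookup<256>(in[1][j], x);
            sum[2] += lookup<256>(in[2][j], x);
            sum[3] += lookup<256>(in[3][j], x);
            sum[4] += lookup<256>(in[4][j], x);
            sum[5] += lookup<256>(in[5][j], x);
        }
    }
    // Use the results so the compiler cannot discard the loop.
    for(int k=0; k<6; k++) printf("sum[%d] = %f\n", k, horizontal_add(sum[k]));
    return 0;
}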
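For comparison, here is a plain SSE2 sketch of the same widen-then-gather idea without VCL. It is not VCL's implementation of `lookup<256>`; the function names are hypothetical, it handles one index array instead of six, and it assumes the length is a multiple of 16 and the table has 256 entries. SSE2 has no gather instruction, so each 4-float vector is built from four scalar loads. A scalar version is included as a reference to check the vector result against.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h>  // SSE2 integer intrinsics (includes the SSE float ops)

// Scalar reference: sum of table[idx[i]] for i in [0, n).
static float lookup_sum_scalar(const unsigned char* idx, std::size_t n,
                               const float* table) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += table[idx[i]];
    return s;
}

// SSE2 sketch (hypothetical helper): widen 16 byte indices to 32 bits,
// then assemble each 4-float vector from scalar table loads.
// Assumes n % 16 == 0 and a 256-entry table.
static float lookup_sum_sse2(const unsigned char* idx, std::size_t n,
                             const float* table) {
    assert(n % 16 == 0);
    const __m128i zero = _mm_setzero_si128();
    __m128 acc = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 16) {
        __m128i b16 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(idx + i));
        // Zero-extend bytes -> 16-bit -> 32-bit by interleaving with zeros.
        __m128i lo16 = _mm_unpacklo_epi8(b16, zero);  // bytes 0..7
        __m128i hi16 = _mm_unpackhi_epi8(b16, zero);  // bytes 8..15
        __m128i q[4] = {_mm_unpacklo_epi16(lo16, zero),   // bytes 0..3
                        _mm_unpackhi_epi16(lo16, zero),   // bytes 4..7
                        _mm_unpacklo_epi16(hi16, zero),   // bytes 8..11
                        _mm_unpackhi_epi16(hi16, zero)};  // bytes 12..15
        for (int k = 0; k < 4; ++k) {
            alignas(16) int idx32[4];
            _mm_store_si128(reinterpret_cast<__m128i*>(idx32), q[k]);
            // _mm_set_ps takes lanes in reverse order (lane 3 first).
            acc = _mm_add_ps(acc, _mm_set_ps(table[idx32[3]], table[idx32[2]],
                                             table[idx32[1]], table[idx32[0]]));
        }
    }
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, acc);
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}
```

The vector version adds in a different order than the scalar loop, so with arbitrary float tables the two results can differ in the last bits; with small integer-valued tables they match exactly.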