Efficient way to convert scatter indices into gather indices?

轮回少年 2020-12-11 03:57

I'm trying to write a stream compaction (take an array and get rid of empty elements) with SIMD intrinsics. Each iteration of the loop processes 8 elements at a time (SIMD

1 answer
  • 2020-12-11 04:38

    If you want to emulate _mm_movemask_epi8 and only need an 8-bit scalar mask from 8 byte elements, you can do something like this using AltiVec:

    #include <altivec.h>                        // AltiVec intrinsics (needed with GCC -maltivec)
    #include <stdio.h>
    
    int main(void)
    {
        const vector unsigned char vShift = { 0, 1, 2, 3, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0 };
                                                // constant shift vector
    
        vector unsigned char isValid = { 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
                                                // sample input
    
        vector unsigned char v1 = vec_sl(isValid, vShift);
                                                // shift input values
        vector unsigned int v2 = vec_sum4s(v1, (vector unsigned int)(0));
        vector signed int v3 = vec_sum2s((vector signed int)v2, (vector signed int)(0));
                                                // sum shifted values
        vector signed int v4 = vec_splat(v3, 1);
        unsigned int mask __attribute__ ((aligned(16)));
        vec_ste((vector unsigned int)v4, 0, &mask);
                                                // store sum in scalar
    
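        // the %v... vector conversions are an AltiVec printf extension;
        // not every C library supports them
        printf("v1 = %vu\n", v1);
        printf("v2 = %#vlx\n", v2);
        printf("v3 = %#vlx\n", v3);
        printf("v4 = %#vlx\n", v4);
        printf("mask = %#x\n", mask);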
    
        return 0;
    }
    

    This takes 5 AltiVec instructions versus 1 in SSE. You might be able to drop the vec_splat and get it down to 4.
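
For checking the vector code on any machine, the shift-and-sum idea can be sketched in scalar C. This is only a reference sketch, not part of the answer: the helper name `mask_from_flags` is hypothetical, and it assumes each flag byte is exactly 0 or 1, as in the sample input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the answer): scalar reference for the
 * AltiVec shift-and-sum. Assumes every flag is exactly 0 or 1. */
static uint8_t mask_from_flags(const uint8_t flags[8])
{
    assert(flags != NULL);

    unsigned mask = 0;
    for (size_t i = 0; i < 8; ++i) {
        /* Element i is shifted left by i (the vec_sl step), then all
         * shifted values are added (the vec_sum4s / vec_sum2s steps).
         * Each shifted value occupies its own bit, so the sum has no
         * carries and is the same as a bitwise OR. */
        mask += (unsigned)flags[i] << i;
    }
    return (uint8_t)mask;
}
```

With the sample input { 0, 1, 0, 0, 1, 0, 1, 0 }, elements 1, 4 and 6 are set, so this returns 2 + 16 + 64 = 0x52. One difference from SSE to keep in mind: _mm_movemask_epi8 reads the top bit of each byte, while this sum relies on the flags being 0 or 1.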
