How do I efficiently look up 16 bits in a 128-bit SIMD vector? [duplicate]

Submitted by 耗尽温柔 on 2020-04-30 06:29:30

Question


I'm trying to implement the strategy described in an answer to "How do I vectorize data_i16[0 to 15]?". My code is below. The part I'd like to vectorize is the inner for(int i=0; i<ALIGN; i++) loop.

I'm new to SIMD. From what I can tell, I'd load the high/low nibble tables by writing:

const auto HI_TBL = _mm_load_si128((__m128i*)HighNibble);
const auto LO_TBL = _mm_load_si128((__m128i*)LowNibble);
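Note that each table is 16 entries × 16 bits = 256 bits, so a single 128-bit load only picks up entries 0 to 7, and _mm_shuffle_epi8 can only index a 16-byte table anyway. One workaround (a sketch under that assumption, not necessarily what the linked answer does) is to store each u16 table as two 16-byte "planes": the low byte of every entry and the high byte of every entry. Each plane then fits exactly in one __m128i. BytePlanes, split_table and load_planes are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2: __m128i, _mm_load_si128

// Hypothetical container: the low and high bytes of a 16-entry u16 table,
// each plane exactly 16 bytes = one 128-bit register.
struct BytePlanes {
    alignas(16) uint8_t lo[16];
    alignas(16) uint8_t hi[16];
};

// Split tbl[i] into lo[i] (bits 0-7) and hi[i] (bits 8-15).
BytePlanes split_table(const uint16_t (&tbl)[16]) {
    BytePlanes p{};
    for (int i = 0; i < 16; i++) {
        p.lo[i] = static_cast<uint8_t>(tbl[i] & 0xFF);
        p.hi[i] = static_cast<uint8_t>(tbl[i] >> 8);
        assert(((p.hi[i] << 8) | p.lo[i]) == tbl[i]);  // the split is lossless
    }
    return p;
}

// Each load now brings in all 16 entries of one plane.
void load_planes(const BytePlanes& p, __m128i& lo, __m128i& hi) {
    lo = _mm_load_si128(reinterpret_cast<const __m128i*>(p.lo));
    hi = _mm_load_si128(reinterpret_cast<const __m128i*>(p.hi));
}
```

For LowNibble, entry 513 (0x0201) becomes 0x01 in the low plane and 0x02 in the high plane.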

My problem is the >>4 and the tbl[index] lookup.

There doesn't seem to be a shift on bytes (the narrowest is 16-bit, e.g. _mm_srai_epi16), so I'd need to convert everything to 16 bits. OK, fine: I can use two unpacks (_mm_unpacklo_epi8/_mm_unpackhi_epi8) with zeroes as the second parameter, and then I'll have two sets of variables to shift. However, the shuffle seems to be available only for 8 bits (_mm_shuffle_epi8): it moves single bytes, while my table entries are 16 bits wide.
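A common way around the 8-bit shuffle limit (assumed here, not taken from the linked answer) is to keep each u16 table as two 16-byte planes (low bytes and high bytes), run _mm_shuffle_epi8 on both planes with the same index vector, and then interleave the two results with _mm_unpacklo_epi8/_mm_unpackhi_epi8 to form 16-bit values. The input never has to be widened before the shuffle. lookup16 is a hypothetical name; the target attribute assumes GCC or Clang on x86-64, and the CPU must support SSSE3 at run time.

```cpp
#include <cassert>
#include <cstdint>
#include <tmmintrin.h>  // SSSE3: _mm_shuffle_epi8 (also provides SSE2)

// The target attribute lets this compile without -mssse3 (GCC/Clang only).
__attribute__((target("ssse3")))
void lookup16(__m128i idx, __m128i tbl_lo, __m128i tbl_hi,
              __m128i& out_first8, __m128i& out_last8) {
    // idx must hold 0..15 in every byte (e.g. a nibble). pshufb uses the low
    // 4 bits of each control byte as the index and zeroes the byte if bit 7 is set.
    __m128i lo = _mm_shuffle_epi8(tbl_lo, idx);  // low byte of each result
    __m128i hi = _mm_shuffle_epi8(tbl_hi, idx);  // high byte of each result
    // Interleave lo/hi: each 16-bit lane becomes lo[i] | (hi[i] << 8).
    out_first8 = _mm_unpacklo_epi8(lo, hi);  // results for index bytes 0..7
    out_last8  = _mm_unpackhi_epi8(lo, hi);  // results for index bytes 8..15
}
```

Two shuffles plus two unpacks give sixteen 16-bit results, instead of widening the input to 16 bits first.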

As you can see, this adds up to a lot of instructions, so I get the feeling I'm doing it wrong. I'm also unsure how to go from 16 bits (after I right shift by 4) back to 8. Maybe I missed it, but is there a 128-bit rotate right? Then I could skip the unpack (using vectorOf15 = _mm_set1_epi8(15) and _mm_and_si128(rotateResult, vectorOf15)?).
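SSE2 has no 128-bit bit rotate (_mm_srli_si128 and _mm_slli_si128 shift the whole register by whole bytes), but the unpack isn't needed for the >>4 either. A 16-bit logical shift (_mm_srli_epi16) followed by an AND with 0x0F in every byte yields the high nibble of each byte: the only bits that cross a byte boundary land in the upper nibble, which the mask clears. A sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 only

// Per-byte (x >> 4) without unpacking to 16 bits.
inline __m128i high_nibbles(__m128i bytes) {
    // Each 16-bit lane shifts right by 4; the low nibble of the upper byte
    // slides into the upper half of the lower byte.
    __m128i shifted = _mm_srli_epi16(bytes, 4);
    // Masking every byte with 0x0F discards those leaked bits.
    return _mm_and_si128(shifted, _mm_set1_epi8(0x0F));
}

// Per-byte (x & 15).
inline __m128i low_nibbles(__m128i bytes) {
    return _mm_and_si128(bytes, _mm_set1_epi8(0x0F));
}
```

Both results stay as 16 bytes in one register, which is exactly the form _mm_shuffle_epi8 expects for its index operand.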

Here's a non-vectorized demo:


#include <stdio.h>
#include <string.h>

typedef unsigned char u8;
typedef unsigned short u16;

typedef signed char s8;
typedef signed short s16;


#define ALIGN 16
#define ALIGN_ATTR __attribute__ ((aligned(ALIGN)))

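// Indexed by the high and low nibble of an input byte. ANDing the two entries
// gives 1 for '0'-'9', 512 for 'A'-'F'/'a'-'f', and 0 for everything else.
u16 HighNibble[16] ALIGN_ATTR = {0, 0, 0, 1, 512, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0};
u16 LowNibble[16] ALIGN_ATTR =  {1, 513, 513, 513, 513, 513, 513, 1, 1, 1, 0, 0, 0, 0, 0, 0};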

char my_input[1024*1024] ALIGN_ATTR;
u16 my_output[1024*1024] ALIGN_ATTR;

int main(int argc, char *argv[])
{
    strcpy(my_input, "09AZaz.fFgG"); //Digits will become 1 and A-F/a-f will become 512

    auto input_end = my_input+sizeof(my_input);  // char array: bytes == elements
    auto output = my_output;

    for(auto input=my_input; input<input_end; input+=ALIGN)
    {
        for(int i=0; i<ALIGN; i++)
        {
            auto val = (u8)input[i];  // unsigned, so val>>4 stays in 0..15 even for bytes >= 0x80
            output[i]=HighNibble[val>>4] & LowNibble[val&15];
        }
        output+=ALIGN;
    }
    for(int i=0; i<11; i++) //We only care about the first few we set using strcpy
        printf("%d\n", my_output[i]);
    return 0;
}
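A possible SSSE3 version of the whole inner loop is sketched below. It assumes GCC/Clang on x86-64 with SSSE3 available at run time, an input length that is a multiple of 16, and each u16 table pre-split into a low-byte and a high-byte plane (classify and its parameter names are hypothetical). Because the AND is bitwise, the byte planes can be ANDed before widening, so each 16 input bytes cost four _mm_shuffle_epi8 and two unpacks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tmmintrin.h>  // SSSE3: _mm_shuffle_epi8 (also provides SSE2)

// Vectorized form of: out[i] = HighNibble[in[i] >> 4] & LowNibble[in[i] & 15];
// Each u16 table is passed as two 16-byte planes (low bytes, high bytes).
__attribute__((target("ssse3")))
void classify(const uint8_t* in, uint16_t* out, size_t n,
              const uint8_t hiTblLo[16], const uint8_t hiTblHi[16],
              const uint8_t loTblLo[16], const uint8_t loTblHi[16]) {
    assert(n % 16 == 0);  // this sketch only handles whole 16-byte blocks
    const __m128i mask = _mm_set1_epi8(0x0F);
    const __m128i hLo = _mm_loadu_si128(reinterpret_cast<const __m128i*>(hiTblLo));
    const __m128i hHi = _mm_loadu_si128(reinterpret_cast<const __m128i*>(hiTblHi));
    const __m128i lLo = _mm_loadu_si128(reinterpret_cast<const __m128i*>(loTblLo));
    const __m128i lHi = _mm_loadu_si128(reinterpret_cast<const __m128i*>(loTblHi));

    for (size_t i = 0; i < n; i += 16) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + i));
        __m128i hiNib = _mm_and_si128(_mm_srli_epi16(v, 4), mask);  // in >> 4
        __m128i loNib = _mm_and_si128(v, mask);                      // in & 15
        // AND the low-byte planes and the high-byte planes separately;
        // this equals ANDing the full 16-bit table entries.
        __m128i resLo = _mm_and_si128(_mm_shuffle_epi8(hLo, hiNib),
                                      _mm_shuffle_epi8(lLo, loNib));
        __m128i resHi = _mm_and_si128(_mm_shuffle_epi8(hHi, hiNib),
                                      _mm_shuffle_epi8(lHi, loNib));
        // Widen to 16-bit results by interleaving low and high bytes.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i),
                         _mm_unpacklo_epi8(resLo, resHi));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i + 8),
                         _mm_unpackhi_epi8(resLo, resHi));
    }
}
```

Fed the byte planes of HighNibble and LowNibble, this should reproduce the scalar loop's output. It uses unaligned loads and stores, so it does not depend on ALIGN_ATTR.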

Source: https://stackoverflow.com/questions/61446596/how-do-i-efficiently-lookup-16bits-in-a-128bit-simd-vector
