How do I efficiently look up 16 bits in a 128-bit SIMD vector? [duplicate]
Question

This question already has answers here: SSE/SIMD shift with one-byte element size / granularity? (2 answers); How do I vectorize data_i16[0 to 15]? (1 answer). Closed 3 days ago.

I'm trying to implement the strategy described in an answer to "How do I vectorize data_i16[0 to 15]?". Code below. The spot I'd like to fix is the for(int i=0; i<ALIGN; i++) loop. I'm new to SIMD. From what I can tell, I'd load the high/low nibble table by writing const auto HI_TBL = _mm_load_si128((__m128i*)HighNibble)
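For reference, here is a minimal sketch of what a table load followed by a per-byte nibble lookup could look like. It is not the linked answer's code. It assumes SSSE3 is available (for _mm_shuffle_epi8), that HighNibble is a 16-byte, 16-byte-aligned table, and that the goal is to index it with the high nibble of each input byte. The table contents, the lookup_high_nibbles name, and the TARGET_SSSE3 macro are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <tmmintrin.h>  // SSSE3 (_mm_shuffle_epi8); also pulls in the SSE2 intrinsics

// Lets GCC/Clang compile this one function for SSSE3 without a global -mssse3 flag.
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_SSSE3 __attribute__((target("ssse3")))
#else
#define TARGET_SSSE3
#endif

// Hypothetical table: entry i is i * 3, chosen only so results are easy to check.
alignas(16) static const uint8_t HighNibble[16] = {
    0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45};

// For each of the 16 bytes at `in`, writes HighNibble[byte >> 4] to `out`.
TARGET_SSSE3
void lookup_high_nibbles(const uint8_t* in, uint8_t* out) {
    // _mm_load_si128 requires a 16-byte-aligned address.
    assert(reinterpret_cast<std::uintptr_t>(HighNibble) % 16 == 0);
    const __m128i HI_TBL =
        _mm_load_si128(reinterpret_cast<const __m128i*>(HighNibble));

    // The input may be unaligned, so use the unaligned load here.
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));

    // SSE has no 8-bit shift: shift the 16-bit lanes right by 4, then mask
    // each byte to 4 bits to drop the bits that crossed in from the neighbor.
    const __m128i hi = _mm_and_si128(_mm_srli_epi16(v, 4), _mm_set1_epi8(0x0F));

    // Each byte of `hi` (0..15) selects one byte of the table.
    const __m128i r = _mm_shuffle_epi8(HI_TBL, hi);

    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), r);
}
```

A low-nibble lookup would be the same minus the shift: mask the input with 0x0F and shuffle a second table with it. Whether the two results then get combined with an AND or an OR depends on the strategy in the linked answer, which is not reproduced here.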