_mm_extract_epi8(…) intrinsic that takes a non-literal integer as argument

◇◆丶佛笑我妖孽 submitted on 2019-12-01 22:45:51

Question


I've lately been using the SSE intrinsic int _mm_extract_epi8 (__m128i src, const int ndx) that, according to the reference "extracts an integer byte from a packed integer array element selected by index". This is exactly what I want.

However, I determine the index via _mm_cmpestri on an __m128i, which performs a packed comparison of string data with explicit lengths and generates the index. The index ranges over 0..16, where 0..15 is a valid index and 16 means that no match was found. To extract the byte at the index position, I thought of doing the following:

const int index = _mm_cmpestri(...);
if (index >= 0 && index < 16) {
  int intAtIndex = _mm_extract_epi8(..., index);
}

This leaves us with the gcc (-O0) compiler error:

error: selector must be an integer constant in the range 0..15

A nasty way around this issue is to have a switch on the index and a _mm_extract_epi8 call for each index in range 0..15. My question is if there is a better/nicer way that I don't see.
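
The switch workaround described above might look roughly like the following. This is only a sketch: the helper name extract_byte_switch, the CASE macro, and the -1 return value for an out-of-range index are assumptions made for illustration, not part of the original code. The target attribute is a GCC/Clang extension, used here so that the function compiles without passing -msse4.1 globally.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_extract_epi8 */

/* Each case passes a literal selector, so the "integer constant" requirement
   of _mm_extract_epi8 is satisfied even at -O0. */
#define CASE(n) case n: return _mm_extract_epi8(v, n)

/* Hypothetical helper: returns the byte (0..255) at a runtime index,
   or -1 if the index is outside 0..15 (e.g. 16 meaning "not found"). */
__attribute__((target("sse4.1")))
static int extract_byte_switch(__m128i v, int index)
{
    switch (index) {
    CASE(0);  CASE(1);  CASE(2);  CASE(3);
    CASE(4);  CASE(5);  CASE(6);  CASE(7);
    CASE(8);  CASE(9);  CASE(10); CASE(11);
    CASE(12); CASE(13); CASE(14); CASE(15);
    default: return -1;
    }
}
```

With this helper, the snippet from the question would become int intAtIndex = extract_byte_switch(vec, index); and the explicit range check turns into a check for -1.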

Update: with -O3 optimization there is no compilation error, but the error still occurs with -O0.


Answer 1:


Just to summarize and close the question.

We discussed 3 options to extract a byte at index i in [0..15] from an __m128i sse where i cannot be reduced to a literal at compile time:

1) Switch & _mm_extract_epi8: have a switch over i and a case for each i in [0..15] that does a _mm_extract_epi8(sse,i); this works because within each case i is a compile-time literal.

2) Union hack: have a union SSE128i { __m128i sse; char array[16]; }, initialize it as SSE128i sse = { _mm_loadu_si128(...) } and access the byte at index i with sse.array[i].

3) Shuffle the ith element to position 0 and _mm_extract_epi8: use shuffled = _mm_shuffle_epi8(sse,_mm_set1_epi8(i)) to copy the ith element into every position, including position 0; then extract it from the shuffled result with _mm_extract_epi8(shuffled,0).
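
Options 2 and 3 could be sketched as below. The function names extract_byte_union and extract_byte_shuffle are hypothetical, and the array member is declared unsigned char here (an assumption, so that both helpers return values in 0..255 like _mm_extract_epi8 does). Both helpers assume the caller has already checked that the index is in 0..15; the target attribute is a GCC/Clang extension.

```c
#include <emmintrin.h>  /* SSE2: __m128i */
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 */
#include <smmintrin.h>  /* SSE4.1: _mm_extract_epi8 */

/* Option 2: view the 128-bit register as 16 bytes through a union.
   Type punning through a union is fine in C; in C++ it relies on
   compiler support (GCC documents it as an extension). */
typedef union {
    __m128i sse;
    unsigned char array[16];
} SSE128i;

/* Hypothetical helper; index must be in 0..15. */
static int extract_byte_union(__m128i v, int index)
{
    SSE128i u;
    u.sse = v;
    return u.array[index];
}

/* Option 3: use the runtime index as a shuffle control. Broadcasting the
   index makes every output byte equal to v[index], so position 0 can be
   read with a literal selector. Hypothetical helper; index must be in
   0..15 (pshufb only looks at the low 4 bits and bit 7 of each control byte). */
__attribute__((target("sse4.1")))
static int extract_byte_shuffle(__m128i v, int index)
{
    __m128i shuffled = _mm_shuffle_epi8(v, _mm_set1_epi8((char)index));
    return _mm_extract_epi8(shuffled, 0);
}
```

Note that neither helper rejects the "not found" value 16 produced by _mm_cmpestri, so the range check from the question is still needed before calling them.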

Evaluation: I benchmarked the three options on an Intel Sandy Bridge and an AMD Bulldozer architecture. The switch option won by a small margin. If someone's interested I can post more detailed numbers and the benchmark setup.

Update: Evaluation Benchmark setup: parse each byte of a 1GB file. For certain special bytes, increase a counter. Use _mm_cmpistri to find the index of a special byte; then "extract" the byte using one of the three methods mentioned and do a case distinction in which the counters are incremented. Code was compiled using GCC 4.6 with -std=c++0x -O3 -march=native.

For each method, the benchmark was run 25 times on a Sandy Bridge machine. Results (mean and std. dev. of running time in seconds):

Switch and extract: Mean: 1071.45 Standard deviation: 2.72006

Union hack: Mean: 1078.61 Standard deviation: 2.87131

Shuffle and extract from position 0: Mean: 1079.32 Standard deviation: 2.69808

The differences are marginal. I haven't had a chance to look at the generated asm yet. Might be interesting to see the difference though. For now I can't release the full code of the benchmark as it contains non-public sources. If I have time I'll extract these and post the sources.



Source: https://stackoverflow.com/questions/12913451/mm-extract-epi8-intrinsic-that-takes-a-non-literal-integer-as-argument
