How to use if condition in intrinsics

Posted by ╄→гoц情女王★ on 2019-12-18 05:23:15

Question


I want to compare two floating-point variables using intrinsics. If the comparison is true, do one thing; otherwise, do something else. I want this to behave like a normal if..else condition. Is there any way to do this using intrinsics?

// normal code
vector<float> v1, v2;
for (size_t i = 0; i < v1.size(); ++i)
{
    if (v1[i] < v2[i])
    {
        // do something
    }
    else
    {
        // do something else
    }
}

How to do this using SSE2 or AVX?


Answer 1:


SIMD conditional operations are done with branchless techniques. You use a packed-compare instruction to get a vector whose elements are each all-zero bits (false) or all-one bits (true).

For example, you can conditionally add 4 to the elements of an accumulator wherever the corresponding element matches a condition, with code like:

__m128i match_counts = _mm_setzero_si128();

for (...) {
    __m128  fvec = something;
    __m128i  condition = _mm_castps_si128( _mm_cmplt_ps(fvec, _mm_setzero_ps()) );  // for elements less than zero
    __m128i masked_constant = _mm_and_si128(condition, _mm_set1_epi32(4));
    match_counts = _mm_add_epi32(match_counts, masked_constant);
}
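A self-contained version of the loop above might look like the following sketch. The function name `masked_add_total`, the unaligned loads, and the assumption that `n` is a multiple of 4 are illustrative choices, not part of the original answer.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical helper: adds 4 to a per-lane accumulator for every element
// of data[0..n) that is less than zero, then sums the four lanes.
// Assumes n is a multiple of 4 (no tail handling in this sketch).
inline int masked_add_total(const float* data, std::size_t n) {
    __m128i match_counts = _mm_setzero_si128();
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 fvec = _mm_loadu_ps(data + i);
        // Lanes are all-ones where fvec < 0, all-zero elsewhere.
        __m128i condition = _mm_castps_si128(_mm_cmplt_ps(fvec, _mm_setzero_ps()));
        // AND keeps the constant 4 in matching lanes and yields 0 in the others.
        __m128i masked_constant = _mm_and_si128(condition, _mm_set1_epi32(4));
        match_counts = _mm_add_epi32(match_counts, masked_constant);
    }
    // Horizontal sum of the four 32-bit lanes.
    alignas(16) std::int32_t lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), match_counts);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

The return value is 4 times the number of negative elements, since each match contributes the constant 4.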

Obviously this only works well if you can come up with a branchless way to do both sides of the branch. A blend instruction can often help.
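As a sketch of "doing both sides and blending", the function below computes both the if-result and the else-result for every lane and then selects per lane with the compare mask. The workload (add vs. subtract), the name `select_add_or_sub`, and the requirement that `n` be a multiple of 4 are assumptions made for illustration.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>

// Hypothetical example: out[i] = (a[i] < b[i]) ? a[i] + b[i] : a[i] - b[i]
// Assumes n is a multiple of 4 (no tail handling in this sketch).
inline void select_add_or_sub(const float* a, const float* b,
                              float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 mask = _mm_cmplt_ps(va, vb);     // all-ones where a < b
        __m128 if_true  = _mm_add_ps(va, vb);   // "if" side, computed for all lanes
        __m128 if_false = _mm_sub_ps(va, vb);   // "else" side, computed for all lanes
        // SSE2-only blend: (mask & if_true) | (~mask & if_false)
        __m128 result = _mm_or_ps(_mm_and_ps(mask, if_true),
                                  _mm_andnot_ps(mask, if_false));
        _mm_storeu_ps(out + i, result);
    }
}
```

With SSE4.1 available, the AND/ANDNOT/OR sequence can be replaced by `_mm_blendv_ps(if_false, if_true, mask)`; AVX offers `_mm256_blendv_ps` for 8 floats at a time.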

It's likely that you won't get any speedup at all if there's too much work in each side of the branch, especially if your element size is 4 bytes or larger. (SIMD is really powerful when you're doing 16 operations in parallel on 16 separate bytes, and less powerful when doing 4 operations on four 32-bit elements.)




Answer 2:


If you expect that v1[i] < v2[i] is almost never true, almost always true, or usually stays the same for a long run (even if overall there might be no particular bias), then another technique is also applicable, one that offers "true conditionality" (i.e. not "do both, discard one result"). It comes at a price, of course, but you also get to actually skip work instead of just ignoring some results.

The technique is fairly simple: do the comparison (vectorized), gather the comparison mask with _mm_movemask_ps, and then you have 3 cases:

  • All comparisons went the same way and they were all false: execute the appropriate "do something" code, which may now be easier to vectorize since the condition is gone.
  • All comparisons went the same way and they were all true: same as above, with the other "do something" code.
  • Mixed: use more complicated logic. Depending on what you need, you could check all bits separately (falling back to scalar code, but now with just 1 FP compare for the whole lot), or use one of the "iterate only over (un)set bits" tricks (which combines well with bitscan to recover the actual index), or sometimes you can fall back to doing masking and merging as usual.

Not all 3 cases are always relevant. Usually you're applying this because the predicate almost always goes the same way, which makes one of the "all the same" cases so rare that you can just lump it in with "mixed".
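The three cases can be sketched as follows. The per-element work (doubling v1 or adding 1 to v2), the name `branch_on_mask`, and the assumption that `n` is a multiple of 4 are hypothetical stand-ins for the real "do something" code; the mixed case here uses the plain scalar fallback driven by the mask bits.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>

// Hypothetical workload: out[i] = (v1[i] < v2[i]) ? v1[i] * 2 : v2[i] + 1
// Assumes n is a multiple of 4 (no tail handling in this sketch).
inline void branch_on_mask(const float* v1, const float* v2,
                           float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 a = _mm_loadu_ps(v1 + i);
        __m128 b = _mm_loadu_ps(v2 + i);
        // One bit per lane: bit k is set when v1[i + k] < v2[i + k].
        int mask = _mm_movemask_ps(_mm_cmplt_ps(a, b));
        if (mask == 0xF) {
            // All true: run only the "if" side, vectorized (a + a == a * 2).
            _mm_storeu_ps(out + i, _mm_add_ps(a, a));
        } else if (mask == 0) {
            // All false: run only the "else" side, vectorized.
            _mm_storeu_ps(out + i, _mm_add_ps(b, _mm_set1_ps(1.0f)));
        } else {
            // Mixed: scalar fallback that reuses the mask bits
            // instead of repeating the floating-point compare.
            for (int lane = 0; lane < 4; ++lane) {
                bool taken = ((mask >> lane) & 1) != 0;
                out[i + lane] = taken ? v1[i + lane] * 2.0f
                                      : v2[i + lane] + 1.0f;
            }
        }
    }
}
```

The branch on `mask` is a real, predictable branch only when the predicate tends to go the same way for long runs, which is the situation this technique targets.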

This technique is definitely not always useful. The "mixed" case is complicated and slow. The fast path has to be common and fast enough to be worth testing whether you can take it.

But it can be useful. Maybe one of the sides is very slow and annoying, while the other side of the branch is nice, simple vectorizable code that doesn't take all that long in comparison. For example, maybe the slow side has to do argument reduction for an otherwise fast approximated transcendental function, or maybe it has to normalize some vectors before taking their dot product, or orthogonalize a matrix, or maybe even get data from disk.

Or maybe neither side is exactly slow, but they evict each other's data from cache (maybe both sides are a loop over an array that fits in cache, but the arrays don't fit in it together), so doing them unconditionally slows both of them down. This is probably a real thing, but I haven't seen it in the wild (yet).

Or maybe one side cannot be executed unconditionally because it does something generally destructive, maybe even some I/O. For example, you might be checking for error conditions and logging them.




Answer 3:


I found a document that is very useful for conditional SIMD instructions (the if...else condition). It is a perfect solution to my question.

Document: http://saluc.engr.uconn.edu/refs/processors/intel/sse_sse2.pdf



Source: https://stackoverflow.com/questions/38006616/how-to-use-if-condition-in-intrinsics
