Shuffle instruction in CUDA not working

Submitted by 心已入冬 on 2019-12-10 15:57:33

Question


I have a problem with the shuffle instruction in CUDA 5.0.

This is a snippet of my kernel; it sits inside a loop. The printf calls are there only for debugging, because I can't use an ordinary debugger:

...
tex_val = tex2D(srcTexRef, threadIdx.x + w, y_pos);
if (threadIdx.x == 0)
{
    left = left_value[y_pos];
}
else
{
    printf("thread %d; shfl value: %f \n", threadIdx.x, __shfl_up(value, 1));
    left = __shfl_up(value, 1);
}

printf("thread %d; value: %f; tex_val: %f; left: %f \n", threadIdx.x, value, tex_val, left);
...

From that I get this output:

l0:  ITERATION 1
l1:  thread 0; value: 0; tex_val: 1; left: 4
l2: 
l3:  ITERATION 2
l4:  thread 1; shfl value: 0
l5:  thread 0; value: 5; tex_val: 1; left: 5
l6:  thread 1; value: 0; tex_val: 1; left: 0
l7: 
l8:  ITERATION 3
l9:  thread 1; shfl value: 0
l10: thread 2; shfl value: 1
l11: thread 0; value: 6; tex_val: 1; left: 6
l12: thread 1; value: 1; tex_val: 1; left: 0
l13: thread 2; value: 2; tex_val: 1; left: 1
...

From the output I can see that thread 1 doesn't get the value from thread 0 in any iteration, even though thread 0 clearly has one (line 4: shfl value is 0; line 5: value is 5). Threads 2 and higher do get the value from the thread below them. Where am I making a mistake? Is it happening because of the branching?


Answer 1:


Yes, it's because of the branching. Quoting from the CUDA programming guide B.14.2:

The __shfl() intrinsics permit exchanging of a variable between threads within a warp without use of shared memory. The exchange occurs simultaneously for all active threads within the warp, ...

and

Threads may only read data from another thread which is actively participating in the __shfl() command. If the target thread is inactive, the retrieved value is undefined.

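In a branch, active threads are those taking the same path of execution, while those taking a different path are inactive. In your case, thread 0 takes the if path, so it is inactive while the other threads execute __shfl_up() in the else path, and you cannot shuffle from it. That is why thread 1, whose source is thread 0, reads an undefined value, while threads 2 and higher read from threads that are in the same branch.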
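One way to avoid the problem is to run the shuffle before the branch, so that every lane of the warp takes part in it, and only then choose between the shuffled value and the boundary value. The sketch below is a minimal illustration under a few assumptions, not the asker's kernel: the names shift_from_left and run_shift_demo are hypothetical, the boundary argument stands in for left_value[y_pos], and the block is exactly one full warp of 32 threads. It uses __shfl_up_sync, which exists from CUDA 9 onward; with the CUDA 5.0 toolkit from the question the equivalent call would be __shfl_up(value, 1).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each lane takes the value held by the lane to its left. Lane 0 has no left
// neighbour inside the warp, so it uses a caller-supplied boundary value
// (an assumption standing in for left_value[y_pos] in the question).
__global__ void shift_from_left(const float* in, float* out, float boundary)
{
    float value = in[threadIdx.x];

    // All 32 lanes execute the shuffle, so every source lane is active.
    // Pre-CUDA 9 equivalent: __shfl_up(value, 1).
    float from_left = __shfl_up_sync(0xffffffffu, value, 1);

    // Branch only after the exchange. Lane 0 receives its own value back
    // from the shuffle, which is discarded here.
    out[threadIdx.x] = (threadIdx.x == 0) ? boundary : from_left;
}

// Hypothetical host helper: launches one warp on the inputs 0, 1, ..., 31 and
// copies the 32 results into out_host. Returns the CUDA status code.
cudaError_t run_shift_demo(float* out_host, float boundary)
{
    const int n = 32;  // one full warp, matching the 0xffffffff mask
    float in_host[n];
    for (int i = 0; i < n; ++i) in_host[i] = static_cast<float>(i);

    float* in_dev = nullptr;
    float* out_dev = nullptr;
    cudaError_t err = cudaMalloc(&in_dev, n * sizeof(float));
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&out_dev, n * sizeof(float));
    if (err != cudaSuccess) { cudaFree(in_dev); return err; }

    err = cudaMemcpy(in_dev, in_host, n * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        shift_from_left<<<1, n>>>(in_dev, out_dev, boundary);
        err = cudaGetLastError();  // catches launch-time errors
    }
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(out_host, out_dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    }

    cudaFree(in_dev);
    cudaFree(out_dev);
    return err;
}
```

Calling run_shift_demo(out, -1.0f) from a host main() with a 32-element float array should leave the boundary value in out[0] and i - 1 in every other out[i]. The design choice is simply to keep the shuffle unconditional and make only the selection conditional, which costs lane 0 one unused shuffle result.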



Source: https://stackoverflow.com/questions/30095899/shuffle-instruction-in-cuda-not-working
