Is branch divergence really so bad?

南旧 2020-12-04 18:07

I've seen many questions scattered across the Internet about branch divergence, and how to avoid it. However, even after reading dozens of articles on how CUDA wor

1 Answer
  • 2020-12-04 18:53

    You're assuming (at least it's the example you give and the only reference you make) that the only way to avoid branch divergence is to allow all threads to execute all the code.

    In that case I agree there's not much difference.

    But avoiding branch divergence probably has more to do with algorithm re-structuring at a higher level than just the addition or removal of some if statements and making code "safe" to execute in all threads.

    I'll offer up one example. Suppose I know that odd threads will need to handle the blue component of a pixel and even threads will need to handle the green component:

    #define N 2 // number of pixel components
    #define BLUE 0
    #define GREEN 1
    // pixel order: px0BL px0GR px1BL px1GR ...
    
    
    if (threadIdx.x & 1)  foo(pixel(N*threadIdx.x+BLUE));
    else                  bar(pixel(N*threadIdx.x+GREEN));
    

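    This means that within a warp, alternate threads take different paths, foo or bar. The hardware has to run the two paths one after the other, so my warp now takes roughly twice as long to execute (assuming foo and bar cost about the same).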

    However, suppose I rearrange my pixel data so that each color component is contiguous, perhaps in chunks of 32 pixels (the warp size): BL0 BL1 BL2 ... GR0 GR1 GR2 ...
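
    As a rough illustration of that rearrangement, here is a minimal host-side sketch. The helper name toWarpChunks, the float element type, and the requirement that the pixel count be a multiple of 32 are all assumptions made for the example; they are not from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kWarpSize = 32;   // threads per warp on current NVIDIA GPUs
constexpr std::size_t kComponents = 2;  // blue and green, as in the example

// Hypothetical helper: converts BL0 GR0 BL1 GR1 ... into
// BL0..BL31 GR0..GR31 BL32..BL63 GR32..GR63 ...
// Assumes the pixel count is a multiple of the warp size.
std::vector<float> toWarpChunks(const std::vector<float>& interleaved) {
    assert(interleaved.size() % (kComponents * kWarpSize) == 0);
    std::vector<float> chunked(interleaved.size());
    const std::size_t pixelCount = interleaved.size() / kComponents;
    for (std::size_t p = 0; p < pixelCount; ++p) {
        // Start of the 64-element block that holds this pixel's chunk.
        const std::size_t chunkBase = (p / kWarpSize) * kComponents * kWarpSize;
        const std::size_t lane = p % kWarpSize;  // position inside the chunk
        for (std::size_t c = 0; c < kComponents; ++c) {
            chunked[chunkBase + c * kWarpSize + lane] = interleaved[p * kComponents + c];
        }
    }
    return chunked;
}
```

    After this reordering, elements 0 to 31 of each 64-element block are blue and elements 32 to 63 are green, so a thread's index alone tells it which component it owns.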

    With that layout, I can write a similar branch:

    if (threadIdx.x & 32)  bar(pixel(threadIdx.x));  // threads 32-63: green chunk
    else                   foo(pixel(threadIdx.x));  // threads 0-31: blue chunk
    

    It still looks like I have the possibility of divergence. But since the branch condition changes only on warp boundaries, a given warp executes either the if path or the else path, so no actual divergence occurs.
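
    To check this reasoning without a GPU, here is a small host-side sketch that models a block as consecutive warps of 32 threads and counts the warps whose threads disagree on the branch. The names countDivergentWarps, oddThread and upperChunk are made up for this illustration, and it assumes the block size is a multiple of 32.

```cpp
#include <cassert>

constexpr int kWarpSize = 32;  // threads per warp on current NVIDIA GPUs

// The two branch conditions from the snippets above, as host functions.
bool oddThread(int tid)  { return (tid & 1) != 0; }
bool upperChunk(int tid) { return (tid & 32) != 0; }

// Counts the warps in which some threads would take the `if` path
// while others take the `else` path.
int countDivergentWarps(int blockSize, bool (*takesIfPath)(int)) {
    assert(blockSize > 0 && blockSize % kWarpSize == 0);
    int divergent = 0;
    for (int warpStart = 0; warpStart < blockSize; warpStart += kWarpSize) {
        bool sawIf = false;
        bool sawElse = false;
        for (int tid = warpStart; tid < warpStart + kWarpSize; ++tid) {
            if (takesIfPath(tid)) sawIf = true;
            else                  sawElse = true;
        }
        if (sawIf && sawElse) ++divergent;
    }
    return divergent;
}
```

    For a 128-thread block, countDivergentWarps(128, oddThread) returns 4 (every warp diverges), while countDivergentWarps(128, upperChunk) returns 0.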

    This is a trivial example, and probably stupid, but it illustrates that there may be ways to work around warp divergence that don't involve running all the code of all the divergent paths.
