CUDA - Visual Profiler and Control Flow Divergence

Posted by 笑着哭i on 2019-12-13 03:53:33

Question


I'm profiling my CUDA application, and I've come across something that I don't understand about the "Control Flow Divergence" metric that is present in the Visual Profiler.

According to the User Guide:

Control flow divergence gives the percentage of thread instructions that were not executed by all threads in the warp, hence causing divergence.

I've got the following code in my CUDA kernel:

int var;
var = tex2D(texture, x, y); // texture fetch
if(var < 0) {
    var *= -1;
    results[(blockIdx.x*blockDim.x) + threadIdx.x] = var; // global memory array
}

Here's what happens: not a single thread enters the branch (I checked the values in global memory), but the profiler reports a control flow divergence of 34%. If I insert a printf inside that same branch, the value jumps to 43% (and, oddly, the execution time increases as well), even though nothing is printed to stdout. Does this mean that the metric takes into account all of the kernel's instructions, even the ones not executed by any thread (in which case there is effectively no warp divergence)?

In both cases the Divergent Branches metric is 0%.


Answer 1:


What version of CUDA are you using? It sounds like an old one, so it may be worth updating to a more recent release (e.g. 4.2 or 5.0; the latter is currently a release candidate).

If you're able to update to CUDA 5.0, then by analysing the specific kernel in the Visual Profiler you can have it highlight the specific lines in your kernel that are suffering from divergence (the same goes for non-coalesced memory accesses). For this to work, you'll need to compile your code with either debug information (-G) or, if you want to profile release code, line information (-lineinfo).
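
To try the line-level analysis on something small first, a test case along these lines might help. It is only a sketch, not the asker's actual kernel: `storeNegated`, `runDivergenceDemo` and the `mixSigns` switch are hypothetical names, and a plain input array stands in for the texture fetch.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel modelled on the snippet in the question: only threads
// that read a negative value enter the branch and write a result.
__global__ void storeNegated(const int *in, int *results, int n)
{
    int idx = (blockIdx.x * blockDim.x) + threadIdx.x;
    if (idx >= n) return;            // guard for a partially filled last block

    int var = in[idx];
    if (var < 0) {                   // data-dependent branch
        var *= -1;
        results[idx] = var;          // global memory write, as in the question
    }
}

// Hypothetical host helper: launches the kernel and returns how many elements
// were written by the branch. CUDA error checking is omitted for brevity.
int runDivergenceDemo(int n, bool mixSigns)
{
    std::vector<int> hostIn(n), hostOut(n);
    for (int i = 0; i < n; ++i) {
        // mixSigns: even indices are negative, so every warp holds both signs.
        // Otherwise all inputs are positive and no thread takes the branch.
        hostIn[i] = (mixSigns && i % 2 == 0) ? -(i + 1) : (i + 1);
    }

    size_t bytes = n * sizeof(int);
    int *devIn = nullptr, *devOut = nullptr;
    cudaMalloc(&devIn, bytes);
    cudaMalloc(&devOut, bytes);
    cudaMemcpy(devIn, hostIn.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(devOut, 0, bytes);    // zero marks "branch not taken"

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    storeNegated<<<blocks, threads>>>(devIn, devOut, n);
    cudaMemcpy(hostOut.data(), devOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(devIn);
    cudaFree(devOut);

    int written = 0;
    for (int v : hostOut) written += (v != 0);
    return written;
}
```

Called from a `main`, `runDivergenceDemo(1024, true)` gives the profiler a branch that splits every warp, while `runDivergenceDemo(1024, false)` mirrors the situation in the question, where no thread enters the branch. Building with `-lineinfo` (or `-G`) as described above should let the profiler attribute its results to the `if (var < 0)` line. Bear in mind that the compiler may turn such a short branch into predicated instructions rather than a real branch, so the numbers you see can differ between the two metrics.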



Source: https://stackoverflow.com/questions/12370325/cuda-visual-profiler-and-control-flow-divergence
