Why do we need cudaDeviceSynchronize() in kernels with device printf?

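#include <cstdio>

// Each thread prints its own index and the value passed in from the host.
__global__ void helloCUDA(float f)
{
    printf("Hello thread %d, f=%f\n", threadIdx.x, f);
}

int main()
{
    helloCUDA<<<1, 5>>>(1.2345f);   // one block of five threads
    cudaDeviceSynchronize();        // the call this question is about
    return 0;
}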

Why is cudaDeviceSynchronize() needed here, when in many other places it is not required after a kernel call?

A kernel launch is asynchronous. This means it returns control to the CPU thread immediately after handing the work to the GPU, before the kernel has finished executing.
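To see the asynchronous launch for yourself, a rough sketch like the following may help. It is not part of the original question: the spin kernel, the timeLaunch helper, and the cycle count are all made up for illustration, and the exact timings depend on the GPU and driver.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: busy-waits for roughly `cycles` GPU clock ticks.
__global__ void spin(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Milliseconds elapsed on the host clock since t0.
static double msSince(std::chrono::steady_clock::time_point t0)
{
    return std::chrono::duration<double, std::milli>(
        std::chrono::steady_clock::now() - t0).count();
}

// Hypothetical helper: launches the kernel and records how long the launch
// call took and how long it took until the kernel had actually finished.
static cudaError_t timeLaunch(double *launch_ms, double *total_ms)
{
    cudaFree(0);  // create the CUDA context up front so it is not timed

    auto t0 = std::chrono::steady_clock::now();
    spin<<<1, 1>>>(500000000LL);               // arbitrary cycle count
    *launch_ms = msSince(t0);                  // time until the launch returned

    cudaError_t err = cudaDeviceSynchronize(); // block until the kernel is done
    *total_ms = msSince(t0);                   // time until the kernel finished
    return err;
}

int main()
{
    double launch_ms = 0.0, total_ms = 0.0;
    cudaError_t err = timeLaunch(&launch_ms, &total_ms);

    printf("launch returned after %.3f ms\n", launch_ms);
    printf("kernel finished after %.3f ms (%s)\n",
           total_ms, cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

On a typical setup the first number should be much smaller than the second, which is the gap in which the host keeps running while the kernel is still busy.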

So what is the next thing the CPU thread does in your program if the cudaDeviceSynchronize() call is left out? Application exit.

At application exit, its ability to send output to the standard output is terminated by the OS.

Thus the output that the kernel generates later has nowhere to go, and you won't see it.

On the other hand, if you use cudaDeviceSynchronize(), the CPU thread waits until the kernel has finished, so the kernel is guaranteed to complete (and its output will find a waiting standard output queue) before the application is allowed to exit.
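This also suggests why many programs get away without an explicit cudaDeviceSynchronize() after a kernel call: they usually follow the kernel with a blocking operation such as cudaMemcpy(), which itself waits for the kernel to finish. The sketch below illustrates that case. The helloAndStore kernel and the runHello helper are hypothetical names, not from the question, and it assumes the default stream and an ordinary pageable host buffer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread prints and also stores its index.
__global__ void helloAndStore(int *out)
{
    printf("Hello thread %d\n", threadIdx.x);
    out[threadIdx.x] = threadIdx.x;
}

// Hypothetical helper: runs the kernel with n threads and copies the
// result into `host`. No explicit cudaDeviceSynchronize() is used.
static cudaError_t runHello(int *host, int n)
{
    int *dev = nullptr;
    cudaError_t err = cudaMalloc(&dev, n * sizeof(int));
    if (err != cudaSuccess) return err;

    helloAndStore<<<1, n>>>(dev);   // asynchronous: returns right away

    // Blocking device-to-host copy: it does not return until the kernel
    // queued before it has finished, so it acts as the synchronization point.
    err = cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dev);
    return err;
}

int main()
{
    const int n = 5;
    int host[n] = {0};

    cudaError_t err = runHello(host, n);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("host[%d] = %d\n", n - 1, host[n - 1]);
    return 0;
}
```

In the question's program there is no such copy after the kernel, so the explicit cudaDeviceSynchronize() is the only thing keeping main() from returning early.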

Source: https://stackoverflow.com/questions/19193468/why-do-we-need-cudadevicesynchronize-in-kernels-with-device-printf
