Is the warmup code necessary when measuring CUDA kernel running time?

Posted by 你。 on 2019-12-07 18:46:21

Question


On page 85 of Professional CUDA C Programming:

int main()
{
    ......
    // run a warmup kernel to remove overhead
    // seconds() is a host wall-clock helper defined elsewhere (not shown); assumed to return seconds as a double
    double iStart, iElaps;
    cudaDeviceSynchronize();
    iStart = seconds();
    warmingup<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds() - iStart;
    printf("warmup <<< %4u %4u >>> elapsed %f sec \n", grid.x, block.x, iElaps);

    // run kernel 1
    iStart = seconds();
    mathKernel1<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds() - iStart;
    printf("mathKernel1 <<< %4u %4u >>> elapsed %f sec \n", grid.x, block.x, iElaps);

    // run kernel 2
    iStart = seconds();
    mathKernel2<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds() - iStart;
    printf("mathKernel2 <<< %4u %4u >>> elapsed %f sec \n", grid.x, block.x, iElaps);

    // run kernel 3
    iStart = seconds();
    mathKernel3<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds() - iStart;
    printf("mathKernel3 <<< %4u %4u >>> elapsed %f sec \n", grid.x, block.x, iElaps);
    ......
}

As you can see, a warmup kernel is launched before the running times of the different kernels are measured.

From the question "GPU cards warming up?", I understand that the reason is:

If they are non-display cards, it might well be the driver shutting itself down after a period of inactivity. So what you are seeing on the first run might well be initialization overhead that only happens once.

So if my GPU has not been idle for a long time (e.g., I have just used it to run some programs), there should be no need to run any warmup code. Is my understanding right?


Answer 1:


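Besides the GPU being in a power-saving state, there are a number of other reasons why the first launch of a kernel can be slower than subsequent runs:

  • just-in-time compilation (for example, of PTX code when the binary contains no machine code for the GPU's architecture)
  • transfer of the kernel code to GPU memory
  • cache contents (caches start cold on the first run)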
  • ...
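One way to observe these first-launch costs, and to test the question's assumption that a recently used GPU needs no warmup, is to time each of the first few launches separately within a single process. The sketch below is a minimal illustration under stated assumptions: `busyKernel` is a hypothetical stand-in for the book's kernels (whose bodies are not shown), `CHECK` and `timeOneLaunch` are our own helpers, and it uses CUDA events (`cudaEventRecord` / `cudaEventElapsedTime`) instead of the book's host-side `seconds()` timer. Timings depend on the GPU, driver and CUDA version, so no expected output is given.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Our own error-checking helper (not part of the CUDA API).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s (%s:%d)\n", cudaGetErrorString(e_),   \
                    __FILE__, __LINE__);                              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel standing in for mathKernel1..3.
__global__ void busyKernel(float *c, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) c[tid] = (tid % 2 == 0) ? 100.0f : 200.0f;
}

// Times exactly one launch with a pair of CUDA events; returns milliseconds.
float timeOneLaunch(float *d_c, int n, int block)
{
    const int grid = (n + block - 1) / block;
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaEventRecord(start));
    busyKernel<<<grid, block>>>(d_c, n);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));   // wait until the kernel has finished
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms;
}

int main()
{
    const int n = 1 << 20;
    float *d_c = nullptr;
    // The first runtime call (here cudaMalloc) initializes the CUDA context,
    // so context creation is not included in the timings below.
    CHECK(cudaMalloc(&d_c, n * sizeof(float)));
    for (int i = 0; i < 5; ++i)
        printf("launch %d: %.3f ms\n", i, timeOneLaunch(d_c, n, 256));
    CHECK(cudaFree(d_c));
    return 0;
}
```

If launch 0 is noticeably slower than the others even though the GPU was busy moments earlier, power management is not the whole story: some costs, such as loading the kernel's module into the newly created context, may be paid once per process rather than once per idle period.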

For these reasons, if you are interested in the sustained speed that consecutive kernel launches achieve, it is always good practice to perform at least one "warmup run" before the timed kernel run.
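A minimal sketch of measuring that sustained speed: run a few untimed warmup launches, then time many back-to-back launches between a single pair of CUDA events and divide by the launch count. As before, `busyKernel`, `CHECK` and `meanLaunchTime` are hypothetical names introduced for illustration, and the warmup and iteration counts are arbitrary assumptions.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Our own error-checking helper (not part of the CUDA API).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s (%s:%d)\n", cudaGetErrorString(e_),   \
                    __FILE__, __LINE__);                              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel standing in for the code being benchmarked.
__global__ void busyKernel(float *c, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) c[tid] = (tid % 2 == 0) ? 100.0f : 200.0f;
}

// Runs `warmup` untimed launches, then times `iters` (> 0) back-to-back
// launches between one pair of events; returns mean ms per launch.
float meanLaunchTime(float *d_c, int n, int block, int warmup, int iters)
{
    const int grid = (n + block - 1) / block;
    for (int i = 0; i < warmup; ++i)
        busyKernel<<<grid, block>>>(d_c, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());   // warmup work must finish before timing

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaEventRecord(start));
    for (int i = 0; i < iters; ++i)
        busyKernel<<<grid, block>>>(d_c, n);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms / iters;
}

int main()
{
    const int n = 1 << 20;
    float *d_c = nullptr;
    CHECK(cudaMalloc(&d_c, n * sizeof(float)));
    printf("mean per launch: %.4f ms\n", meanLaunchTime(d_c, n, 256, 3, 100));
    CHECK(cudaFree(d_c));
    return 0;
}
```

Timing the whole batch with one event pair amortizes event and launch overhead over many kernels, which matches the "consecutive launches" regime described above; it does not tell you how much individual launches vary.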

If, however, you have a specific application and use case in mind, it always makes sense to benchmark that application under the relevant circumstances. Be prepared, though, for much larger run-to-run variation in such a less controlled measurement.
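Because a single number hides that variation, one option is to time many individual launches and report their spread. The sketch below assumes the same hypothetical `busyKernel` and our own `CHECK` macro; `Summary` and `summarize` are illustrative helpers (not a library API) that sort the samples and return the minimum, median and maximum. The repetition count is an arbitrary choice.

```cuda
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Our own error-checking helper (not part of the CUDA API).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s (%s:%d)\n", cudaGetErrorString(e_),   \
                    __FILE__, __LINE__);                              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel standing in for the application's kernel.
__global__ void busyKernel(float *c, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) c[tid] = (tid % 2 == 0) ? 100.0f : 200.0f;
}

struct Summary { float min, median, max; };

// Sorts a non-empty sample vector in place; returns min, median and max.
Summary summarize(std::vector<float> &s)
{
    std::sort(s.begin(), s.end());
    const size_t m = s.size() / 2;
    const float median = (s.size() % 2) ? s[m] : 0.5f * (s[m - 1] + s[m]);
    return {s.front(), median, s.back()};
}

int main()
{
    const int n = 1 << 20, block = 256, reps = 20;
    const int grid = (n + block - 1) / block;
    float *d_c = nullptr;
    CHECK(cudaMalloc(&d_c, n * sizeof(float)));
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    busyKernel<<<grid, block>>>(d_c, n);   // one warmup launch, not recorded
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    std::vector<float> samples;
    for (int r = 0; r < reps; ++r) {
        CHECK(cudaEventRecord(start));
        busyKernel<<<grid, block>>>(d_c, n);
        CHECK(cudaEventRecord(stop));
        CHECK(cudaEventSynchronize(stop));
        float ms = 0.0f;
        CHECK(cudaEventElapsedTime(&ms, start, stop));
        samples.push_back(ms);
    }
    CHECK(cudaGetLastError());

    Summary s = summarize(samples);
    printf("min %.3f ms  median %.3f ms  max %.3f ms\n", s.min, s.median, s.max);
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d_c));
    return 0;
}
```

The median is less sensitive than the mean to an occasional slow outlier, while the gap between minimum and maximum gives a rough sense of how noisy the measurement environment is.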



Source: https://stackoverflow.com/questions/41098237/is-the-warmup-code-necessary-when-measuring-cuda-kernel-running-time
