Differences in creating a CUDA context
Question: I have a program that uses three kernels. In order to measure the speedups, I was doing a dummy kernel launch to create a context, as follows:

    __global__ void warmStart(int* f)
    {
        *f = 0;
    }

which is launched before the kernels I want to time, as follows:

    int *dFlag = NULL;
    cudaMalloc( (void**)&dFlag, sizeof(int) );
    warmStart<<<1, 1>>>(dFlag);
    Check_CUDA_Error("warmStart kernel");

I also read about other, simpler ways to create a context, such as cudaFree(0) or cudaDeviceSynchronize(). But using these API