I am trying to modify the imageDenosing class in CUDA SDK, I need to repeat the filter many time incase to capture the time. But my code doesn\'t work properly.
//st
Your kernel is running asynchronously - you need to wait for it to complete, e.g.
cudaMalloc((void **)&dst2, size);
cudaMemcpy(dst2, dst, imageW * imageH * sizeof(TColor), cudaMemcpyHostToDevice);
F1D<<>>(dst, imageW, imageH, dst2);
cudaThreadSynchronize(); // *** wait for kernel to complete ***
cudaFree(dst2);