I am trying to modify the imageDenoising sample in the CUDA SDK. I need to repeat the filter many times in order to measure its running time, but my code doesn't work properly.
The statement
image[imageW * iy + ix] = buffer[imageW * iy + ix];
is causing the problem. You are overwriting your input image inside the kernel. Depending on the order in which threads execute, some threads will read pixels that other threads have already filtered, so parts of the image get blurred further.
Also, I don't see the purpose of
cudaMemcpy(dst2, dst, imageW*imageH*sizeof(TColor),cudaMemcpyHostToDevice);
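dst looks to be device memory, since you access it in the CUDA kernel, so a host-to-device copy from it does not look right. A copy between two device buffers would use cudaMemcpyDeviceToDevice instead.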
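Here is a minimal sketch of one way to repeat a filter without the kernel ever writing to its own input. It is not the SDK code: blurKernel is a hypothetical 3x3 box blur standing in for the real filter, the image is assumed to be plain float pixels rather than TColor, and the sizes and iteration count are made up. The kernel reads only from src and writes only to dst, the host swaps the two device pointers between passes (so no cudaMemcpy is needed), and CUDA events time the loop.

```cuda
#include <cstdio>
#include <utility>
#include <cuda_runtime.h>

// Hypothetical stand-in for the SDK filter: a 3x3 box blur on float pixels.
// It reads only from src and writes only to dst, so no thread can pick up
// a value that another thread has already filtered in the same pass.
__global__ void blurKernel(float *dst, const float *src, int imageW, int imageH)
{
    const int ix = blockDim.x * blockIdx.x + threadIdx.x;
    const int iy = blockDim.y * blockIdx.y + threadIdx.y;
    if (ix >= imageW || iy >= imageH) return;

    float sum = 0.0f;
    int count = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            const int x = ix + dx, y = iy + dy;
            if (x >= 0 && x < imageW && y >= 0 && y < imageH) {
                sum += src[imageW * y + x];
                ++count;
            }
        }
    }
    dst[imageW * iy + ix] = sum / count;
}

int main()
{
    // Assumed sizes, for illustration only.
    const int imageW = 512, imageH = 512, iterations = 100;
    const size_t bytes = size_t(imageW) * imageH * sizeof(float);

    float *d_src = nullptr, *d_dst = nullptr;
    cudaMalloc(&d_src, bytes);
    cudaMalloc(&d_dst, bytes);
    cudaMemset(d_src, 0, bytes);  // placeholder input; a real image would be uploaded here

    const dim3 block(16, 16);
    const dim3 grid((imageW + block.x - 1) / block.x,
                    (imageH + block.y - 1) / block.y);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < iterations; ++i) {
        blurKernel<<<grid, block>>>(d_dst, d_src, imageW, imageH);
        std::swap(d_src, d_dst);  // this pass's output becomes the next pass's input
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);   // wait until all passes have finished

    const cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("%d passes took %.3f ms (%.3f ms per pass)\n", iterations, ms, ms / iterations);

    // After the final swap, d_src points at the most recent result.
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_src);
    cudaFree(d_dst);
    return 0;
}
```

If the goal is only to time the filter and not to apply it repeatedly, dropping the std::swap line should be enough: every pass then reads the same untouched input and writes the same output, so the result does not change between iterations.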