CUDA: Does passing arguments to a kernel slow the kernel launch much?
CUDA beginner here. In my code I currently launch kernels many times in a loop in the host code, because I need synchronization between blocks. So I wondered whether I might be able to optimize the kernel launch. My kernel launches look something like this:

    MyKernel<<<blocks,threadsperblock>>>(double_ptr, double_ptr, int N, double x);

To launch a kernel, some signal obviously has to go from the CPU to the GPU, but I'm wondering whether passing the arguments makes this process noticeably slower. The arguments to the kernel are the same every single time, so perhaps I could save time by