I've started writing a new CUDA application, but I hit a funny detour along the way. The first cudaMalloc call, on a variable x, fails the first time. However, when I call
The very first call to any of the CUDA library functions triggers an initialisation routine. It may be that this initialisation is what fails, not the cudaMalloc itself (CUDA Programming Guide, section 3.2.1).
Later calls then seem to work, despite the initial failure. I don't know your setup or your code, so I can't really help you further. Check the Programming Guide!
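One way to tell an initialisation failure from a cudaMalloc failure is to trigger initialisation yourself before allocating, and check every return code. The following is only a sketch under assumptions: force_runtime_init is a hypothetical helper name, the cudaFree(0) call is a commonly used idiom for forcing initialisation rather than something taken from your code, and the buffer size is arbitrary.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): cudaFree(0) frees nothing, but it is
// commonly used to make the runtime initialise before any real work is done.
static cudaError_t force_runtime_init() {
    return cudaFree(0);
}

int main() {
    // Step 1: initialise explicitly, so a failure here is clearly not cudaMalloc's.
    cudaError_t err = force_runtime_init();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "initialisation failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Step 2: the actual allocation (size chosen arbitrarily for illustration).
    float *x = nullptr;
    err = cudaMalloc(reinterpret_cast<void **>(&x), 1024 * sizeof(float));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaFree(x);
    return 0;
}
```

If the first message is the one you see, the problem is most likely in the driver or device setup rather than in the allocation itself.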