nvidia

How to debug OpenCL on Nvidia GPUs?

做~自己de王妃 submitted on 2019-12-03 12:39:48
Is there any way to debug OpenCL kernels on an Nvidia GPU, i.e. set breakpoints and inspect variables? My understanding is that Nvidia's tooling does not allow OpenCL debugging, and AMD's and Intel's tools only allow it on their own devices. gDEBugger might help you somewhat (I have never used it myself), but other than that there isn't any tool I know of that can set breakpoints or inspect variables inside a kernel. Perhaps try to save intermediate outputs from your kernel if it is a long one. Sorry I can't give you a magic solution; debugging OpenCL is just hard. Source: https://stackoverflow.com
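One way to act on the "save intermediate outputs" suggestion is sketched below: give the kernel an extra __global debug buffer, write whatever value you want to inspect into it, and read it back on the host with clEnqueueReadBuffer instead of setting a breakpoint. This is a minimal, illustrative C++ host program with an embedded OpenCL C kernel, not code from the original question; the kernel name (scale), buffer names and sizes are made up, and error checking and resource releases are omitted for brevity.

// Minimal sketch: dump intermediate kernel values into an extra __global
// buffer and read it back on the host. Names and sizes are illustrative;
// error checking and clRelease* calls are omitted for brevity.
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <cstdio>
#include <vector>

static const char *kSrc =
    "__kernel void scale(__global const float *in,           \n"
    "                    __global float *out,                \n"
    "                    __global float *debug_buf) {        \n"
    "    size_t i = get_global_id(0);                        \n"
    "    float intermediate = in[i] * 2.0f;   /* step 1 */   \n"
    "    debug_buf[i] = intermediate;         /* snapshot */ \n"
    "    out[i] = intermediate + 1.0f;        /* step 2 */   \n"
    "}                                                       \n";

int main() {
    const size_t n = 16;
    std::vector<float> in(n, 3.0f), dbg(n);

    cl_platform_id plat; cl_device_id dev; cl_int err;
    clGetPlatformIDs(1, &plat, nullptr);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, nullptr);
    cl_context ctx = clCreateContext(nullptr, 1, &dev, nullptr, nullptr, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, &err);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, nullptr, &err);
    clBuildProgram(prog, 1, &dev, nullptr, nullptr, nullptr);
    cl_kernel k = clCreateKernel(prog, "scale", &err);

    cl_mem dIn  = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                 n * sizeof(float), in.data(), &err);
    cl_mem dOut = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, n * sizeof(float), nullptr, &err);
    cl_mem dDbg = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, n * sizeof(float), nullptr, &err);

    clSetKernelArg(k, 0, sizeof(cl_mem), &dIn);
    clSetKernelArg(k, 1, sizeof(cl_mem), &dOut);
    clSetKernelArg(k, 2, sizeof(cl_mem), &dDbg);

    size_t global = n;
    clEnqueueNDRangeKernel(q, k, 1, nullptr, &global, nullptr, 0, nullptr, nullptr);
    // Blocking read of the snapshot buffer: inspect values on the host
    // instead of stepping through the kernel.
    clEnqueueReadBuffer(q, dDbg, CL_TRUE, 0, n * sizeof(float), dbg.data(), 0, nullptr, nullptr);

    printf("debug_buf[0] = %f\n", dbg[0]);
    return 0;
}

For longer kernels the same buffer can hold one snapshot per work-item per step, at the cost of extra global memory traffic; it is crude, but it works on every OpenCL implementation.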

CUDA Runtime API error 38: no CUDA-capable device is detected

岁酱吖の submitted on 2019-12-03 09:51:18
Question: The situation: I have a two-GPU server (Ubuntu 12.04) where I replaced a Tesla C1060 with a GTX 670. Then I installed CUDA 5.0 over the 4.2 install. Afterwards I compiled all examples except for simpleMPI without error. But when I run ./deviceQuery I get the following error message: foo@bar-serv2:~/NVIDIA_CUDA-5.0_Samples/bin/linux/release$ ./deviceQuery ./deviceQuery Starting... CUDA Device Query (Runtime API) version (CUDART static linking) cudaGetDeviceCount returned 38 -> no CUDA-capable device is
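A stripped-down sketch of the first check deviceQuery performs can help separate driver problems from toolkit problems. This is illustrative code, not the SDK sample; in the CUDA 5.x runtime, error 38 corresponds to cudaErrorNoDevice, which usually points at the driver or device nodes rather than the compiler.

// Minimal sketch: reproduce the first call deviceQuery makes.
// Build with something like: nvcc count_devices.cu -o count_devices
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        // In the CUDA 5.x runtime, error 38 is cudaErrorNoDevice: the runtime
        // found no usable GPU, typically a driver / device-node issue.
        printf("cudaGetDeviceCount returned %d -> %s\n",
               (int)err, cudaGetErrorString(err));
        return 1;
    }
    printf("Detected %d CUDA-capable device(s)\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("  Device %d: %s (compute %d.%d)\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}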

OpenGL 3: glBindVertexArray invalidates GL_ELEMENT_ARRAY_BUFFER

試著忘記壹切 submitted on 2019-12-03 08:14:40
I was certain that if you bind a buffer via glBindBuffer(), you can safely assume that it stays bound until the target is rebound through another call to glBindBuffer(). I was therefore quite surprised to discover that calling glBindVertexArray() sets the buffer bound to the GL_ELEMENT_ARRAY_BUFFER target to 0. Here's the minimal C++ sample code: GLuint buff; glGenBuffers(1, &buff); std::cout << "Buffer is " << buff << "\n"; glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, buff); GLuint vao; glGenVertexArrays(1, &vao); GLint bound_buff; glGetIntegerv(GL_ELEMENT_ARRAY_BUFFER_BINDING, &bound_buff); std:
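For context: the GL_ELEMENT_ARRAY_BUFFER binding is part of vertex array object state, which is why glBindVertexArray appears to clobber it, and why the usual advice is to bind the VAO first and then the index buffer so the association is recorded in the VAO. Below is a minimal sketch of that behaviour, not code from the original post; it assumes GLFW and GLEW are available for context creation.

// Sketch (assumes GLFW + GLEW): the element array buffer binding lives
// inside the currently bound VAO, so switching VAOs switches the binding.
#include <GL/glew.h>
#include <GLFW/glfw3.h>
#include <iostream>

int main() {
    if (!glfwInit()) return 1;
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    GLFWwindow *win = glfwCreateWindow(64, 64, "vao-test", nullptr, nullptr);
    if (!win) { glfwTerminate(); return 1; }
    glfwMakeContextCurrent(win);
    glewExperimental = GL_TRUE;
    if (glewInit() != GLEW_OK) return 1;

    GLuint vaoA, vaoB, ebo;
    glGenVertexArrays(1, &vaoA);
    glGenVertexArrays(1, &vaoB);
    glGenBuffers(1, &ebo);

    GLint bound = 0;

    glBindVertexArray(vaoA);                      // bind the VAO first...
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);   // ...so the binding is stored in vaoA
    glGetIntegerv(GL_ELEMENT_ARRAY_BUFFER_BINDING, &bound);
    std::cout << "vaoA bound, element buffer = " << bound << "\n";   // ebo

    glBindVertexArray(vaoB);                      // a fresh VAO starts with no index buffer
    glGetIntegerv(GL_ELEMENT_ARRAY_BUFFER_BINDING, &bound);
    std::cout << "vaoB bound, element buffer = " << bound << "\n";   // 0

    glBindVertexArray(vaoA);                      // rebinding vaoA restores its binding
    glGetIntegerv(GL_ELEMENT_ARRAY_BUFFER_BINDING, &bound);
    std::cout << "vaoA rebound, element buffer = " << bound << "\n"; // ebo again

    glfwDestroyWindow(win);
    glfwTerminate();
    return 0;
}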

Is it possible to run Java3D applications on Nvidia 3D Vision hardware?

不打扰是莪最后的温柔 submitted on 2019-12-03 06:28:43
Is it possible to run a Java3D application on Nvidia 3D Vision hardware? I've got an existing Java3D application that can run in stereoscopic 3D. In the past, I've always run the application on Quadro cards using the OpenGL renderer and quad-buffered stereo. I now have access to a laptop with the Nvidia 3D Vision system (with a GeForce GTX 460M). From the documentation, it seems like it should be possible to run my application in stereo if I use the DirectX bindings and let the Nvidia drivers take care of the stereo; however, this does not seem to be the case. If I run a Java3D application

Tensorflow not running on GPU

南楼画角 submitted on 2019-12-03 05:40:28
Question: I have already spent a considerable amount of time digging around on Stack Overflow and elsewhere looking for the answer, but couldn't find anything. Hi all, I am running Tensorflow with Keras on top. I am 90% sure I installed Tensorflow GPU; is there any way to check which install I did? I was trying to run some CNN models from a Jupyter notebook and I noticed that Keras was running the model on the CPU (checked Task Manager; CPU was at 100%). I tried running this code from the tensorflow website: #

Median selection in CUDA kernel

女生的网名这么多〃 submitted on 2019-12-03 05:06:52
I need to compute the median of an array of size p inside a CUDA kernel (in my case, p is small, e.g. p = 10). I am using an O(p^2) algorithm for its simplicity, but at the cost of time performance. Is there a "function" to find the median efficiently that I can call inside a CUDA kernel? I know I could implement a selection algorithm, but I'm looking for a function and/or tested code. Thanks! Domi: Here are a few hints: Use a better selection algorithm: QuickSelect is a faster version of QuickSort for selecting the kth element in an array. For compile-time-constant mask sizes, sorting networks
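Not the ready-made library "function" the question asks for, but a minimal sketch of the brute-force route for a tiny, compile-time-constant p: each thread copies its p values into a per-thread array, insertion-sorts them (cheap for p ≈ 10, and simpler than the sorting networks hinted at above), and takes the middle element. The data layout (one contiguous row of p values per thread) and all names are illustrative.

// Sketch: per-thread median of P values via insertion sort in a local array.
// Insertion sort is used here instead of QuickSelect or a sorting network;
// for P ~ 10 the O(P^2) cost is negligible.
#include <cstdio>
#include <cuda_runtime.h>

constexpr int P = 10;   // small, compile-time-constant window size

__device__ float median_p(const float *v) {
    float tmp[P];
    for (int i = 0; i < P; ++i) tmp[i] = v[i];
    for (int i = 1; i < P; ++i) {                 // insertion sort
        float key = tmp[i];
        int j = i - 1;
        while (j >= 0 && tmp[j] > key) { tmp[j + 1] = tmp[j]; --j; }
        tmp[j + 1] = key;
    }
    return tmp[P / 2];   // for even P this is the upper middle; adjust to taste
}

__global__ void medians(const float *in, float *out, int n) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < n) out[t] = median_p(in + t * P);     // one P-element row per thread
}

int main() {
    const int n = 256;   // number of P-element rows
    float *h_in = new float[n * P], *h_out = new float[n];
    for (int i = 0; i < n * P; ++i) h_in[i] = (float)((i * 37) % 101);

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * P * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * P * sizeof(float), cudaMemcpyHostToDevice);

    medians<<<(n + 127) / 128, 128>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("median of row 0 = %f\n", h_out[0]);
    cudaFree(d_in); cudaFree(d_out);
    delete[] h_in; delete[] h_out;
    return 0;
}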

Tensorflow: GPU Utilization is almost always at 0%

百般思念 submitted on 2019-12-03 03:35:34
I'm using tensorflow with Titan-X GPUs and I've noticed that, when I run the CIFAR10 example, the Volatile GPU-utilization is pretty constant at around 30%, whereas when I train my own model, the Volatile GPU-utilization is far from steady: it is almost always 0% and spikes to 80-90% before going back to 0%, over and over again. I thought that this behavior was due to the way I was feeding the data to the network (I was fetching the data after each step, which took some time). But after implementing a queue to feed the data and avoid this latency between steps, the problem persisted (see below

Maximum blocks per grid: CUDA

怎甘沉沦 submitted on 2019-12-03 02:26:27
What is the maximum number of blocks in a grid that can be created per kernel launch? I am slightly confused here, since the compute capability table says that there can be 65535 blocks per grid dimension in CUDA compute capability 2.0. Does that mean the total number of blocks = 65535*65535? Or does it mean at most 65535 blocks in total, arranged either as a 1D grid of 65535 blocks or as a 2D grid of sqrt(65535) * sqrt(65535)? Thank you. 65535 per dimension of the grid. On compute 1.x cards, 1D and 2D grids are supported. On compute 2.x cards, 3D grids are also supported, so 65535, 65535 x 65535,
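Rather than memorising the table, the per-dimension limits can be read at runtime from cudaDeviceProp; a minimal sketch (output formatting is illustrative):

// Sketch: query the per-dimension grid limits instead of hard-coding 65535.
// On compute capability 2.0 each dimension is capped at 65535; on 3.0 and
// later the x-dimension is much larger (2^31 - 1), so querying is more portable.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d (%s, compute %d.%d): maxGridSize = %d x %d x %d, "
               "maxThreadsPerBlock = %d\n",
               d, prop.name, prop.major, prop.minor,
               prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2],
               prop.maxThreadsPerBlock);
    }
    return 0;
}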

Why aren't there bank conflicts in global memory for Cuda/OpenCL?

落爺英雄遲暮 submitted on 2019-12-03 02:04:41
Question: One thing I haven't figured out, and Google isn't helping me with, is why it is possible to have bank conflicts with shared memory but not with global memory. Can there be bank conflicts with registers? UPDATE: Wow, I really appreciate the two answers from Tibbit and Grizzly. It seems that I can only give a green check mark to one answer though. I am newish to Stack Overflow. I guess I have to pick one answer as the best. Can I do something to say thank you to the answer I don't give a green check to?
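As a side illustration of what the term means on the shared-memory side (not taken from the answers by Tibbit or Grizzly): the sketch below shows the classic conflicting access pattern and the one-element padding that removes it. Global memory has no banks in this sense; its performance story is told in terms of coalescing instead.

// Illustration: a shared-memory bank conflict and the standard padding fix.
// With 32 banks of 4-byte words, reading a column of a [32][32] float tile
// makes every thread of a warp hit the same bank; padding rows to 33 floats
// spreads the column across all banks.
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 32

__global__ void row_sums(const float *in, float *out) {
    __shared__ float tile_c[TILE][TILE];       // conflicting layout
    __shared__ float tile_p[TILE][TILE + 1];   // +1 padding: conflict-free

    int tx = threadIdx.x;                      // launched as one warp (32 threads)

    // Fill the tiles: at each step i the warp writes tile[i][0..31],
    // 32 consecutive words -> 32 different banks, no conflict.
    for (int i = 0; i < TILE; ++i) {
        float v = in[i * TILE + tx];
        tile_c[i][tx] = v;
        tile_p[i][tx] = v;
    }
    __syncthreads();

    // Each thread sums one row. At step i the warp reads tile[0][i]..tile[31][i],
    // a column of the array: same bank in tile_c (32-way conflict),
    // 32 different banks in the padded tile_p.
    float s_conflict = 0.0f, s_padded = 0.0f;
    for (int i = 0; i < TILE; ++i) {
        s_conflict += tile_c[tx][i];
        s_padded   += tile_p[tx][i];
    }
    out[tx] = s_conflict - s_padded;   // always 0: only the timing differs
}

int main() {
    float h_in[TILE * TILE], h_out[TILE];
    for (int i = 0; i < TILE * TILE; ++i) h_in[i] = (float)i;

    float *d_in, *d_out;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    row_sums<<<1, TILE>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("difference for thread 0: %f (results identical, only timing differs)\n", h_out[0]);

    cudaFree(d_in); cudaFree(d_out);
    return 0;
}

The results of the two loops are identical; the difference only shows up in a profiler such as Nsight Compute, which reports the shared-memory bank conflicts for the unpadded reads.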

NVIDIA vs AMD: GPGPU performance

半世苍凉 submitted on 2019-12-03 01:47:09
Question: I'd like to hear from people with experience of coding for both. Myself, I only have experience with NVIDIA. NVIDIA CUDA seems to be a lot more popular than the competition. (Just counting question tags on this forum, 'cuda' outperforms 'opencl' 3:1, and 'nvidia' outperforms 'ati' 15:1, and there's no tag for 'ati-stream' at all.) On the other hand, according to Wikipedia, ATI/AMD cards should have a lot more potential, especially per dollar. The fastest NVIDIA card on the market as of today,