nvidia

How to debug OpenCL on Nvidia GPUs?

做~自己de王妃 submitted on 2019-12-03 12:39:48
Is there any way to debug OpenCL kernels on an Nvidia GPU, i.e. set breakpoints and inspect variables? My understanding is that Nvidia's tooling does not allow OpenCL debugging, and AMD's and Intel's tools only allow it on their own devices. gDEBugger might help you somewhat (I have never used it myself), but other than that there isn't any tool I know of that can set breakpoints or inspect variables inside a kernel. Perhaps try to save intermediate outputs from your kernel if it is a long one. Sorry I can't give you a magic solution; debugging OpenCL is just hard. Source: https://stackoverflow.com
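One way to act on the "save intermediate outputs" suggestion is sketched below: give the kernel an extra __global debug buffer, write whatever value you want to inspect into it, and read it back on the host with clEnqueueReadBuffer instead of setting a breakpoint. This is a minimal, illustrative C++ host program with an embedded OpenCL C kernel, not code from the original question; the kernel name (scale), buffer names and sizes are made up, and error checking and resource releases are omitted for brevity.

// Minimal sketch: dump intermediate kernel values into an extra __global
// buffer and read it back on the host. Names and sizes are illustrative;
// error checking and clRelease* calls are omitted for brevity.
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <cstdio>
#include <vector>

static const char *kSrc =
    "__kernel void scale(__global const float *in,           \n"
    "                    __global float *out,                \n"
    "                    __global float *debug_buf) {        \n"
    "    size_t i = get_global_id(0);                        \n"
    "    float intermediate = in[i] * 2.0f;   /* step 1 */   \n"
    "    debug_buf[i] = intermediate;         /* snapshot */ \n"
    "    out[i] = intermediate + 1.0f;        /* step 2 */   \n"
    "}                                                       \n";

int main() {
    const size_t n = 16;
    std::vector<float> in(n, 3.0f), dbg(n);

    cl_platform_id plat; cl_device_id dev; cl_int err;
    clGetPlatformIDs(1, &plat, nullptr);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, nullptr);
    cl_context ctx = clCreateContext(nullptr, 1, &dev, nullptr, nullptr, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, &err);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, nullptr, &err);
    clBuildProgram(prog, 1, &dev, nullptr, nullptr, nullptr);
    cl_kernel k = clCreateKernel(prog, "scale", &err);

    cl_mem dIn  = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                 n * sizeof(float), in.data(), &err);
    cl_mem dOut = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, n * sizeof(float), nullptr, &err);
    cl_mem dDbg = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, n * sizeof(float), nullptr, &err);

    clSetKernelArg(k, 0, sizeof(cl_mem), &dIn);
    clSetKernelArg(k, 1, sizeof(cl_mem), &dOut);
    clSetKernelArg(k, 2, sizeof(cl_mem), &dDbg);

    size_t global = n;
    clEnqueueNDRangeKernel(q, k, 1, nullptr, &global, nullptr, 0, nullptr, nullptr);
    // Blocking read of the snapshot buffer: inspect values on the host
    // instead of stepping through the kernel.
    clEnqueueReadBuffer(q, dDbg, CL_TRUE, 0, n * sizeof(float), dbg.data(), 0, nullptr, nullptr);

    printf("debug_buf[0] = %f\n", dbg[0]);
    return 0;
}

For longer kernels the same buffer can hold one snapshot per work-item per step, at the cost of extra global memory traffic; it is crude, but it works on every OpenCL implementation.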

CUDA Runtime API error 38: no CUDA-capable device is detected

岁酱吖の submitted on 2019-12-03 09:51:18
Question: The situation: I have a two-GPU server (Ubuntu 12.04) where I replaced a Tesla C1060 with a GTX 670. Then I installed CUDA 5.0 over the 4.2 install. Afterwards I compiled all examples except for simpleMPI without error. But when I run ./deviceQuery I get the following error message: foo@bar-serv2:~/NVIDIA_CUDA-5.0_Samples/bin/linux/release$ ./deviceQuery ./deviceQuery Starting... CUDA Device Query (Runtime API) version (CUDART static linking) cudaGetDeviceCount returned 38 -> no CUDA-capable device is
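A stripped-down sketch of the first check deviceQuery performs can help separate driver problems from toolkit problems. This is illustrative code, not the SDK sample; in the CUDA 5.x runtime, error 38 corresponds to cudaErrorNoDevice, which usually points at the driver or device nodes rather than the compiler.

// Minimal sketch: reproduce the first call deviceQuery makes.
// Build with something like: nvcc count_devices.cu -o count_devices
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        // In the CUDA 5.x runtime, error 38 is cudaErrorNoDevice: the runtime
        // found no usable GPU, typically a driver / device-node issue.
        printf("cudaGetDeviceCount returned %d -> %s\n",
               (int)err, cudaGetErrorString(err));
        return 1;
    }
    printf("Detected %d CUDA-capable device(s)\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("  Device %d: %s (compute %d.%d)\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}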

OpenGL 3: glBindVertexArray invalidates GL_ELEMENT_ARRAY_BUFFER

試著忘記壹切 submitted on 2019-12-03 08:14:40
I was certain that if you bind a buffer via glBindBuffer(), you can safely assume that it stays bound until the target is rebound through another call to glBindBuffer(). I was therefore quite surprised to discover that calling glBindVertexArray() sets the buffer bound to the GL_ELEMENT_ARRAY_BUFFER target to 0. Here's the minimal C++ sample code: GLuint buff; glGenBuffers(1, &buff); std::cout << "Buffer is " << buff << "\n"; glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, buff); GLuint vao; glGenVertexArrays(1, &vao); GLint bound_buff; glGetIntegerv(GL_ELEMENT_ARRAY_BUFFER_BINDING, &bound_buff); std:
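For context: the GL_ELEMENT_ARRAY_BUFFER binding is part of vertex array object state, which is why glBindVertexArray appears to clobber it, and why the usual advice is to bind the VAO first and then the index buffer so the association is recorded in the VAO. Below is a minimal sketch of that behaviour, not code from the original post; it assumes GLFW and GLEW are available for context creation.

// Sketch (assumes GLFW + GLEW): the element array buffer binding lives
// inside the currently bound VAO, so switching VAOs switches the binding.
#include <GL/glew.h>
#include <GLFW/glfw3.h>
#include <iostream>

int main() {
    if (!glfwInit()) return 1;
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    GLFWwindow *win = glfwCreateWindow(64, 64, "vao-test", nullptr, nullptr);
    if (!win) { glfwTerminate(); return 1; }
    glfwMakeContextCurrent(win);
    glewExperimental = GL_TRUE;
    if (glewInit() != GLEW_OK) return 1;

    GLuint vaoA, vaoB, ebo;
    glGenVertexArrays(1, &vaoA);
    glGenVertexArrays(1, &vaoB);
    glGenBuffers(1, &ebo);

    GLint bound = 0;

    glBindVertexArray(vaoA);                      // bind the VAO first...
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);   // ...so the binding is stored in vaoA
    glGetIntegerv(GL_ELEMENT_ARRAY_BUFFER_BINDING, &bound);
    std::cout << "vaoA bound, element buffer = " << bound << "\n";   // ebo

    glBindVertexArray(vaoB);                      // a fresh VAO starts with no index buffer
    glGetIntegerv(GL_ELEMENT_ARRAY_BUFFER_BINDING, &bound);
    std::cout << "vaoB bound, element buffer = " << bound << "\n";   // 0

    glBindVertexArray(vaoA);                      // rebinding vaoA restores its binding
    glGetIntegerv(GL_ELEMENT_ARRAY_BUFFER_BINDING, &bound);
    std::cout << "vaoA rebound, element buffer = " << bound << "\n"; // ebo again

    glfwDestroyWindow(win);
    glfwTerminate();
    return 0;
}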

Is it possible to run Java3D applications on Nvidia 3D Vision hardware?

不打扰是莪最后的温柔 submitted on 2019-12-03 06:28:43
Is it possible to run a Java3D application on Nvidia 3D Vision hardware? I've got an existing Java3D application that can run in stereoscopic 3D. In the past, I've always run the application on Quadro cards using the OpenGL renderer and quad-buffered stereo. I now have access to a laptop with the Nvidia 3D Vision system (with a GeForce GTX 460M). From the documentation, it seems like it should be possible to run my application in stereo if I use the DirectX bindings and let the Nvidia drivers take care of the stereo; however, this does not seem to be the case. If I run a Java3D application

Tensorflow not running on GPU

南楼画角 submitted on 2019-12-03 05:40:28
Question: I have already spent a considerable amount of time digging around on Stack Overflow and elsewhere looking for the answer, but couldn't find anything. Hi all, I am running Tensorflow with Keras on top. I am 90% sure I installed Tensorflow GPU; is there any way to check which install I did? I was trying to run some CNN models from a Jupyter notebook and I noticed that Keras was running the model on the CPU (checked Task Manager; CPU was at 100%). I tried running this code from the tensorflow website: #

Median selection in CUDA kernel

女生的网名这么多〃 submitted on 2019-12-03 05:06:52
I need to compute the median of an array of size p inside a CUDA kernel (in my case, p is small, e.g. p = 10). I am using an O(p^2) algorithm for its simplicity, but at the cost of time performance. Is there a "function" to find the median efficiently that I can call inside a CUDA kernel? I know I could implement a selection algorithm, but I'm looking for a function and/or tested code. Thanks! Domi: Here are a few hints: Use a better selection algorithm: QuickSelect is a faster version of QuickSort for selecting the kth element in an array. For compile-time-constant mask sizes, sorting networks
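Not the ready-made library "function" the question asks for, but a minimal sketch of the brute-force route for a tiny, compile-time-constant p: each thread copies its p values into a per-thread array, insertion-sorts them (cheap for p ≈ 10, and simpler than the sorting networks hinted at above), and takes the middle element. The data layout (one contiguous row of p values per thread) and all names are illustrative.

// Sketch: per-thread median of P values via insertion sort in a local array.
// Insertion sort is used here instead of QuickSelect or a sorting network;
// for P ~ 10 the O(P^2) cost is negligible.
#include <cstdio>
#include <cuda_runtime.h>

constexpr int P = 10;   // small, compile-time-constant window size

__device__ float median_p(const float *v) {
    float tmp[P];
    for (int i = 0; i < P; ++i) tmp[i] = v[i];
    for (int i = 1; i < P; ++i) {                 // insertion sort
        float key = tmp[i];
        int j = i - 1;
        while (j >= 0 && tmp[j] > key) { tmp[j + 1] = tmp[j]; --j; }
        tmp[j + 1] = key;
    }
    return tmp[P / 2];   // for even P this is the upper middle; adjust to taste
}

__global__ void medians(const float *in, float *out, int n) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < n) out[t] = median_p(in + t * P);     // one P-element row per thread
}

int main() {
    const int n = 256;   // number of P-element rows
    float *h_in = new float[n * P], *h_out = new float[n];
    for (int i = 0; i < n * P; ++i) h_in[i] = (float)((i * 37) % 101);

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * P * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * P * sizeof(float), cudaMemcpyHostToDevice);

    medians<<<(n + 127) / 128, 128>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("median of row 0 = %f\n", h_out[0]);
    cudaFree(d_in); cudaFree(d_out);
    delete[] h_in; delete[] h_out;
    return 0;
}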

Tensorflow: GPU Utilization is almost always at 0%

百般思念 submitted on 2019-12-03 03:35:34
I'm using tensorflow with Titan-X GPUs and I've noticed that, when I run the CIFAR10 example, the Volatile GPU-utilization is pretty constant at around 30%, whereas when I train my own model, the Volatile GPU-utilization is far from steady: it is almost always 0% and spikes to 80-90% before going back to 0%, over and over again. I thought that this behavior was due to the way I was feeding the data to the network (I was fetching the data after each step, which took some time). But after implementing a queue to feed the data and avoid this latency between steps, the problem persisted (see below

Maximum blocks per grid: CUDA

怎甘沉沦 submitted on 2019-12-03 02:26:27
What is the maximum number of blocks in a grid that can be created per kernel launch? I am slightly confused here, since the compute capability table says that there can be 65535 blocks per grid dimension in CUDA compute capability 2.0. Does that mean the total number of blocks = 65535*65535? Or does it mean at most 65535 blocks in total, arranged either as a 1D grid of 65535 blocks or as a 2D grid of sqrt(65535) * sqrt(65535)? Thank you. 65535 per dimension of the grid. On compute 1.x cards, 1D and 2D grids are supported. On compute 2.x cards, 3D grids are also supported, so 65535, 65535 x 65535,
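Rather than memorising the table, the per-dimension limits can be read at runtime from cudaDeviceProp; a minimal sketch (output formatting is illustrative):

// Sketch: query the per-dimension grid limits instead of hard-coding 65535.
// On compute capability 2.0 each dimension is capped at 65535; on 3.0 and
// later the x-dimension is much larger (2^31 - 1), so querying is more portable.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d (%s, compute %d.%d): maxGridSize = %d x %d x %d, "
               "maxThreadsPerBlock = %d\n",
               d, prop.name, prop.major, prop.minor,
               prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2],
               prop.maxThreadsPerBlock);
    }
    return 0;
}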

Why aren't there bank conflicts in global memory for Cuda/OpenCL?

落爺英雄遲暮 submitted on 2019-12-03 02:04:41
Question: One thing I haven't figured out, and Google isn't helping me with, is why it is possible to have bank conflicts with shared memory but not with global memory. Can there be bank conflicts with registers? UPDATE: Wow, I really appreciate the two answers from Tibbit and Grizzly. It seems that I can only give a green check mark to one answer though. I am newish to Stack Overflow. I guess I have to pick one answer as the best. Can I do something to say thank you to the answer I don't give a green check to?
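As a side illustration of what the term means on the shared-memory side (not taken from the answers by Tibbit or Grizzly): the sketch below shows the classic conflicting access pattern and the one-element padding that removes it. Global memory has no banks in this sense; its performance story is told in terms of coalescing instead.

// Illustration: a shared-memory bank conflict and the standard padding fix.
// With 32 banks of 4-byte words, reading a column of a [32][32] float tile
// makes every thread of a warp hit the same bank; padding rows to 33 floats
// spreads the column across all banks.
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 32

__global__ void row_sums(const float *in, float *out) {
    __shared__ float tile_c[TILE][TILE];       // conflicting layout
    __shared__ float tile_p[TILE][TILE + 1];   // +1 padding: conflict-free

    int tx = threadIdx.x;                      // launched as one warp (32 threads)

    // Fill the tiles: at each step i the warp writes tile[i][0..31],
    // 32 consecutive words -> 32 different banks, no conflict.
    for (int i = 0; i < TILE; ++i) {
        float v = in[i * TILE + tx];
        tile_c[i][tx] = v;
        tile_p[i][tx] = v;
    }
    __syncthreads();

    // Each thread sums one row. At step i the warp reads tile[0][i]..tile[31][i],
    // a column of the array: same bank in tile_c (32-way conflict),
    // 32 different banks in the padded tile_p.
    float s_conflict = 0.0f, s_padded = 0.0f;
    for (int i = 0; i < TILE; ++i) {
        s_conflict += tile_c[tx][i];
        s_padded   += tile_p[tx][i];
    }
    out[tx] = s_conflict - s_padded;   // always 0: only the timing differs
}

int main() {
    float h_in[TILE * TILE], h_out[TILE];
    for (int i = 0; i < TILE * TILE; ++i) h_in[i] = (float)i;

    float *d_in, *d_out;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    row_sums<<<1, TILE>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("difference for thread 0: %f (results identical, only timing differs)\n", h_out[0]);

    cudaFree(d_in); cudaFree(d_out);
    return 0;
}

The results of the two loops are identical; the difference only shows up in a profiler such as Nsight Compute, which reports the shared-memory bank conflicts for the unpadded reads.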

NVIDIA vs AMD: GPGPU performance

半世苍凉 submitted on 2019-12-03 01:47:09
Question: I'd like to hear from people with experience of coding for both. Myself, I only have experience with NVIDIA. NVIDIA CUDA seems to be a lot more popular than the competition. (Just counting question tags on this forum, 'cuda' outperforms 'opencl' 3:1, and 'nvidia' outperforms 'ati' 15:1, and there's no tag for 'ati-stream' at all.) On the other hand, according to Wikipedia, ATI/AMD cards should have a lot more potential, especially per dollar. The fastest NVIDIA card on the market as of today,