nvidia

Is it unsafe to run multiple tensorflow processes on the same GPU?

Submitted by 喜欢而已 on 2019-12-07 01:13:55
Question: I only have one GPU (Titan X Pascal, 12 GB VRAM) and I would like to train multiple models in parallel on the same GPU. I tried encapsulating my model in a single Python program (called model.py), and I included code in model.py to restrict VRAM usage (based on this example). I was able to run up to 3 instances of model.py concurrently on my GPU (with each instance taking a little less than 33% of my VRAM). Mysteriously, when I tried with 4 models I received an error: 2017-09-10 13:27:43
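The usual way to cap per-process VRAM in TensorFlow 1.x is a GPUOptions memory fraction. A minimal configuration sketch, assuming the TF 1.x session API (the 0.3 fraction matches the "a little less than 33%" setup described above):

```python
import tensorflow as tf  # TensorFlow 1.x API assumed

# Cap this process at roughly a third of the card's VRAM so several
# trainers can coexist on the single 12 GB Titan X.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
config = tf.ConfigProto(gpu_options=gpu_options)
sess = tf.Session(config=config)
```

Note that each process also pays a fixed CUDA context overhead on top of its configured fraction, which is one plausible reason a fourth instance fails even when the fractions alone would nominally fit.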

OpenCL FFT on both Nvidia and AMD hardware?

Submitted by 南笙酒味 on 2019-12-06 23:30:51
Question: I'm working on a project that needs to make use of FFTs on both Nvidia and AMD graphics cards. I initially looked for a library that would work on both (thinking this would be the OpenCL way), but I wasn't having any luck. Someone suggested that I would have to use each vendor's FFT implementation and write a wrapper that chooses what to do based on the platform. I found AMD's implementation pretty easily, but I'm actually working with an Nvidia card in the meantime (and this is the more

GeForce Experience share feature generates whitelist errors and slows performance

Submitted by 安稳与你 on 2019-12-06 16:54:59
I'm developing an application which is previewing video feeds from a capture card and/or a webcam. I've noticed a lot of errors in my console that look like:

IGIESW [path to my.exe] found in whitelist: NO
IGIWHW Game [path to my.exe] found in whitelist: NO

These repeat each time I try to activate a preview window or switch the source feed I'm trying to preview. It actually takes a few seconds each time, and it really kills the responsiveness of my application. I'm also seeing a similar slowdown in other applications which are previewing and switching between sources. I have two nearly identical

OpenMP offloaded target region executed in both host and target-device

Submitted by 岁酱吖の on 2019-12-06 15:15:14
I'm working on a project which requires OpenMP offloading to Nvidia GPUs using Clang. I was able to install Clang with offloading support by following the instructions mentioned here.

System specification:
OS - Ubuntu 16.04 LTS
Clang version - 4.00
Processor - Intel(R) Core(TM) i7-4700MQ CPU
CUDA version - 9.0
Nvidia GPU - GeForce 740M (sm_capability - 35)

But the problem is, when I execute a sample program to test OpenMP offloading to Nvidia GPUs, part of the target region tends to run on the GPU and then the same target region starts executing on the host. Please find the sample program here. This is a small C

CUDA Add Rows of a Matrix

Submitted by 纵然是瞬间 on 2019-12-06 13:36:30
I'm trying to add the rows of a 4800x9600 matrix together, resulting in a 1x9600 matrix. What I've done is split the 4800x9600 matrix into 9,600 matrices of length 4800 each, and I then perform a reduction on the 4800 elements. The trouble is, this is really slow... Anyone got any suggestions? Basically, I'm trying to implement MATLAB's sum(...) function. Here is the code, which I've verified works fine; it's just really slow:

void reduceRows(Matrix Dresult, Matrix DA) {
    // split DA into chunks
    Matrix Dchunk;
    Dchunk.h = 1; Dchunk.w = DA.h;
    cudaMalloc((void**)&Dchunk.data, Dchunk.h*Dchunk.w*sizeof(float));
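For a column-wise sum like this, launching 9,600 separate reductions is usually the bottleneck; a single kernel with one thread per column keeps the loads coalesced. A sketch, assuming the row-major float layout implied by the Matrix fields (h, w, data) in the snippet above:

```cuda
// One thread per column: thread j walks down its column. Because
// neighbouring threads read neighbouring columns, each warp's loads
// from row-major data are coalesced.
__global__ void sumColumns(const float *A, float *out, int rows, int cols)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= cols) return;
    float acc = 0.0f;
    for (int i = 0; i < rows; ++i)
        acc += A[i * cols + j];   // stride 1 between adjacent threads
    out[j] = acc;
}

// Launch sketch for the 4800x9600 case:
// sumColumns<<<(9600 + 255) / 256, 256>>>(DA.data, Dresult.data, 4800, 9600);
```

This replaces thousands of kernel launches with one, at the cost of each thread doing a serial 4800-element sum, which the memory system hides well at this width.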

Compile and build .cl file using NVIDIA's nvcc Compiler?

Submitted by 僤鯓⒐⒋嵵緔 on 2019-12-06 13:05:34
Is it possible to compile a .cl file using NVIDIA's nvcc compiler? I am trying to set up Visual Studio 2010 to code OpenCL under the CUDA platform. But when I select the CUDA C/C++ compiler to compile and build a .cl file, it gives me errors like "nvcc does not exist". What is the issue? You should be able to use nvcc to compile OpenCL codes. Normally, I would suggest using a filename extension of .c for a C-compliant code and .cpp for a C++-compliant code(*); however, nvcc has filename extension override options ( -x ... ) so that we can modify the behavior. Here is a worked example using CUDA 8.0.61,
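The override mentioned above is the -x option; a command sketch, assuming the OpenCL headers and library are visible to nvcc and a hypothetical file name:

```shell
# Tell nvcc to treat the .cl file as C host code and link the OpenCL runtime
# (-x c overrides the unrecognized .cl extension; -x cu would force CUDA C++).
nvcc -x c mykernel.cl -o test -lOpenCL
```

This applies to .cl files containing OpenCL host code; a .cl file holding only device kernel source is normally compiled at runtime by clBuildProgram, not by nvcc.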

Use Vulkan VkImage as a CUDA cuArray

Submitted by *爱你&永不变心* on 2019-12-06 12:21:57
What is the correct way of using a Vulkan VkImage as a CUDA cuArray? I've been trying to follow some examples, however I get a CUDA_ERROR_INVALID_VALUE on a call to cuExternalMemoryGetMappedMipmappedArray(). To provide the information in an ordered way: I'm using CUDA 10.1. Base code comes from https://github.com/SaschaWillems/Vulkan ; in particular, I'm using the 01 - Vulkan Gears demo, enriched with the saveScreenshot method from 09 - Capturing screenshots. Instead of saving the snapshot image to a file, I'll be sending the snapshot image into CUDA as a CUarray. I've enabled the following instance
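For reference, the rough shape of the import path (driver API) is below. A common cause of CUDA_ERROR_INVALID_VALUE at this call is a CUDA_ARRAY3D_DESCRIPTOR whose format, extent, or flags don't match the VkImage, or a wrong offset/size for the exported allocation. This is a sketch under assumed conditions (a single-level RGBA8 2D image exported as an opaque FD), not working interop code; error checking is omitted:

```cuda
// Vulkan -> CUDA image import, driver API. Assumes the VkDeviceMemory
// backing the image was exported via vkGetMemoryFdKHR.
CUDA_EXTERNAL_MEMORY_HANDLE_DESC memDesc = {};
memDesc.type = CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD;
memDesc.handle.fd = exportedFd;          // from vkGetMemoryFdKHR
memDesc.size = vkAllocationSize;         // whole VkDeviceMemory size

CUexternalMemory extMem;
cuImportExternalMemory(&extMem, &memDesc);

CUDA_EXTERNAL_MEMORY_MIPMAPPED_ARRAY_DESC mipDesc = {};
mipDesc.offset = 0;                      // image's offset inside the allocation
mipDesc.numLevels = 1;                   // must match the VkImage mip count
mipDesc.arrayDesc.Width  = width;        // must match VkImageCreateInfo.extent
mipDesc.arrayDesc.Height = height;
mipDesc.arrayDesc.Depth  = 0;            // 0 for a 2D array
mipDesc.arrayDesc.Format = CU_AD_FORMAT_UNSIGNED_INT8;
mipDesc.arrayDesc.NumChannels = 4;       // must match the VkFormat
mipDesc.arrayDesc.Flags = CUDA_ARRAY3D_SURFACE_LDST;

CUmipmappedArray mipArray;
cuExternalMemoryGetMappedMipmappedArray(&mipArray, extMem, &mipDesc);

CUarray level0;                          // the CUarray the question is after
cuMipmappedArrayGetLevel(&level0, mipArray, 0);
```

Checking each descriptor field against the actual VkImageCreateInfo and the vkGetImageMemoryRequirements offset is usually how this particular error gets tracked down.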

A question about how blocks are distributed to SMs in CUDA

Submitted by 泪湿孤枕 on 2019-12-06 11:58:17
Let me take hardware with compute capability 1.3 as an example: 30 SMs are available, so at most 240 blocks are able to run at the same time (considering the limits on registers and shared memory, the restriction on the number of blocks may be much lower). Blocks beyond 240 have to wait for hardware resources to become available. My question is: when will those blocks beyond 240 be assigned to SMs? As soon as some of the first 240 blocks are completed? Or only when all of the first 240 blocks are finished? I wrote such a piece of code:

#include <stdio.h>
#include <string.h>
#include <cuda_runtime.h>
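The scheduling can be observed directly by having each block record which SM it landed on and when, using the %smid special register; in practice waiting blocks are dispatched as soon as any resident block retires, not when the whole first wave finishes. A hypothetical probe kernel:

```cuda
#include <cstdio>

// Each block logs the SM it ran on and its start time. Plotting start
// times per SM shows late blocks starting as soon as early blocks on
// that SM finish, i.e. scheduling is per-block, not per-wave.
__global__ void whereAmI(unsigned *smid, long long *start)
{
    if (threadIdx.x == 0) {
        unsigned id;
        asm("mov.u32 %0, %%smid;" : "=r"(id));  // PTX special register
        smid[blockIdx.x]  = id;
        start[blockIdx.x] = clock64();
    }
}
```

Launching this with far more blocks than can be resident, then sorting the log by SM and start time, makes the replacement pattern visible.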

My GPU has 2 multiprocessors with 48 CUDA cores each. What does this mean?

Submitted by 瘦欲@ on 2019-12-06 09:51:49
My GPU has 2 multiprocessors with 48 CUDA cores each. Does this mean that I can execute 96 thread blocks in parallel? No, it doesn't. From chapter 4 of the CUDA C Programming Guide: The number of blocks and warps that can reside and be processed together on the multiprocessor for a given kernel depends on the amount of registers and shared memory used by the kernel and the amount of registers and shared memory available on the multiprocessor. There are also a maximum number of resident blocks and a maximum number of resident warps per multiprocessor. These limits, as well as the amount of registers
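These per-kernel limits don't have to be computed by hand; the runtime can report them. A sketch using the occupancy API with a hypothetical kernel:

```cuda
#include <cstdio>

__global__ void myKernel(float *x) { x[threadIdx.x] *= 2.0f; }

int main()
{
    int blocksPerSM = 0, numSMs = 0;
    // How many blocks of 128 threads can be resident on one SM at once,
    // given myKernel's actual register and shared-memory usage:
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocksPerSM, myKernel, /*blockSize=*/128, /*dynamicSmemSize=*/0);
    cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, 0);
    printf("%d resident blocks/SM x %d SMs = %d blocks truly in flight\n",
           blocksPerSM, numSMs, blocksPerSM * numSMs);
    return 0;
}
```

On the 2-SM device above, the in-flight total is blocksPerSM x 2, which is set by registers, shared memory, and the resident-block cap, not by the 96 CUDA cores.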

WebGL flickering in Chrome on Windows x64 with Nvidia GPU

Submitted by 半城伤御伤魂 on 2019-12-06 09:41:16
I see weird flickering of some rendered geometry in Chrome on Windows 10 x64 with Nvidia chips. I've also tested it in Chrome for Linux, in Firefox on both platforms, on Android, and with an Intel GPU. It works fine everywhere except the one platform mentioned. A minimal example looks like this:

Vertex shader:

precision mediump float;
smooth out vec2 pointCoord;

const vec2 vertexCoord[] = vec2[](
    vec2(0.0, 0.0), vec2(1.0, 0.0), vec2(1.0, 1.0),
    vec2(0.0, 0.0), vec2(1.0, 1.0), vec2(0.0, 1.0)
);

void main() {
    gl_Position = vec4(vertexCoord[gl_VertexID], 0.0, 1.0);
    pointCoord = vertexCoord[gl_VertexID];
}