GPU

Multi-GPU profiling (several CPUs, MPI/CUDA hybrid)

你离开我真会死。 Submitted on 2020-01-10 19:58:10
Question: I had a quick look on the forums and I don't think this question has been asked already. I am currently working with an MPI/CUDA hybrid code, made by somebody else during his PhD. Each CPU has its own GPU. My task is to gather data by running the (already working) code and to implement extra things. Turning this code into a single-CPU / multi-GPU one is not an option at the moment (later, possibly). I would like to make use of performance profiling tools to analyse the whole thing. For now an

How to disable or change the timeout limit for the GPU under Linux?

孤街浪徒 Submitted on 2020-01-09 10:23:07
Question: Does anybody know how to disable or change the timeout limit for CUDA kernels under Ubuntu 12.10? (With current versions of Windows one can set the timeout limit in the registry.) Please also tell me if there is no way to do this with Ubuntu. The only results of my previous search are the following: running the CUDA kernel without a graphical display attached to the GPU; splitting the kernel into smaller ones to avoid exceeding the time limit. Neither solution is an option for me

nvidia-smi Volatile GPU-Utilization explanation?

时光总嘲笑我的痴心妄想 Submitted on 2020-01-09 03:03:48
Question: I know that nvidia-smi -l 1 will give the GPU usage every second (similar to the following). However, I would appreciate an explanation of what Volatile GPU-Util really means. Is it the number of used SMs over total SMs, or the occupancy, or something else? +-----------------------------------------------------------------------------+ | NVIDIA-SMI 367.48 Driver Version: 367.48 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M|
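
For reference, nvidia-smi's GPU-Util column reports NVML's utilization.gpu counter, i.e. the percentage of time over the past sample period during which at least one kernel was executing on the GPU, rather than an SM or occupancy count. A minimal sketch of querying the same counter programmatically, assuming the pynvml bindings (the nvidia-ml-py package) are installed:

    # Sketch: read the utilization counters that nvidia-smi displays,
    # assuming the pynvml package (nvidia-ml-py) is installed.
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)        # first GPU
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    # util.gpu: percent of the sample period during which >=1 kernel was running
    # util.memory: percent of the sample period during which memory was read/written
    print("GPU-Util: %d%%  Mem-Util: %d%%" % (util.gpu, util.memory))
    pynvml.nvmlShutdown()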

Compiling for Compute Capability 2.x in CUDA C for VS2010

不问归期 Submitted on 2020-01-07 02:27:06
Question: I was following this: Dynamically allocating memory inside __device/global__ CUDA kernel. But it still doesn't compile. error : calling a host function("_malloc_dbg") from a __device__/__global__ function("kernel") is not allowed error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --use-local-env --cl-version 2010 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\x86_amd64" -I"..

CUDA, cuDNN installed but TensorFlow can't use the GPU

岁酱吖の Submitted on 2020-01-06 18:31:44
Question: My system is Ubuntu 14.04 on EC2: nvidia-smi Sun Oct 2 13:35:28 2016 +------------------------------------------------------+ | NVIDIA-SMI 352.63 Driver Version: 352.63 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 GRID K520 Off | 0000:00:03.0
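
A quick way to check whether the installed TensorFlow build can actually see the GPU is to list the local devices and log op placement. This is a sketch assuming a TensorFlow 1.x-era install, which matches the 2016 setup in the question:

    # Sketch: a CUDA/cuDNN-enabled TensorFlow should list a GPU device
    # (/gpu:0 or /device:GPU:0) alongside the CPU.
    import tensorflow as tf
    from tensorflow.python.client import device_lib

    print(device_lib.list_local_devices())

    # Logging device placement also shows whether ops actually land on the GPU.
    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
        a = tf.constant([1.0, 2.0])
        print(sess.run(a))

If only the CPU device appears, the usual culprits are a CPU-only TensorFlow package or a CUDA/cuDNN version the installed build was not compiled against.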

PyTorch Object Detection with GPU on Ubuntu 18.04 - RuntimeError: CUDA out of memory. Tried to allocate xx.xx MiB

倾然丶 夕夏残阳落幕 Submitted on 2020-01-06 05:37:34
Question: I'm attempting to get this PyTorch person detection example: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html running locally with a GPU, either in a Jupyter Notebook or a regular Python file. I get the error in the title either way. I'm using Ubuntu 18.04. Here is a summary of the steps I've performed: 1) Stock Ubuntu 18.04 install on a Lenovo ThinkPad X1 Extreme Gen 2 with a GTX 1650 GPU. 2) Perform a standard CUDA 10.0 / cuDNN 7.4 install. I'd rather not restate all the
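
The GTX 1650 has only 4 GB of memory, so the tutorial's defaults can easily exceed it. A minimal sketch of the usual mitigations (smaller batch size, gradient-free evaluation, releasing cached blocks); the values mentioned are illustrative, not taken from the original post:

    import torch

    # Check which GPU is visible and how much of it is already allocated.
    print(torch.cuda.get_device_name(0))
    print(torch.cuda.memory_allocated() / 1024**2, "MiB currently allocated")

    # Illustrative mitigations:
    #  - pass a smaller batch_size (e.g. 1) to torch.utils.data.DataLoader
    #  - wrap evaluation in `with torch.no_grad():` so no autograd buffers are kept
    #  - release cached, unused blocks back to the driver:
    torch.cuda.empty_cache()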

How to detect slow GPU on mobile device with three.js?

我的未来我决定 Submitted on 2020-01-06 05:16:31
Question: I've found that my game is extremely slow with shadows enabled on old mobile devices (Samsung Galaxy S4, iPhone 5). When I turn off shadows, performance improves greatly. Does anyone know how to detect a slow GPU so I can turn off shadows completely on slow devices, or how to improve shadow performance? I've tried using different shadow.mapSize values on the lights and shadowMap.type on the renderer, and it doesn't improve performance. Some details: I use PerspectiveCamera and WebGLRenderer with render size

Session.close() doesn't free resources on GPU using TensorFlow

落花浮王杯 Submitted on 2020-01-06 02:00:25
Question: I would like to perform pretraining of a neural network using autoencoders implemented in TensorFlow. I am able to run the whole network (using TF or Keras); the whole graph fits into GPU memory, so that's fine. The problem occurs when I create more graphs (autoencoders): the GPU runs out of memory very quickly. Right now I have an example where building the second-level autoencoder causes a GPU out-of-memory exception. So what is happening: I have an implementation of autoencoders which has a session as its attribute, so
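
TensorFlow's allocator does not hand GPU memory back to the driver when a Session is closed inside a running process, so a common workaround is to build and train each autoencoder in its own subprocess; the memory is then reclaimed when that process exits. A rough sketch of the pattern, where train_one_level and its arguments are placeholders rather than the poster's code:

    import multiprocessing as mp

    def train_one_level(level):
        # Importing TensorFlow here keeps the CUDA context, and every GPU
        # allocation it makes, inside this worker process.
        import tensorflow as tf
        # Placeholder: build the level-`level` autoencoder graph, train it,
        # and save its weights to disk for the next stage to load.
        pass

    if __name__ == "__main__":
        for level in range(3):
            p = mp.Process(target=train_one_level, args=(level,))
            p.start()
            p.join()   # GPU memory is released when the worker exits

Passing data between stages then happens through files (or queues) rather than through objects held in the parent process.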

NvCplGetThermalSettings call to nvcpl.dll returns false (C++)

谁说我不能喝 Submitted on 2020-01-05 17:41:02
Question: I'm trying to retrieve GPU temperature information using the code below (not mine; slightly modified), but get a 'false' return when I attempt to call the .dll function, nvCplGetThermalSettings: HINSTANCE lib = LoadLibraryA("nvcpl.dll"); if(lib) { NvCplGetThermalSettings nvCplGetThermalSettings = reinterpret_cast<NvCplGetThermalSettings> (GetProcAddress(lib,"NvCplGetThermalSettings")); DWORD coreTemp,ambientTemp,upperLimit; int success = nvCplGetThermalSettings(0,&coreTemp,&ambientTemp,