GPU

Multi-GPU profiling (several CPUs, MPI/CUDA hybrid)

你离开我真会死。 Submitted on 2020-01-10 19:58:10
Question: I had a quick look on the forums and I don't think this question has been asked already. I am currently working with an MPI/CUDA hybrid code, made by somebody else during his PhD. Each CPU has its own GPU. My task is to gather data by running the (already working) code and to implement extra things. Turning this code into a single-CPU / multi-GPU one is not an option at the moment (later, possibly). I would like to make use of performance profiling tools to analyse the whole thing. For now an

How to disable or change the timeout limit for the GPU under Linux?

孤街浪徒 Submitted on 2020-01-09 10:23:07
Question: Does anybody know how to disable or change the timeout limit for CUDA kernels under Ubuntu 12.10? (With current versions of Windows one can set the timeout limit in the registry.) Please also tell me if there is no way to do this with Ubuntu. The only results of my previous search are the following: running the CUDA kernel without a graphical display attached to the GPU; splitting the kernel into smaller ones to avoid exceeding the time limit. Neither solution is an option for me

nvidia-smi Volatile GPU-Utilization explanation?

时光总嘲笑我的痴心妄想 Submitted on 2020-01-09 03:03:48
Question: I know that nvidia-smi -l 1 will give the GPU usage every second (similar to the following). However, I would appreciate an explanation of what Volatile GPU-Util really means. Is it the number of used SMs over total SMs, or the occupancy, or something else? +-----------------------------------------------------------------------------+ | NVIDIA-SMI 367.48 Driver Version: 367.48 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M|
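
For reference, nvidia-smi's GPU-Util column reports NVML's utilization.gpu counter, i.e. the percentage of time over the past sample period during which at least one kernel was executing on the GPU, rather than an SM or occupancy count. A minimal sketch of querying the same counter programmatically, assuming the pynvml bindings (the nvidia-ml-py package) are installed:

    # Sketch: read the utilization counters that nvidia-smi displays,
    # assuming the pynvml package (nvidia-ml-py) is installed.
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)        # first GPU
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    # util.gpu: percent of the sample period during which >=1 kernel was running
    # util.memory: percent of the sample period during which memory was read/written
    print("GPU-Util: %d%%  Mem-Util: %d%%" % (util.gpu, util.memory))
    pynvml.nvmlShutdown()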

Compiling for Compute Capability 2.x in CUDA C for VS2010

不问归期 Submitted on 2020-01-07 02:27:06
Question: I was following this: Dynamically allocating memory inside __device/global__ CUDA kernel. But it still doesn't compile. error : calling a host function("_malloc_dbg") from a __device__/__global__ function("kernel") is not allowed error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --use-local-env --cl-version 2010 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\x86_amd64" -I"..

CUDA, cuDNN installed but TensorFlow can't use the GPU

岁酱吖の Submitted on 2020-01-06 18:31:44
Question: My system is Ubuntu 14.04 on EC2: nvidia-smi Sun Oct 2 13:35:28 2016 +------------------------------------------------------+ | NVIDIA-SMI 352.63 Driver Version: 352.63 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 GRID K520 Off | 0000:00:03.0
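
A quick way to check whether the installed TensorFlow build can actually see the GPU is to list the local devices and log op placement. This is a sketch assuming a TensorFlow 1.x-era install, which matches the 2016 setup in the question:

    # Sketch: a CUDA/cuDNN-enabled TensorFlow should list a GPU device
    # (/gpu:0 or /device:GPU:0) alongside the CPU.
    import tensorflow as tf
    from tensorflow.python.client import device_lib

    print(device_lib.list_local_devices())

    # Logging device placement also shows whether ops actually land on the GPU.
    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
        a = tf.constant([1.0, 2.0])
        print(sess.run(a))

If only the CPU device appears, the usual culprits are a CPU-only TensorFlow package or a CUDA/cuDNN version the installed build was not compiled against.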

PyTorch Object Detection with GPU on Ubuntu 18.04 - RuntimeError: CUDA out of memory. Tried to allocate xx.xx MiB

倾然丶 夕夏残阳落幕 Submitted on 2020-01-06 05:37:34
Question: I'm attempting to get this PyTorch person detection example: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html running locally with a GPU, either in a Jupyter Notebook or a regular Python file. I get the error in the title either way. I'm using Ubuntu 18.04. Here is a summary of the steps I've performed: 1) Stock Ubuntu 18.04 install on a Lenovo ThinkPad X1 Extreme Gen 2 with a GTX 1650 GPU. 2) Perform a standard CUDA 10.0 / cuDNN 7.4 install. I'd rather not restate all the
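
The GTX 1650 has only 4 GB of memory, so the tutorial's defaults can easily exceed it. A minimal sketch of the usual mitigations (smaller batch size, gradient-free evaluation, releasing cached blocks); the values mentioned are illustrative, not taken from the original post:

    import torch

    # Check which GPU is visible and how much of it is already allocated.
    print(torch.cuda.get_device_name(0))
    print(torch.cuda.memory_allocated() / 1024**2, "MiB currently allocated")

    # Illustrative mitigations:
    #  - pass a smaller batch_size (e.g. 1) to torch.utils.data.DataLoader
    #  - wrap evaluation in `with torch.no_grad():` so no autograd buffers are kept
    #  - release cached, unused blocks back to the driver:
    torch.cuda.empty_cache()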

How to detect slow GPU on mobile device with three.js?

我的未来我决定 Submitted on 2020-01-06 05:16:31
Question: I've found that my game is extremely slow with shadows enabled on old mobile devices (Samsung Galaxy S4, iPhone 5). When I turn off shadows, performance improves greatly. Does anyone know how to detect a slow GPU so I can turn off shadows completely on slow devices, or how to improve shadow performance? I've tried using different shadow.mapSize values on the lights and shadowMap.type on the renderer, and it doesn't improve performance. Some details: I use PerspectiveCamera and WebGLRenderer with render size

Session.close() doesn't free resources on GPU using TensorFlow

落花浮王杯 Submitted on 2020-01-06 02:00:25
Question: I would like to perform pretraining of a neural network using autoencoders implemented in TensorFlow. I am able to run the whole network (using TF or Keras); the whole graph fits into GPU memory, so that's fine. The problem occurs when I create more graphs (autoencoders): the GPU runs out of memory very quickly. Right now I have an example where building the second-level autoencoder causes a GPU out-of-memory exception. So what is happening: I have an implementation of autoencoders which has a session as its attribute, so
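
TensorFlow's allocator does not hand GPU memory back to the driver when a Session is closed inside a running process, so a common workaround is to build and train each autoencoder in its own subprocess; the memory is then reclaimed when that process exits. A rough sketch of the pattern, where train_one_level and its arguments are placeholders rather than the poster's code:

    import multiprocessing as mp

    def train_one_level(level):
        # Importing TensorFlow here keeps the CUDA context, and every GPU
        # allocation it makes, inside this worker process.
        import tensorflow as tf
        # Placeholder: build the level-`level` autoencoder graph, train it,
        # and save its weights to disk for the next stage to load.
        pass

    if __name__ == "__main__":
        for level in range(3):
            p = mp.Process(target=train_one_level, args=(level,))
            p.start()
            p.join()   # GPU memory is released when the worker exits

Passing data between stages then happens through files (or queues) rather than through objects held in the parent process.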

NvCplGetThermalSettings call to nvcpl.dll returns false (C++)

谁说我不能喝 Submitted on 2020-01-05 17:41:02
Question: I'm trying to retrieve GPU temperature information using the code below (not mine; slightly modified), but get a 'false' return when I attempt to call the .dll function, nvCplGetThermalSettings: HINSTANCE lib = LoadLibraryA("nvcpl.dll"); if(lib) { NvCplGetThermalSettings nvCplGetThermalSettings = reinterpret_cast<NvCplGetThermalSettings> (GetProcAddress(lib,"NvCplGetThermalSettings")); DWORD coreTemp,ambientTemp,upperLimit; int success = nvCplGetThermalSettings(0,&coreTemp,&ambientTemp,