GPU

How to get a GPU's name on the Windows operating system with C++

微笑、不失礼 submitted on 2019-12-13 02:36:24
Question: I want to get a GPU's exact name, for example "ATI Radeon HD 4830". But when I read the registry I get a string like "ATI Radeon HD 4800 Series", and querying through the D3D or OpenCL interfaces also returns "ATI Radeon HD 4800 Series". How can I get the GPU's name correctly?

Answer 1: I don't remember the exact function you need to call, but you need to use the SetupDiXxx functions (a sketch follows below). Warning: it's a little painful.

Answer 2: You can try this with C++ AMP, if you …
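A minimal sketch of the SetupDi approach from Answer 1, assuming the display adapter's device description is the string you want. Note that this string comes from the driver's INF file, so it may still be a series name rather than the exact retail model.

    // Enumerate display adapters via the SetupDi API; link against setupapi.lib.
    #include <windows.h>
    #include <setupapi.h>
    #include <devguid.h>   // GUID_DEVCLASS_DISPLAY
    #include <iostream>

    int main() {
        // Handle to the set of all display-class devices currently present.
        HDEVINFO devs = SetupDiGetClassDevsA(&GUID_DEVCLASS_DISPLAY, nullptr,
                                             nullptr, DIGCF_PRESENT);
        if (devs == INVALID_HANDLE_VALUE) return 1;

        SP_DEVINFO_DATA info = {};
        info.cbSize = sizeof(info);
        for (DWORD i = 0; SetupDiEnumDeviceInfo(devs, i, &info); ++i) {
            char name[256] = {};
            // SPDRP_DEVICEDESC is the adapter's device description string.
            if (SetupDiGetDeviceRegistryPropertyA(devs, &info, SPDRP_DEVICEDESC,
                                                  nullptr,
                                                  reinterpret_cast<PBYTE>(name),
                                                  sizeof(name), nullptr)) {
                std::cout << "GPU: " << name << "\n";
            }
        }
        SetupDiDestroyDeviceInfoList(devs);
        return 0;
    }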

How to profile the number of global memory transactions for CUDA kernels?

妖精的绣舞 submitted on 2019-12-13 01:28:17
Question: How do I enable profiling of the "uncached_global_load_transaction" counter in the CUDA command-line profiler?

Answer 1: The command line profiler is controlled using the following environment variables (a concrete sketch follows the list):
- COMPUTE_PROFILE: set to 1 or 0 (or unset) to enable or disable profiling.
- COMPUTE_PROFILE_CONFIG: specifies a config file for enabling performance counters in the GPU and various other options.
- COMPUTE_PROFILE_LOG: set to the desired file path for the profiling output.
In your case you …
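A minimal sketch of that setup; the counter name is the one from the question, and the file paths are placeholders. Whether this counter exists depends on the GPU architecture.

    # Enable the command-line profiler and point it at a config file.
    export COMPUTE_PROFILE=1
    export COMPUTE_PROFILE_CONFIG=/path/to/profile.cfg
    export COMPUTE_PROFILE_LOG=/path/to/cuda_profile.log

    # Contents of profile.cfg, one counter or option per line:
    # uncached_global_load_transaction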

TensorFlow: Gradient Calculation with sparse tensors on GPU

穿精又带淫゛_ submitted on 2019-12-12 21:23:05
Question: I built a TensorFlow model similar to the multi-GPU implementation of CIFAR10. I have a basic model that is executed on every GPU, while the variables for the network live on the CPU. Everything works fine as long as I don't use sparse tensors as weight matrices in the layers. My sparse weight matrices are constructed with the function tf.sparse_to_dense() or tf.diag() . When I run it on the CPU everything works fine, but when I run it on the GPU I get the message that there is no GPU …
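The snippet is cut off here, but the message is presumably the usual "no GPU kernel" placement failure for those ops. A sketch of the common workaround, assuming TF 1.x: pin the ops that lack GPU kernels to the CPU and use the resulting dense tensor on the GPU.

    import tensorflow as tf

    with tf.device('/cpu:0'):
        # Build the dense weight on the CPU, where sparse_to_dense has a kernel.
        w = tf.sparse_to_dense(sparse_indices=[[0, 0], [1, 1]],
                               output_shape=[2, 2],
                               sparse_values=[1.0, 2.0])

    with tf.device('/gpu:0'):
        y = tf.matmul(w, tf.ones([2, 1]))

    # allow_soft_placement lets TF move any remaining CPU-only ops automatically.
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
        print(sess.run(y))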

How to select a GPU with CUDA?

浪子不回头ぞ submitted on 2019-12-12 20:13:58
Question: I have a computer with 2 GPUs. I wrote a CUDA C program and I need to tell it somehow that I want to run it on just 1 of the 2 graphics cards. What is the command I need to type, and how should I use it? I believe it is somehow related to cudaSetDevice, but I can't really find out how to use it.

Answer 1: It should be pretty clear from the documentation of cudaSetDevice, but let me provide the following code snippet:

    bool IsGpuAvailable() {
        int devicesCount;
        cudaGetDeviceCount(&devicesCount);
        …
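The answer's snippet is truncated in this capture. A minimal self-contained sketch of the same idea, assuming you simply want device 0:

    // Select one of the installed GPUs for all subsequent CUDA calls
    // made on this host thread.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int deviceCount = 0;
        cudaGetDeviceCount(&deviceCount);
        printf("Found %d CUDA device(s)\n", deviceCount);

        cudaError_t err = cudaSetDevice(0);  // index of the card you want
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaSetDevice failed: %s\n",
                    cudaGetErrorString(err));
            return 1;
        }

        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        printf("Using device 0: %s\n", prop.name);
        return 0;
    }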

Why does setting an initialization value prevent placing a variable on a GPU in TensorFlow?

北城余情 submitted on 2019-12-12 19:51:06
Question: I get an exception when I try to run the following very simple TensorFlow code, although I virtually copied it from the documentation:

    import tensorflow as tf

    with tf.device("/gpu:0"):
        x = tf.Variable(0, name="x")

    sess = tf.Session()
    sess.run(x.initializer)  # Bombs!

The exception is:

    tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'x': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available. …
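The usual explanation, as the title hints, is the initial value: tf.Variable(0) infers dtype int32, and int32 variables have no GPU kernel in this generation of TensorFlow. A sketch of the two common fixes:

    import tensorflow as tf

    with tf.device("/gpu:0"):
        x = tf.Variable(0.0, name="x")  # float32 initializer: GPU kernel exists

    # Alternatively, keep the int32 variable and let TF fall back to the CPU
    # for ops without a GPU kernel:
    config = tf.ConfigProto(allow_soft_placement=True)
    sess = tf.Session(config=config)
    sess.run(x.initializer)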

When to use volatile with register/local variables

吃可爱长大的小学妹 submitted on 2019-12-12 19:03:14
Question: What is the meaning of declaring register arrays in CUDA with the volatile qualifier? When I tried a register array with the volatile keyword, the number of registers spilled to local memory went down (i.e., it forced CUDA to keep values in registers instead of local memory). Is this the intended behavior? I did not find any information about the usage of volatile with register arrays in the CUDA documentation. Here is the ptxas -v output for both versions. With volatile qualifier: __volatile__ …
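For reference, a toy kernel sketch of the declaration under discussion; the array contents and sizes are illustrative.

    // 'volatile' tells the compiler that every access to buf must be
    // performed as written, which constrains how ptxas may reorder,
    // cache, or spill those accesses.
    __global__ void scale(float *out) {
        volatile float buf[4];
        for (int i = 0; i < 4; ++i)
            buf[i] = i * threadIdx.x;
        out[threadIdx.x] = buf[0] + buf[3];
    }

Compiling with the verbose flag (e.g. nvcc -Xptxas -v) reports the per-kernel register count and spill stores/loads, which is how the question's numbers were obtained.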

TensorFlow: GPU acceleration only happens after the first run

懵懂的女人 submitted on 2019-12-12 18:15:39
Question: I've installed CUDA and cuDNN on my machine (Ubuntu 16.04) alongside tensorflow-gpu. Versions used: CUDA 10.0, cuDNN 7.6, Python 3.6, TensorFlow 1.14. This is the output from nvidia-smi, showing the video card configuration:

    | NVIDIA-SMI 410.78       Driver Version: 410.78       CUDA Version: 10.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    …
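The slowdown described in the title is generally one-time initialization cost (CUDA context creation, kernel loading, cuDNN autotuning) rather than a lack of acceleration. A sketch of how to time around it, assuming TF 1.14 as above:

    import time
    import tensorflow as tf

    x = tf.random_normal([1024, 1024])
    y = tf.matmul(x, x)

    with tf.Session() as sess:
        sess.run(y)  # warm-up run: pays the one-time setup cost
        start = time.time()
        for _ in range(10):
            sess.run(y)
        print("avg per run: %.4f s" % ((time.time() - start) / 10))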

What does “RuntimeError: CUDA error: device-side assert triggered” in PyTorch mean?

给你一囗甜甜゛ submitted on 2019-12-12 16:15:18
Question: I have seen a lot of posts about particular case-specific problems, but no fundamental motivating explanation. What does this error:

    RuntimeError: CUDA error: device-side assert triggered

mean? Specifically, what is the assert that is being triggered, why is the assert there, and how do we work backwards to debug the problem? As-is, this error message is nearly useless for diagnosing any problem, because of its generality: it seems to say that "some code somewhere that touches the GPU" has a …
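A sketch of the classic trigger and the standard debugging move: an out-of-range class index fed to a loss kernel fires the device-side assert, and running the same code on the CPU (or launching with CUDA_LAUNCH_BLOCKING=1) usually surfaces the readable error.

    import torch
    import torch.nn.functional as F

    logits = torch.randn(4, 3)            # 3 classes: valid targets are 0..2
    targets = torch.tensor([0, 1, 2, 3])  # 3 is out of range

    # On the CPU this raises a clear "Target 3 is out of bounds" style error;
    # the same call on CUDA tensors raises the opaque device-side assert.
    loss = F.nll_loss(F.log_softmax(logits, dim=1), targets)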

Check failed: error == cudaSuccess (2 vs. 0) out of memory

左心房为你撑大大i submitted on 2019-12-12 16:04:30
Question: I am trying to run a neural network with pycaffe on the GPU. This works when I call the script for the first time; when I run the same script a second time, CUDA throws the error in the title. The batch size is 1, the image size at this moment is 243x322, and the GPU has 8 GB of RAM. I guess I am missing a command that resets the memory? Thank you very much! EDIT: Maybe I should clarify a few things: I am running caffe on Windows. When I call the script with python script.py, the process terminates and the …
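One thing worth trying (a sketch only; the file names are placeholders): drop the Python references to the net before the script exits, so caffe can free its GPU buffers cleanly.

    import caffe

    caffe.set_mode_gpu()
    caffe.set_device(0)

    net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)
    net.forward()

    del net  # release the reference so the GPU allocations can be freed

If memory still appears occupied after the process ends, checking nvidia-smi for a lingering process holding the GPU is the next step.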

Place Image on larger canvas size using GPU (possibly CIFilters) without using Image Context

瘦欲@ submitted on 2019-12-12 13:16:11
Question: Let's say I have an image that's 100x100. I want to place the image onto a larger canvas that's 500x500. My current approach is to use UIGraphics to create a context, then draw the image onto that context:

    UIGraphics.BeginImageContext(....);
    ImageView.Draw(....);

That works great, but it's not as fast as I'd like it to be for what I'm doing. I noticed that CIFilters are extremely fast. Is there a way I can place an image on a larger canvas using CIFilters, or another method that …
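The question's snippet is Xamarin C#, but the Core Image calls map across directly. A sketch in Swift of compositing the image over a larger blank canvas on the GPU, using the sizes from the question:

    import CoreImage

    func place(image: CIImage, onCanvasOf size: CGSize) -> CIImage {
        // A transparent canvas of the target size (e.g. 500x500).
        let canvas = CIImage(color: CIColor.clear)
            .cropped(to: CGRect(origin: .zero, size: size))
        // Center the (e.g. 100x100) image on the canvas.
        let dx = (size.width - image.extent.width) / 2
        let dy = (size.height - image.extent.height) / 2
        return image
            .transformed(by: CGAffineTransform(translationX: dx, y: dy))
            .composited(over: canvas)
    }

composited(over:) is backed by CISourceOverCompositing, so the work stays on the GPU until the result is finally rendered by a CIContext.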