gpu

Regarding GPU mode error in launching Android virtual device

混江龙づ霸主 submitted on 2019-12-18 16:56:31
Question: When I try to launch an Android virtual device in Android Studio 2.0, it gives me the following error: "ERROR: Invalid GPU mode 'mesa', use one of: on off host guest". A screenshot related to this error is shown below. Any help would be greatly appreciated! Thanks.
Answer 1: Go to the Virtual Device Manager and click 'Show on Disk' in the menu of the virtual device. Open the config.ini file and change the respective line to hw.gpu.mode=guest, then save config.ini and run the virtual device again.
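
For reference, the relevant section of an AVD's config.ini typically looks like the sketch below; the hw.gpu.enabled line is shown only for context and may read differently on your machine, and only hw.gpu.mode needs to change:

    hw.gpu.enabled=yes
    hw.gpu.mode=guest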

Equivalent of cudaGetErrorString for cuBLAS?

大憨熊 submitted on 2019-12-18 16:38:34
Question: The CUDA runtime has a convenience function, cudaGetErrorString(cudaError_t error), that translates an error enum into a readable string. cudaGetErrorString is used in the CUDA_SAFE_CALL(someCudaFunction()) macro that many people use for CUDA error handling. I'm familiarizing myself with cuBLAS now, and I'd like to create a macro similar to CUDA_SAFE_CALL for cuBLAS. To make my macro's printouts useful, I'd like something analogous to cudaGetErrorString in cuBLAS. Is there an equivalent of cudaGetErrorString for cuBLAS?
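
The cuBLAS library of that era does not ship such a function, but a small switch over cublasStatus_t achieves the same effect. A minimal sketch, assuming the status codes from cublas_v2.h; the helper name and the CUBLAS_SAFE_CALL macro are illustrative, not part of the cuBLAS API:

    #include <stdio.h>
    #include <stdlib.h>
    #include <cublas_v2.h>

    /* Map a cublasStatus_t to a readable string (illustrative helper,
       not part of the cuBLAS API). */
    static const char *cublasGetErrorString(cublasStatus_t status) {
        switch (status) {
            case CUBLAS_STATUS_SUCCESS:          return "CUBLAS_STATUS_SUCCESS";
            case CUBLAS_STATUS_NOT_INITIALIZED:  return "CUBLAS_STATUS_NOT_INITIALIZED";
            case CUBLAS_STATUS_ALLOC_FAILED:     return "CUBLAS_STATUS_ALLOC_FAILED";
            case CUBLAS_STATUS_INVALID_VALUE:    return "CUBLAS_STATUS_INVALID_VALUE";
            case CUBLAS_STATUS_ARCH_MISMATCH:    return "CUBLAS_STATUS_ARCH_MISMATCH";
            case CUBLAS_STATUS_MAPPING_ERROR:    return "CUBLAS_STATUS_MAPPING_ERROR";
            case CUBLAS_STATUS_EXECUTION_FAILED: return "CUBLAS_STATUS_EXECUTION_FAILED";
            case CUBLAS_STATUS_INTERNAL_ERROR:   return "CUBLAS_STATUS_INTERNAL_ERROR";
            default:                             return "unknown cuBLAS status";
        }
    }

    /* CUBLAS_SAFE_CALL, modelled on the CUDA_SAFE_CALL macro mentioned above. */
    #define CUBLAS_SAFE_CALL(call)                                              \
        do {                                                                    \
            cublasStatus_t s = (call);                                          \
            if (s != CUBLAS_STATUS_SUCCESS) {                                   \
                fprintf(stderr, "cuBLAS error %s at %s:%d\n",                   \
                        cublasGetErrorString(s), __FILE__, __LINE__);           \
                exit(EXIT_FAILURE);                                             \
            }                                                                   \
        } while (0)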

Compiling an OpenCL program using a CL/cl.h file

浪子不回头ぞ submitted on 2019-12-18 12:52:54
Question: I have sample "Hello, World!" code from the net and I want to run it on the GPU on my university's server. When I type "gcc main.c", it responds with: CL/cl.h: No such file or directory. What should I do? How can I get this header file?
Answer 1: Make sure you have the appropriate toolkit installed. This depends on what you intend to run your code on. If you have an NVidia card, then you need to download and install the CUDA toolkit, which also contains the necessary binaries and libraries for OpenCL.
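
Once a toolkit that ships the OpenCL headers is installed, the remaining pieces are usually an include path for the compiler and the OpenCL library at link time. A minimal host-side sketch; the install path in the build comment is an assumption (adjust for your system), and the only API call used is clGetPlatformIDs:

    /* main.c -- checks that CL/cl.h is found and that the OpenCL library links.
       Example build (paths are assumptions, adjust for your system):
           gcc main.c -I/usr/local/cuda/include -lOpenCL -o hello_cl            */
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void) {
        cl_uint num_platforms = 0;
        cl_int err = clGetPlatformIDs(0, NULL, &num_platforms);
        if (err != CL_SUCCESS) {
            fprintf(stderr, "clGetPlatformIDs failed with code %d\n", (int) err);
            return 1;
        }
        printf("Found %u OpenCL platform(s)\n", (unsigned) num_platforms);
        return 0;
    }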

Simple CUDA Kernel Optimization

怎甘沉沦 submitted on 2019-12-18 12:42:30
Question: In the process of speeding up an application, I have a very simple kernel which does type casting, as shown below:

    __global__ void UChar2FloatKernel(float *out, unsigned char *in, int nElem) {
        unsigned int i = (blockIdx.x * blockDim.x) + threadIdx.x;
        if (i < nElem)
            out[i] = (float) in[i];
    }

The global memory access is coalesced, and in my understanding shared memory will not be beneficial either, as there are not multiple reads of the same memory. Does anyone have any idea whether there is any further optimization that can be applied here?
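
One commonly suggested variant for a purely memory-bound kernel like this is a grid-stride loop, which lets a smaller, fixed-size grid convert several elements per thread and amortizes the index arithmetic; whether it actually helps should be measured on the target device. A minimal sketch (the kernel name and launch figures are made up for illustration):

    /* Grid-stride variant: each thread converts every `stride`-th element.
       Example launch: UChar2FloatStride<<<numSMs * 8, 256>>>(out, in, nElem); */
    __global__ void UChar2FloatStride(float *out, const unsigned char *in, int nElem) {
        int stride = gridDim.x * blockDim.x;
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < nElem; i += stride)
            out[i] = (float) in[i];
    }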

CUDA: What is the threads per multiprocessor and threads per block distinction? [duplicate]

安稳与你 submitted on 2019-12-18 12:05:27
Question (duplicate of "CUDA: How many concurrent threads in total?", which already has 3 answers; closed 4 years ago): We have a workstation with two Nvidia Quadro FX 5800 cards installed. Running the deviceQuery CUDA sample reveals that the maximum number of threads per multiprocessor (SM) is 1024, while the maximum number of threads per block is 512. Given that only one block can be executed on each SM at a time, why is the maximum threads per multiprocessor double the maximum threads per block? How do we utilise the other 512 threads per SM?
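
The two limits the question compares can be read directly from the device properties, as in the sketch below; on a compute capability 1.3 part such as the Quadro FX 5800 they come back as 1024 and 512, and the gap exists because an SM can host more than one resident block at a time (for example two 512-thread blocks), contrary to the premise in the question. Built with nvcc:

    /* Minimal sketch: print the two limits deviceQuery reports for device 0. */
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        printf("maxThreadsPerMultiProcessor = %d\n", prop.maxThreadsPerMultiProcessor);
        printf("maxThreadsPerBlock          = %d\n", prop.maxThreadsPerBlock);
        return 0;
    }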

CUDA atomic operation performance in different scenarios

▼魔方 西西 submitted on 2019-12-18 11:12:28
Question: When I came across this question on SO, I was curious to know the answer, so I wrote the piece of code below to test atomic operation performance in different scenarios. The OS is Ubuntu 12.04 with CUDA 5.5 and the device is a GeForce GTX 780 (Kepler architecture). I compiled the code with the -O3 flag and for CC=3.5.

    #include <stdio.h>

    static void HandleError(cudaError_t err, const char *file, int line) {
        if (err != cudaSuccess) {
            printf("%s in %s at line %d\n", cudaGetErrorString(err), file, line);
            exit(EXIT_FAILURE);
        }
    }
    …
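
Atomic performance in cases like this is largely a question of contention, i.e. how many threads target the same address. A toy illustration of the two extremes, assuming global-memory atomicAdd; the kernel names are made up and this is not the question's actual benchmark:

    /* Two toy kernels contrasting contention levels for global atomics. */
    __global__ void atomicSameAddress(int *counter) {
        atomicAdd(counter, 1);                 // every thread contends on one word
    }

    __global__ void atomicPerBlock(int *counters) {
        atomicAdd(&counters[blockIdx.x], 1);   // contention only within each block
    }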

How do I make an already written concurrent program run on a GPU array?

廉价感情. submitted on 2019-12-18 10:35:44
Question: I have a neural network written in Erlang, and I just bought a GeForce GTX 260 card with a 240-core GPU on it. Is it trivial to use CUDA as glue to run this on the graphics card?
Answer 1: No, using CUDA is not a trivial matter. The CUDA programming model is basically C (with some additions), but in order to get the most out of the GPGPU's capabilities you have to ensure that your algorithms follow the CUDA guidelines (see the NVidia CUDA Programming Guide). For example, in order to get the best …

Tensorflow multiple sessions with multiple GPUs

泄露秘密 submitted on 2019-12-18 10:28:08
Question: I have a workstation with 2 GPUs and I am trying to run multiple TensorFlow jobs at the same time, so I can train more than one model at once, etc. For example, I've tried to separate the sessions onto different resources via the Python API, using in script1.py:

    with tf.device("/gpu:0"):
        # do stuff

in script2.py:

    with tf.device("/gpu:1"):
        # do stuff

and in script3.py:

    with tf.device("/cpu:0"):
        # do stuff

If I run each script by itself, I can see that it is using the specified device. (Also the …

How do nVIDIA CC 2.1 GPU warp schedulers issue 2 instructions at a time for a warp?

柔情痞子 submitted on 2019-12-18 10:16:57
Question: Note: this question is specific to nVIDIA Compute Capability 2.1 devices. The following information is taken from the CUDA Programming Guide v4.1: in compute capability 2.1 devices, each SM has 48 SPs (cores) for integer and floating-point operations; each warp is composed of 32 consecutive threads; each SM has 2 warp schedulers; at every instruction issue time, one warp scheduler picks a ready warp of threads and issues 2 instructions for that warp on the cores. My doubts: one thread will …