gpu

cudaMemcpyToSymbol does not copy data

可紊 posted on 2019-12-07 16:59:23
Question: I want to use __constant__ memory, which will be accessed by all threads across all of my kernels. The declaration is something like this: extern __constant__ float smooth [8 * 1024]; I am copying data to this variable using cudaMemcpyToSymbol("smooth", smooth_local, smooth_size, 0, cudaMemcpyHostToDevice); where smooth_size = 7K bytes. It was giving me incorrect output, but when I ran it in -deviceemu mode and tried to print the contents of both these variables inside the kernel, I was getting all
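A note on the snippet above: newer CUDA toolkits dropped the string-name overload, so cudaMemcpyToSymbol should be given the symbol itself rather than "smooth" as a string, and the returned error code is worth checking. Below is a minimal, self-contained sketch of that pattern; the array size and the smooth/smooth_local names come from the question, everything else (kernel name, fill values) is illustrative.

// constant_copy.cu — build with: nvcc constant_copy.cu -o constant_copy
#include <cstdio>
#include <cuda_runtime.h>

// Definition of the __constant__ array. If it is declared extern elsewhere,
// this definition must live in exactly one translation unit.
__constant__ float smooth[8 * 1024];

__global__ void readSmooth(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = smooth[i];
}

int main() {
    const int n = 1792;                       // 1792 floats = 7 KB, as in the question
    float smooth_local[1792];
    for (int i = 0; i < n; ++i) smooth_local[i] = 0.5f * i;

    // Pass the symbol itself, not "smooth" as a string, and check the result.
    cudaError_t err = cudaMemcpyToSymbol(smooth, smooth_local, n * sizeof(float),
                                         0, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        std::printf("cudaMemcpyToSymbol failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(float));
    readSmooth<<<(n + 255) / 256, 256>>>(d_out, n);
    cudaDeviceSynchronize();

    float first = 0.0f, last = 0.0f;
    cudaMemcpy(&first, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(&last, d_out + n - 1, sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("smooth[0] = %f, smooth[%d] = %f\n", first, n - 1, last);

    cudaFree(d_out);
    return 0;
}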

How do you free up GPU memory?

99封情书 posted on 2019-12-07 16:21:17
Question: When running Theano, I get an error: not enough memory. See below. What are some possible actions that can be taken to free up memory? I know I can close applications, etc., but I just want to see if anyone has other ideas. For example, is it possible to reserve memory? THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python conv_exp.py Using gpu device 0: GeForce GT 650M Trying to run under a GPU. If this is not desired, then modify network3.py to set the GPU flag to False. Error allocating

Multi-GPU usage with CUDA Thrust

。_饼干妹妹 posted on 2019-12-07 15:20:44
Question: I want to use my two graphics cards for calculation with CUDA Thrust. Running on a single card works well for both cards, even when I store the two device_vectors in a std::vector. If I use both cards at the same time, the first cycle in the loop works and causes no error. After the first run it causes an error, probably because the device pointer is not valid. I am not sure what the exact problem is, or how to use both cards for the calculation. Minimal code sample: std:
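What usually goes wrong in this situation is that a device_vector allocated while card 0 is active is later used or destroyed while card 1 is active. A minimal sketch of the usual fix, assuming two (or more) CUDA-capable devices: select the device with cudaSetDevice() before each vector is created, used, and destroyed, so its storage always lives on the active device. This is only an illustration of the pattern, not the questioner's loop.

// multi_gpu_thrust.cu — build with: nvcc multi_gpu_thrust.cu -o multi_gpu_thrust
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>

int main() {
    int device_count = 0;
    cudaGetDeviceCount(&device_count);

    for (int dev = 0; dev < device_count; ++dev) {
        cudaSetDevice(dev);   // every Thrust call below now targets this GPU
        {
            // Allocate, fill, and reduce on the currently active device.
            thrust::device_vector<float> v(1 << 20, 1.0f);
            float sum = thrust::reduce(v.begin(), v.end(), 0.0f);
            std::printf("device %d: sum = %f\n", dev, sum);
        }   // v is destroyed here, while its own device is still the active one
    }
    return 0;
}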

How to run TensorFlow on AMD/ATI GPU?

僤鯓⒐⒋嵵緔 posted on 2019-12-07 14:08:24
Question: After reading this tutorial https://www.tensorflow.org/guide/using_gpu I checked the GPU session with this simple code: import numpy as np import matplotlib.pyplot as plt import tensorflow as tf a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2,3], name = 'a') b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape = [3,2], name = 'b') c = tf.matmul(a, b) with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess: x = sess.run(c) print(x) The output was 2018-08-07 18:44:59.019144

Computation on sparse data using GPU

不羁的心 posted on 2019-12-07 13:30:57
Question: I'm computing a function f(x) = exp(-x) in Matlab, where x is a vector of scalars. The function is computed on the GPU, e.g. x_cpu = [4 5 11 1]; x = gpuArray(x_cpu); f = exp(-x); then the result would be: f = exp(-[4, 5, 11, 1]) = [0.0183, 0.0067, 1.6702e-005, 0.3679]. Note that f(x(3)) = f(11) = exp(-11) = 1.6702e-005 = 0.000016702, which is a pretty small value. So, I would like to avoid computing the function for all x(i) > 10 by simply setting f(x(i)) = 0. I can probably use the
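The cutoff itself is easy to fold into the element-wise operation. As an illustration of the idea (written in CUDA/Thrust rather than MATLAB, since that is the language most of this page deals with, and with made-up names), a functor can return 0 outright for any x > 10 instead of evaluating the exponential:

// thresholded_exp.cu — build with: nvcc thresholded_exp.cu -o thresholded_exp
#include <cstdio>
#include <cmath>
#include <thrust/device_vector.h>
#include <thrust/transform.h>

// Return exp(-x), except that inputs above the cutoff are mapped straight to 0.
struct ThresholdedExp {
    __host__ __device__ float operator()(float x) const {
        return (x > 10.0f) ? 0.0f : expf(-x);
    }
};

int main() {
    const float x_host[4] = {4.0f, 5.0f, 11.0f, 1.0f};   // same inputs as the question
    thrust::device_vector<float> x(x_host, x_host + 4);
    thrust::device_vector<float> f(4);

    thrust::transform(x.begin(), x.end(), f.begin(), ThresholdedExp());

    for (int i = 0; i < 4; ++i)
        std::printf("f(%g) = %g\n", x_host[i], static_cast<float>(f[i]));
    return 0;
}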

How to download large files (like weights of a model) from Colaboratory?

﹥>﹥吖頭↗ posted on 2019-12-07 13:13:39
Question: I have tried downloading small files from Google Colaboratory. They are easily downloaded, but whenever I try to download files which have large sizes, it shows an error. What is the way to download large files? Answer 1: This is how I handle this issue: from google.colab import auth from googleapiclient.http import MediaFileUpload from googleapiclient.discovery import build auth.authenticate_user() Then click on the link, authorize Google Drive and paste the code in the notebook. drive_service =

How can I use the GPU with Java programming?

折月煮酒 posted on 2019-12-07 12:14:38
Question: I have been using CUDA C all these days to access the GPU. But now my guide has asked me to work with Java and the GPU. So I searched the Internet and found that Rootbeer is the best option for it, but I am not able to understand how to run a program using Rootbeer. Can someone tell me the steps for using Rootbeer? Answer 1: Mark Harris from Nvidia gave a nice talk about the future of CUDA at SC14. You can watch it here. The main thing that may be of interest for you is the part where he talks about programming languages

DirectCompute versus OpenCL for GPU programming?

为君一笑 posted on 2019-12-07 11:43:06
Question: I have some (financial) tasks which should map well to GPU computing, but I'm not really sure if I should go with OpenCL or DirectCompute. I did some GPU computing, but it was a long time ago (3 years). I did it through OpenGL, since there was not really any alternative back then. I've seen some OpenCL presentations and it looks really nice. I haven't seen anything about DirectCompute yet, but I expect it to also be good. I'm not interested at the moment in cross-platform compatibility, and

OpenCL not finding platforms?

孤人 posted on 2019-12-07 11:17:25
Question: I am trying to utilize the C++ API for OpenCL. I have installed my NVIDIA drivers and I have tested that I can run the simple vector addition program provided here. I can compile this program with the following gcc call, and the program runs without problems. gcc main.c -o vectorAddition -l OpenCL -I/usr/local/cuda-6.5/include However, I would very much prefer to use the C++ API, as opposed to the very verbose host files needed for C. I downloaded the C++ bindings from Khronos from here and placed the
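For reference, a minimal platform/device listing with the C++ bindings looks roughly like the sketch below. It assumes the Khronos cl.hpp header sits on the include path used in the question (/usr/local/cuda-6.5/include); it is only a sketch, not a port of the vector-addition program. If the platform list comes back empty at runtime, that usually points at a missing or misconfigured ICD file under /etc/OpenCL/vendors rather than at the code.

// platforms.cpp — build with:
//   g++ -std=c++11 platforms.cpp -o platforms -I/usr/local/cuda-6.5/include -lOpenCL
#include <iostream>
#include <vector>
#include <CL/cl.hpp>

int main() {
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);            // fills the vector with every installed platform
    std::cout << "platforms found: " << platforms.size() << std::endl;

    for (const cl::Platform &p : platforms) {
        std::cout << "  platform: " << p.getInfo<CL_PLATFORM_NAME>() << std::endl;
        std::vector<cl::Device> devices;
        p.getDevices(CL_DEVICE_TYPE_ALL, &devices);
        for (const cl::Device &d : devices)
            std::cout << "    device: " << d.getInfo<CL_DEVICE_NAME>() << std::endl;
    }
    return 0;
}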

Check GPU OpenGL Limits

泪湿孤枕 posted on 2019-12-07 10:03:44
Question: I was wondering if there is an easy way to query (programmatically) the GPU OpenGL limits for the following features: - maximum 2D texture size - maximum 3D texture size - maximum number of vertex shader attributes - maximum number of varying floats - number of texture image units (in the vertex shader, and in the fragment shader) - maximum number of draw buffers I need to know these numbers in advance, before writing my GPU research project. Answer 1: glGet() is your friend, with: GL_MAX_3D_TEXTURE_SIZE GL
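A sketch of the glGet() approach the answer refers to, covering the limits listed in the question. It assumes a current OpenGL context already exists (created with GLFW, GLUT, SDL, or similar) and that the headers in use expose the GL 2.0+ enums (e.g. via GLEW or glext.h); context creation is deliberately omitted, and the function name is made up for the example.

// gl_limits.cpp — call printGpuLimits() after a context has been made current.
#include <cstdio>
#include <GL/glew.h>   // any header set that defines the GL 2.0+ enums will do

void printGpuLimits() {
    const struct { GLenum pname; const char *label; } limits[] = {
        { GL_MAX_TEXTURE_SIZE,               "max 2D texture size" },
        { GL_MAX_3D_TEXTURE_SIZE,            "max 3D texture size" },
        { GL_MAX_VERTEX_ATTRIBS,             "max vertex shader attributes" },
        { GL_MAX_VARYING_FLOATS,             "max varying floats" },
        { GL_MAX_TEXTURE_IMAGE_UNITS,        "fragment texture image units" },
        { GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS, "vertex texture image units" },
        { GL_MAX_DRAW_BUFFERS,               "max draw buffers" },
    };
    for (const auto &limit : limits) {
        GLint value = 0;
        glGetIntegerv(limit.pname, &value);   // value stays 0 if the query is unsupported
        std::printf("%-30s %d\n", limit.label, value);
    }
}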