gpu

cudaMemcpyToSymbol does not copy data

可紊 posted on 2019-12-07 16:59:23
Question: I want to use __constant__ memory, which will be accessed by all threads across all of my kernels. The declaration is something like this: extern __constant__ float smooth [8 * 1024]; I am copying data to this variable using cudaMemcpyToSymbol("smooth", smooth_local, smooth_size, 0, cudaMemcpyHostToDevice); where smooth_size = 7K bytes. It was giving me incorrect output, but when I ran it in -deviceemu mode and tried to print the contents of both these variables inside the kernel, I was getting all
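A note on the snippet above: newer CUDA toolkits dropped the string-name overload, so cudaMemcpyToSymbol should be given the symbol itself rather than "smooth" as a string, and the returned error code is worth checking. Below is a minimal, self-contained sketch of that pattern; the array size and the smooth/smooth_local names come from the question, everything else (kernel name, fill values) is illustrative.

// constant_copy.cu — build with: nvcc constant_copy.cu -o constant_copy
#include <cstdio>
#include <cuda_runtime.h>

// Definition of the __constant__ array. If it is declared extern elsewhere,
// this definition must live in exactly one translation unit.
__constant__ float smooth[8 * 1024];

__global__ void readSmooth(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = smooth[i];
}

int main() {
    const int n = 1792;                       // 1792 floats = 7 KB, as in the question
    float smooth_local[1792];
    for (int i = 0; i < n; ++i) smooth_local[i] = 0.5f * i;

    // Pass the symbol itself, not "smooth" as a string, and check the result.
    cudaError_t err = cudaMemcpyToSymbol(smooth, smooth_local, n * sizeof(float),
                                         0, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        std::printf("cudaMemcpyToSymbol failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(float));
    readSmooth<<<(n + 255) / 256, 256>>>(d_out, n);
    cudaDeviceSynchronize();

    float first = 0.0f, last = 0.0f;
    cudaMemcpy(&first, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(&last, d_out + n - 1, sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("smooth[0] = %f, smooth[%d] = %f\n", first, n - 1, last);

    cudaFree(d_out);
    return 0;
}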

How do you free up GPU memory?

99封情书 posted on 2019-12-07 16:21:17
Question: When running Theano, I get an error: not enough memory. See below. What are some possible actions that can be taken to free up memory? I know I can close applications, etc., but I just want to see if anyone has other ideas. For example, is it possible to reserve memory? THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python conv_exp.py Using gpu device 0: GeForce GT 650M Trying to run under a GPU. If this is not desired, then modify network3.py to set the GPU flag to False. Error allocating

Multi-GPU usage with CUDA Thrust

。_饼干妹妹 posted on 2019-12-07 15:20:44
Question: I want to use my two graphics cards for calculation with CUDA Thrust. Running on a single card works well for both cards, even when I store the two device_vectors in a std::vector. If I use both cards at the same time, the first cycle in the loop works and causes no error. After the first run it causes an error, probably because the device pointer is not valid. I am not sure what the exact problem is, or how to use both cards for the calculation. Minimal code sample: std:
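What usually goes wrong in this situation is that a device_vector allocated while card 0 is active is later used or destroyed while card 1 is active. A minimal sketch of the usual fix, assuming two (or more) CUDA-capable devices: select the device with cudaSetDevice() before each vector is created, used, and destroyed, so its storage always lives on the active device. This is only an illustration of the pattern, not the questioner's loop.

// multi_gpu_thrust.cu — build with: nvcc multi_gpu_thrust.cu -o multi_gpu_thrust
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>

int main() {
    int device_count = 0;
    cudaGetDeviceCount(&device_count);

    for (int dev = 0; dev < device_count; ++dev) {
        cudaSetDevice(dev);   // every Thrust call below now targets this GPU
        {
            // Allocate, fill, and reduce on the currently active device.
            thrust::device_vector<float> v(1 << 20, 1.0f);
            float sum = thrust::reduce(v.begin(), v.end(), 0.0f);
            std::printf("device %d: sum = %f\n", dev, sum);
        }   // v is destroyed here, while its own device is still the active one
    }
    return 0;
}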

How to run TensorFlow on AMD/ATI GPU?

僤鯓⒐⒋嵵緔 posted on 2019-12-07 14:08:24
Question: After reading this tutorial https://www.tensorflow.org/guide/using_gpu I checked the GPU session with this simple code: import numpy as np import matplotlib.pyplot as plt import tensorflow as tf a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2,3], name = 'a') b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape = [3,2], name = 'b') c = tf.matmul(a, b) with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess: x = sess.run(c) print(x) The output was 2018-08-07 18:44:59.019144

Computation on sparse data using GPU

不羁的心 posted on 2019-12-07 13:30:57
Question: I'm computing a function f(x) = exp(-x) in Matlab, where x is a vector of scalars. The function is computed on the GPU, e.g. x_cpu = [4 5 11 1]; x = gpuArray(x_cpu); f = exp(-x); then the result would be: f = exp(-[4, 5, 11, 1]) = [0.0183, 0.0067, 1.6702e-005, 0.3679]. Note that f(x(3)) = f(11) = exp(-11) = 1.6702e-005 = 0.000016702, which is a pretty small value. So, I would like to avoid computing the function for all x(i) > 10 by simply setting f(x(i)) = 0. I can probably use the
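The cutoff itself is easy to fold into the element-wise operation. As an illustration of the idea (written in CUDA/Thrust rather than MATLAB, since that is the language most of this page deals with, and with made-up names), a functor can return 0 outright for any x > 10 instead of evaluating the exponential:

// thresholded_exp.cu — build with: nvcc thresholded_exp.cu -o thresholded_exp
#include <cstdio>
#include <cmath>
#include <thrust/device_vector.h>
#include <thrust/transform.h>

// Return exp(-x), except that inputs above the cutoff are mapped straight to 0.
struct ThresholdedExp {
    __host__ __device__ float operator()(float x) const {
        return (x > 10.0f) ? 0.0f : expf(-x);
    }
};

int main() {
    const float x_host[4] = {4.0f, 5.0f, 11.0f, 1.0f};   // same inputs as the question
    thrust::device_vector<float> x(x_host, x_host + 4);
    thrust::device_vector<float> f(4);

    thrust::transform(x.begin(), x.end(), f.begin(), ThresholdedExp());

    for (int i = 0; i < 4; ++i)
        std::printf("f(%g) = %g\n", x_host[i], static_cast<float>(f[i]));
    return 0;
}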

How to download large files (like weights of a model) from Colaboratory?

﹥>﹥吖頭↗ posted on 2019-12-07 13:13:39
Question: I have tried downloading small files from Google Colaboratory. They are easily downloaded, but whenever I try to download files which have large sizes, it shows an error. What is the way to download large files? Answer 1: This is how I handle this issue: from google.colab import auth from googleapiclient.http import MediaFileUpload from googleapiclient.discovery import build auth.authenticate_user() Then click on the link, authorize Google Drive and paste the code in the notebook. drive_service =

How can I use the GPU with Java programming?

折月煮酒 posted on 2019-12-07 12:14:38
Question: I have been using CUDA C all these days to access the GPU. But now my guide has asked me to work with Java and the GPU. So I searched the Internet and found that Rootbeer is the best option for it, but I am not able to understand how to run a program using Rootbeer. Can someone tell me the steps for using Rootbeer? Answer 1: Mark Harris from Nvidia gave a nice talk about the future of CUDA at SC14. You can watch it here. The main thing that may be of interest for you is the part where he talks about programming languages

DirectCompute versus OpenCL for GPU programming?

为君一笑 posted on 2019-12-07 11:43:06
Question: I have some (financial) tasks which should map well to GPU computing, but I'm not really sure if I should go with OpenCL or DirectCompute. I did some GPU computing, but it was a long time ago (3 years). I did it through OpenGL, since there was not really any alternative back then. I've seen some OpenCL presentations and it looks really nice. I haven't seen anything about DirectCompute yet, but I expect it to also be good. I'm not interested at the moment in cross-platform compatibility, and

OpenCL not finding platforms?

孤人 posted on 2019-12-07 11:17:25
Question: I am trying to utilize the C++ API for OpenCL. I have installed my NVIDIA drivers and I have tested that I can run the simple vector addition program provided here. I can compile this program with the following gcc call, and the program runs without problems. gcc main.c -o vectorAddition -l OpenCL -I/usr/local/cuda-6.5/include However, I would very much prefer to use the C++ API, as opposed to the very verbose host files needed for C. I downloaded the C++ bindings from Khronos from here and placed the
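For reference, a minimal platform/device listing with the C++ bindings looks roughly like the sketch below. It assumes the Khronos cl.hpp header sits on the include path used in the question (/usr/local/cuda-6.5/include); it is only a sketch, not a port of the vector-addition program. If the platform list comes back empty at runtime, that usually points at a missing or misconfigured ICD file under /etc/OpenCL/vendors rather than at the code.

// platforms.cpp — build with:
//   g++ -std=c++11 platforms.cpp -o platforms -I/usr/local/cuda-6.5/include -lOpenCL
#include <iostream>
#include <vector>
#include <CL/cl.hpp>

int main() {
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);            // fills the vector with every installed platform
    std::cout << "platforms found: " << platforms.size() << std::endl;

    for (const cl::Platform &p : platforms) {
        std::cout << "  platform: " << p.getInfo<CL_PLATFORM_NAME>() << std::endl;
        std::vector<cl::Device> devices;
        p.getDevices(CL_DEVICE_TYPE_ALL, &devices);
        for (const cl::Device &d : devices)
            std::cout << "    device: " << d.getInfo<CL_DEVICE_NAME>() << std::endl;
    }
    return 0;
}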

Check GPU OpenGL Limits

泪湿孤枕 posted on 2019-12-07 10:03:44
Question: I was wondering if there is an easy way to query (programmatically) the GPU OpenGL limits for the following features: - maximum 2D texture size - maximum 3D texture size - maximum number of vertex shader attributes - maximum number of varying floats - number of texture image units (in the vertex shader, and in the fragment shader) - maximum number of draw buffers I need to know these numbers in advance, before writing my GPU research project. Answer 1: glGet() is your friend, with: GL_MAX_3D_TEXTURE_SIZE GL
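A sketch of the glGet() approach the answer refers to, covering the limits listed in the question. It assumes a current OpenGL context already exists (created with GLFW, GLUT, SDL, or similar) and that the headers in use expose the GL 2.0+ enums (e.g. via GLEW or glext.h); context creation is deliberately omitted, and the function name is made up for the example.

// gl_limits.cpp — call printGpuLimits() after a context has been made current.
#include <cstdio>
#include <GL/glew.h>   // any header set that defines the GL 2.0+ enums will do

void printGpuLimits() {
    const struct { GLenum pname; const char *label; } limits[] = {
        { GL_MAX_TEXTURE_SIZE,               "max 2D texture size" },
        { GL_MAX_3D_TEXTURE_SIZE,            "max 3D texture size" },
        { GL_MAX_VERTEX_ATTRIBS,             "max vertex shader attributes" },
        { GL_MAX_VARYING_FLOATS,             "max varying floats" },
        { GL_MAX_TEXTURE_IMAGE_UNITS,        "fragment texture image units" },
        { GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS, "vertex texture image units" },
        { GL_MAX_DRAW_BUFFERS,               "max draw buffers" },
    };
    for (const auto &limit : limits) {
        GLint value = 0;
        glGetIntegerv(limit.pname, &value);   // value stays 0 if the query is unsupported
        std::printf("%-30s %d\n", limit.label, value);
    }
}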