gpu

Access/synchronization to local memory

☆樱花仙子☆ submitted on 2020-01-17 06:22:29
Question: I'm pretty new to GPGPU programming. I'm trying to implement an algorithm that needs a lot of synchronization, so it uses only one work-group (global and local size have the same value). I have the following problem: my program works correctly until the problem size exceeds 32. __kernel void assort( __global float *array, __local float *currentOutput, __local float *stimulations, __local int *noOfValuesAdded, __local float *addedValue, __local float *positionToInsert, __local int *activatedIdx, _
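
The cutoff at 32 matches the warp/wavefront width of the hardware, so code without explicit barriers can appear to work only while the whole work-group fits in a single warp. Below is a minimal PyOpenCL sketch (a hypothetical kernel, not the poster's code, and it assumes pyopencl is installed) showing the usual fix: a barrier(CLK_LOCAL_MEM_FENCE) between writing and reading __local memory, which stays correct for work-group sizes beyond 32.

    # Minimal sketch, not the poster's kernel: each work-item writes one slot of
    # local memory, then reads a neighbour's slot. The barrier makes this correct
    # for work-group sizes larger than the 32-wide warp.
    import numpy as np
    import pyopencl as cl

    src = """
    __kernel void rotate_local(__global float *data, __local float *scratch)
    {
        int lid  = get_local_id(0);
        int size = get_local_size(0);

        scratch[lid] = data[lid];               // write local memory
        barrier(CLK_LOCAL_MEM_FENCE);           // make all writes visible first
        data[lid] = scratch[(lid + 1) % size];  // read another item's slot
    }
    """

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    prog = cl.Program(ctx, src).build()

    n = 64                                      # one work-group of 64 (> 32)
    host = np.arange(n, dtype=np.float32)
    buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR,
                    hostbuf=host)

    prog.rotate_local(queue, (n,), (n,), buf, cl.LocalMemory(4 * n))
    cl.enqueue_copy(queue, host, buf)
    print(host)                                 # [1, 2, ..., 63, 0]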

pyglet vertex list not rendered (AMD driver?)

别等时光非礼了梦想. submitted on 2020-01-16 04:21:07
Question: My machine apparently won't draw vertex lists in pyglet. The following code renders two identical shapes at different positions in the window, one using a vertex list and the other using a straight draw(). The one that's drawn directly renders fine, while the vertex list doesn't render at all. import pyglet window = pyglet.window.Window() w, h = window.get_size() vl = pyglet.graphics.vertex_list( 4, ('v2i', (100,0, 100,h, 200,h, 200,0)), ('c3B', (255,255,255, 255,0,0, 0,255,0, 0,0,255)) )
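
If the direct draw() path renders but the vertex list does not, the vertex list still has to be drawn explicitly every frame, and a workaround sometimes reported for AMD drivers is to pass float vertex data ('v2f') instead of integer data ('v2i'). A minimal pyglet 1.x sketch along those lines (an illustration, not the poster's exact program):

    # Minimal pyglet 1.x sketch: the vertex list is drawn inside on_draw, and the
    # vertex coordinates are given as floats ('v2f'), which some AMD drivers
    # reportedly handle better than 'v2i'.
    import pyglet
    from pyglet.gl import GL_QUADS

    window = pyglet.window.Window()
    w, h = window.get_size()

    vl = pyglet.graphics.vertex_list(
        4,
        ('v2f', (100, 0, 100, h, 200, h, 200, 0)),
        ('c3B', (255, 255, 255, 255, 0, 0, 0, 255, 0, 0, 0, 255)),
    )

    @window.event
    def on_draw():
        window.clear()
        vl.draw(GL_QUADS)   # the vertex list must be drawn each frame

    pyglet.app.run()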

MATLAB is slow when using a user-defined function with calculation on the GPU

大兔子大兔子 submitted on 2020-01-16 00:51:10
Question: When I run the code shown below, the tic/toc pair inside the function shows that it takes a very short time (<< 1 sec) to go through all the lines. However, it actually takes around 2.3 secs to get the outputs! I use the tic/toc pair to measure the time. tic rnn.v = 11; rnn.h = 101; rnn.o = 7; rnn.h_init = randn(1,rnn.h,'gpuArray'); rnn.W_vh = randn(rnn.v,rnn.h,'gpuArray'); rnn.W_hh = randn(rnn.h,rnn.h,'gpuArray'); rnn.W_ho = randn(rnn.h,rnn.o,'gpuArray'); inData.V = randn(10000,11,100,'gpuArray');
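
A likely explanation is that gpuArray operations are launched asynchronously: the tic/toc pair inside the function only measures how long it takes to queue the kernels, and the remaining ~2.3 s is spent when the results are actually needed. In MATLAB the usual check is to call wait(gpuDevice) before toc. The same pitfall, sketched below in Python with CuPy (an assumed library used purely for illustration; it is not part of the original MATLAB code):

    # Illustration of the asynchronous-launch pitfall with CuPy: the first timer
    # only measures how long it takes to queue the work on the GPU, not the
    # actual compute time.
    import time
    import cupy as cp

    a = cp.random.randn(4096, 4096).astype(cp.float32)

    t0 = time.perf_counter()
    b = a @ a                        # launches the kernel and returns immediately
    launch = time.perf_counter() - t0

    cp.cuda.Device().synchronize()   # analogous to MATLAB's wait(gpuDevice)
    total = time.perf_counter() - t0

    print(f"launch only: {launch:.4f} s   with synchronization: {total:.4f} s")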

GPU card resets after 2 seconds

匆匆过客 submitted on 2020-01-16 00:50:17
Question: I'm using an NVIDIA GeForce card that gives an error after 2 seconds if I try to run some CUDA program on it. I read here that you can use the TDRlevel key in HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers. However, I don't see any such key in the registry. Does it need to be added manually? Has somebody else experienced this problem? If so, how did you solve it? Thanks. Answer 1: I'm assuming you are using Windows Vista or later. The article you linked to contains a list
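
The TDR values (TdrLevel, TdrDelay, and friends) usually do not exist by default; they have to be created as DWORD values under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers, followed by a reboot. A minimal sketch using Python's winreg module, run as Administrator (the exact delay value is an assumption; pick whatever your longest kernel needs):

    # Sketch: create the TDR values so long-running CUDA kernels are not killed
    # after the default 2-second watchdog timeout. Requires admin rights and a
    # reboot to take effect.
    import winreg

    path = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, path, 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "TdrDelay", 0, winreg.REG_DWORD, 10)  # 10 s instead of 2
        winreg.SetValueEx(key, "TdrLevel", 0, winreg.REG_DWORD, 3)   # 3 = default recovery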

How to debug CUDA kernels in Visual Studio or Parallel Nsight

核能气质少年 submitted on 2020-01-15 04:41:08
Question: I have the CUDA 4.1 SDK with Parallel Nsight 2.1 installed on Win7 x64. I want to debug my CUDA kernels; how do I do this? Is this possible with one GPU? There is a utility called Nsight Monitor. I have tried it: I first disabled Timeout Detection and Recovery on Windows and disabled WPF, as Nsight Monitor told me. I set a breakpoint in VS and ran the code, but nothing happened. Nsight Monitor said I am connected. So can I debug in VS, or shall I debug in Parallel Nsight? How? Thanks a million. Answer 1: If you

Unable to create 4.3 OpenGL context

耗尽温柔 submitted on 2020-01-15 04:10:31
Question: When I try to run my program, it shows the message "Unable to create 4.3 OpenGL context." According to the link, the information implies that it is a hardware problem. However, my GPU is an HD7670M and, checking the wiki, it supports OpenGL 4.3. So I want to ask: do you know what is going on? Or can I change the OpenGL version through the "glutInitContextVersion" function? I would greatly appreciate any help you can offer. Answer 1: AMD only has beta support for OpenGL 4.3 at present. So
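
glutInitContextVersion only requests a context version; if the driver cannot actually provide 4.3 (AMD's 4.3 support was still beta at the time), context creation fails, so requesting the highest version the driver really supports is the usual workaround. A small PyOpenGL/freeglut sketch (Python is an assumption here; the original program may well be C/C++):

    # Request a 4.2 core context instead of 4.3 and print what the driver
    # actually provided. Requires PyOpenGL with freeglut available.
    import sys
    from OpenGL.GL import *
    from OpenGL.GLUT import *

    def display():
        glClear(GL_COLOR_BUFFER_BIT)
        glutSwapBuffers()

    glutInit(sys.argv)
    glutInitContextVersion(4, 2)                # ask for 4.2 rather than 4.3
    glutInitContextProfile(GLUT_CORE_PROFILE)
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA)
    glutCreateWindow(b"context test")
    print(glGetString(GL_VERSION))              # shows the version we really got
    glutDisplayFunc(display)
    glutMainLoop()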

How do I free all memory on GPU in XGBoost?

旧城冷巷雨未停 submitted on 2020-01-14 22:42:41
Question: Here is my code: clf = xgb.XGBClassifier( tree_method = 'gpu_hist', gpu_id = 0, n_gpus = 4, random_state = 55, n_jobs = -1 ) clf.set_params(**params) clf.fit(X_train, y_train, **fit_params) I've read the answers on this question and this Git issue, but neither worked. I tried to delete the booster in this way: clf._Booster.__del__() gc.collect() It deletes the booster but doesn't completely free up GPU memory. I guess it's the DMatrix that is still there, but I am not sure. How can I free the whole
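
In XGBoost versions of that era the GPU memory held by the booster and its internal DMatrix was only reliably released when the process owning the CUDA context exited, so a common workaround (an approach, not an official XGBoost API) is to run the GPU training in a short-lived child process and ship the results back:

    # Workaround sketch: train on the GPU in a child process so all GPU memory,
    # including the internal DMatrix, is freed when that process exits.
    import multiprocessing as mp
    import numpy as np
    import xgboost as xgb

    def train_on_gpu(X_train, y_train, X_test, queue):
        clf = xgb.XGBClassifier(tree_method="gpu_hist", gpu_id=0, random_state=55)
        clf.fit(X_train, y_train)
        queue.put(clf.predict_proba(X_test))   # send results back to the parent

    if __name__ == "__main__":
        X_train = np.random.rand(1000, 20).astype(np.float32)  # dummy data
        y_train = np.random.randint(0, 2, 1000)
        X_test = np.random.rand(100, 20).astype(np.float32)

        queue = mp.Queue()
        p = mp.Process(target=train_on_gpu, args=(X_train, y_train, X_test, queue))
        p.start()
        preds = queue.get()   # read before join to avoid blocking on a full pipe
        p.join()              # child exits; its CUDA context and GPU memory go with it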

AMD CPU versus Intel CPU for OpenCL

纵饮孤独 submitted on 2020-01-14 19:14:41
Question: Some friends and I want to use OpenCL. For this we are looking to buy a new computer, but we wondered which is best between AMD and Intel for OpenCL use. The graphics card will be an Nvidia and we have no choice on the graphics card, so we started out wanting to buy an Intel CPU, but after some research we figured out that maybe AMD CPUs are better with OpenCL. We didn't find benchmarks comparing the two. So here are our questions: Is AMD better than Intel with OpenCL? Does it matter to have a Nvidia

How to restrict tensorflow GPU memory usage?

别来无恙 submitted on 2020-01-14 14:33:53
Question: I have used tensorflow-gpu 1.13.1 on Ubuntu 18.04 with CUDA 10.0 on an Nvidia GeForce RTX 2070 (Driver Version: 415.27). Code like the snippet below was used to manage TensorFlow memory usage. I have about 8 GB of GPU memory, so TensorFlow must not allocate more than 1 GB of GPU memory. But when I look at memory usage with the nvidia-smi command, I see that it uses ~1.5 GB despite the fact that I restricted the memory quantity with GPUOptions. memory_config = tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu
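
A likely explanation for the extra ~0.5 GB is that per_process_gpu_memory_fraction only caps TensorFlow's own allocator pool; the CUDA context plus cuDNN/cuBLAS workspaces add a few hundred MB on top, and nvidia-smi reports the total. A minimal TF 1.x sketch of the option (reconstructed around the truncated snippet, so treat the exact numbers as an assumption):

    # TF 1.x (tensorflow-gpu 1.13 era): cap the allocator pool at ~1/8 of the
    # card and grow it lazily. nvidia-smi will still show some extra memory for
    # the CUDA context and library handles.
    import tensorflow as tf

    gpu_options = tf.GPUOptions(
        per_process_gpu_memory_fraction=1.0 / 8,  # roughly 1 GB of an 8 GB card
        allow_growth=True,                        # allocate only as needed
    )
    config = tf.ConfigProto(gpu_options=gpu_options)

    with tf.Session(config=config) as sess:
        x = tf.random_normal([1000, 1000])
        print(sess.run(tf.reduce_sum(x)))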