gpu

Vectorizing nested loops in MATLAB using bsxfun and GPU

南笙酒味 submitted on 2019-12-28 16:26:01
Question: For loops seem to be extremely slow, so I was wondering if the nested loops in the code shown next could be vectorized using bsxfun, and whether a GPU could be introduced too.

Code

%// Parameters
i = 1; j = 3; n1 = 1500; n2 = 1500;

%// Pre-allocate for output
LInc(n1+n2,n1+n2) = 0;

%// Nested Loops - I
for x = 1:n1
    for y = 1:n1
        num = ((n2 ^ 2) * (L1(i, i) + L2(j, j) + 1)) - (n2 * n * (L1(x,i) + L1(y,i)));
        LInc(x, y) = L1(x, y) + (num/denom);
        LInc(y, x) = LInc(x, y);
    end
end

%// Nested Loops - II
for
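Since the question asks whether the GPU could be introduced, the sketch below shows the first nested loop written as a 2D CUDA kernel, one thread per (x, y) entry. It is only an illustration under several assumptions: L1 is a row-major device array with leading dimension ld, the scalar term n2^2*(L1(i,i) + L2(j,j) + 1) is precomputed on the host and passed in as constTerm, and n and denom come from the part of the question that is truncated above.

__global__ void fillLInc(const float* L1, float* LInc, int ld,
                         int i, int n1, float n2, float n,
                         float denom, float constTerm)
{
    // constTerm = n2^2 * (L1(i,i) + L2(j,j) + 1), precomputed on the host;
    // i is the 0-based equivalent of the MATLAB column index.
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= n1 || y >= n1) return;

    float num = constTerm - n2 * n * (L1[x * ld + i] + L1[y * ld + i]);
    float v   = L1[x * ld + y] + num / denom;
    LInc[x * ld + y] = v;   // symmetric write, as LInc(y,x) = LInc(x,y) above
    LInc[y * ld + x] = v;
}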

TensorFlow OOM on GPU

家住魔仙堡 submitted on 2019-12-28 13:34:51
Question: I'm training some music data on an LSTM-RNN in TensorFlow and ran into a problem with GPU memory allocation which I don't understand: I encounter an OOM when there actually seems to be just about enough VRAM still available. Some background: I'm working on Ubuntu Gnome 16.04, using a GTX 1060 6GB, an Intel Xeon E3-1231 v3, and 8GB RAM. First, the part of the error message which I can understand; I will add the whole error message at the end again for anyone who might ask for
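As a generic diagnostic for this kind of mismatch between reported and usable VRAM, the sketch below queries what the driver actually considers free via cudaMemGetInfo. This is not taken from the question; note also that TensorFlow pre-allocates most of the GPU memory by default, so the driver's number can differ from what the framework has left to hand out.

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Ask the CUDA runtime how much device memory is free right now.
    size_t free_bytes = 0, total_bytes = 0;
    cudaMemGetInfo(&free_bytes, &total_bytes);
    printf("free: %zu MiB / total: %zu MiB\n",
           free_bytes >> 20, total_bytes >> 20);
    return 0;
}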

Modifying the registry to increase GPU timeout, Windows 7

廉价感情. submitted on 2019-12-27 11:45:42
Question: I'm trying to increase the timeout on the GPU from its default setting of 2 seconds to something a little longer. I found the following link, but it appears to be slightly different in Windows 7, as I can't see anything mentioned in the webpage. Has anyone done this before? If so, could you fill in the gaps please? Thanks @RoBik, so as follows if I want 6 days (a bit excessive, I know, but just for example)? Thanks again for your help, +1. EDIT This is the error I'm currently getting. An error has occurred
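For reference, the timeout being discussed is controlled by Windows TDR (Timeout Detection and Recovery). A minimal sketch of a .reg file using the documented TdrDelay value (a REG_DWORD in seconds under the GraphicsDrivers key), with the question's 6-day example, 518400 seconds:

Windows Registry Editor Version 5.00

; TdrDelay is in seconds; 518400 (0x7E900) = 6 days, the example from the question.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0007e900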

Some problems when I try to enable the L1 cache on the GPU [closed]

非 Y 不嫁゛ submitted on 2019-12-26 08:21:08
Question: I am a beginner in CUDA. I already executed the command for enabling the L1 cache at the command prompt on Windows. nvcc is located at C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin on my device. I wrote the command as follows:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin>nvcc
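For context, on most CUDA architectures global loads are only cached in L1 when the code is compiled with -Xptxas -dlcm=ca (whether global loads can use L1 at all depends on the GPU generation). A minimal sketch; the file name and kernel are hypothetical:

// example.cu -- toy kernel for observing L1-cached global loads.
// Compile with global loads cached in L1:
//   nvcc -Xptxas -dlcm=ca -o example example.cu
__global__ void copyKernel(const float* in, float* out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = in[idx];  // global load; cached in L1 under -dlcm=ca
}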

Parallel implementation of the sum of contiguous subsequences in an array using CUDA

一世执手 submitted on 2019-12-25 18:25:10
Question: Let's consider the following array:

tab = [80,12,14,5,70,9,26,30,8,12,16,15]

I want to compute the sums of all contiguous subsequences of size 4 using CUDA, for example:

S1 = 80+12+14+5 = 111
S2 = 12+14+5+70 = 101
S3 = 14+5+70+9  = 98
...

Do you have an efficient idea for parallelising this task using CUDA? The previous array is just an example; in my case I will use a huge one.

Answer 1: We can do this in a single operation (thrust::transform) using thrust. In CUDA, this can be considered to be a fairly simple 1-D
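A minimal, self-contained sketch of the thrust::transform approach the answer describes: each output index gets the sum of the 4-element window starting there. The functor name and layout are illustrative, not the answer's exact code.

#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>

// Functor: given window start i, return tab[i] + tab[i+1] + tab[i+2] + tab[i+3].
struct window_sum
{
    const int* data;
    window_sum(const int* d) : data(d) {}
    __host__ __device__ int operator()(int i) const
    {
        return data[i] + data[i + 1] + data[i + 2] + data[i + 3];
    }
};

int main()
{
    int h_tab[] = {80, 12, 14, 5, 70, 9, 26, 30, 8, 12, 16, 15};
    int n = sizeof(h_tab) / sizeof(h_tab[0]);
    thrust::device_vector<int> tab(h_tab, h_tab + n);
    thrust::device_vector<int> sums(n - 3);   // one sum per window of size 4
    thrust::transform(thrust::counting_iterator<int>(0),
                      thrust::counting_iterator<int>(n - 3),
                      sums.begin(),
                      window_sum(thrust::raw_pointer_cast(tab.data())));
    // sums now holds 111, 101, 98, ... on the device.
    return 0;
}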

Install dlib with CUDA support on Ubuntu 18.04

我是研究僧i submitted on 2019-12-25 18:23:33
Question: I have CUDA 9.0 and cuDNN 7.1 installed on Ubuntu 18.04 (Linux Mint 19). tensorflow-gpu works fine on the GPU (GTX 1080 Ti). Now I am trying to build dlib with CUDA support:

sudo python3 setup.py install --yes USE_AVX_INSTRUCTIONS --yes DLIB_USE_CUDA --clean

I got this error:

user@user-pc:~/Downloads/dlib$ sudo python3 setup.py install --yes USE_AVX_INSTRUCTIONS --yes DLIB_USE_CUDA --clean
running install
running bdist_egg
running egg_info
writing dlib.egg-info/PKG-INFO
writing dependency_links to dlib

How to convert a nested loop into a parfor loop

℡╲_俬逩灬. submitted on 2019-12-25 18:12:46
Question: This is from my MATLAB script.

function [ Im ] = findBorders( I )
    Im = false(size(I));
    I = padarray(I, [1, 1], 1);
    [h w] = size(Im);
    bkgFound = false;
    for row = 1 : h
        for col = 1 : w
            if I(row + 1, col + 1)
                bkgFound = false;
                for i = 0:2
                    for j = 0:2
                        if ~I(row + i, col + j)
                            Im(row, col) = 1;
                            bkgFound = true;
                            break;
                        end;
                    end;
                    if bkgFound
                        break;
                    end;
                end;
            end;
        end;
    end;
end

So, I need to convert it to a parfor loop to run it on the GPU. I need help. I read some articles, but have no idea about how to
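Since the end goal is to run this on a GPU, the sketch below shows the same per-pixel test as a CUDA kernel rather than a parfor loop; it is a hypothetical translation, not the MATLAB-side answer. I_pad is the (h+2) x (w+2) image padded with ones (as padarray does above), stored row-major.

// One thread per output pixel: Im(row, col) = 1 when the pixel is foreground
// and any of its 3x3 neighbours in the padded image is background.
__global__ void findBorders(const unsigned char* I_pad, unsigned char* Im,
                            int h, int w)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // 0-based
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= h || col >= w) return;

    int pw = w + 2;                                   // padded width
    unsigned char out = 0;
    if (I_pad[(row + 1) * pw + (col + 1)]) {          // foreground pixel?
        for (int i = 0; i < 3 && !out; ++i)
            for (int j = 0; j < 3; ++j)
                if (!I_pad[(row + i) * pw + (col + j)]) {  // background neighbour
                    out = 1;
                    break;
                }
    }
    Im[row * w + col] = out;
}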

CUDA kernel function seems to show race conditions despite racecheck showing 0 race conditions

若如初见. submitted on 2019-12-25 16:54:11
Question: My CUDA kernel function is not returning the intended result (the sum of all elements in vector b) but is instead returning a single value from vector b. I tried memcheck and racecheck, but nothing came up:

[breecej@compute-0-32 newsum]$ cuda-memcheck mystock
========= CUDA-MEMCHECK
========= ERROR SUMMARY: 0 errors
[breecej@compute-0-32 newsum]$ cuda-memcheck --tool racecheck mystock
========= CUDA-MEMCHECK
========= RACECHECK SUMMARY: 0 hazards displayed (0 errors, 0 warnings)
[breecej
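The symptom described (one element of b instead of the sum) is what an unsynchronized accumulation into a single global word typically produces, and racecheck does not flag it because it only checks shared-memory hazards. The question's kernel is not shown here, so as a generic sketch, a safe version of that accumulation using atomicAdd:

// Each thread adds its element into *out atomically. Without the atomic,
// "*out += b[i]" from many threads is a read-modify-write race on global
// memory, which the racecheck tool does not report.
__global__ void sumKernel(const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(out, b[i]);  // correct but serialized; a tree reduction is faster
}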

Enable GPU resources (CUDA) on DC/OS

99封情书 submitted on 2019-12-25 14:12:37
Question: I have a cluster with GPU nodes (NVIDIA) and have deployed DC/OS 1.8. I'd like to enable scheduling jobs (batch and Spark) on the GPU nodes using GPU isolation. DC/OS is based on Mesos 1.0.1, which supports GPU isolation.

Answer 1: Unfortunately, DC/OS doesn't officially support GPUs in 1.8 (experimental support for GPUs will be coming in the next release, as mentioned here: https://github.com/dcos/dcos/pull/766). In this next release, only Marathon will officially be able to launch GPU services
