gpu

Vectorizing nested loops in MATLAB using bsxfun and GPU

南笙酒味 submitted on 2019-12-28 16:26:01
Question: For loops seem to be extremely slow, so I was wondering if the nested loops in the code shown next could be vectorized using bsxfun, and whether a GPU could be introduced too.

Code

%// Parameters
i = 1; j = 3; n1 = 1500; n2 = 1500;

%// Pre-allocate for output
LInc(n1+n2,n1+n2) = 0;

%// Nested Loops - I
for x = 1:n1
    for y = 1:n1
        num = ((n2 ^ 2) * (L1(i, i) + L2(j, j) + 1)) - (n2 * n * (L1(x,i) + L1(y,i)));
        LInc(x, y) = L1(x, y) + (num/denom);
        LInc(y, x) = LInc(x, y);
    end
end

%// Nested Loops - II
for
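Since the question asks whether the GPU could be introduced, the sketch below shows the first nested loop written as a 2D CUDA kernel, one thread per (x, y) entry. It is only an illustration under several assumptions: L1 is a row-major device array with leading dimension ld, the scalar term n2^2*(L1(i,i) + L2(j,j) + 1) is precomputed on the host and passed in as constTerm, and n and denom come from the part of the question that is truncated above.

__global__ void fillLInc(const float* L1, float* LInc, int ld,
                         int i, int n1, float n2, float n,
                         float denom, float constTerm)
{
    // constTerm = n2^2 * (L1(i,i) + L2(j,j) + 1), precomputed on the host;
    // i is the 0-based equivalent of the MATLAB column index.
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= n1 || y >= n1) return;

    float num = constTerm - n2 * n * (L1[x * ld + i] + L1[y * ld + i]);
    float v   = L1[x * ld + y] + num / denom;
    LInc[x * ld + y] = v;   // symmetric write, as LInc(y,x) = LInc(x,y) above
    LInc[y * ld + x] = v;
}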

TensorFlow OOM on GPU

家住魔仙堡 submitted on 2019-12-28 13:34:51
Question: I'm training some music data on an LSTM-RNN in TensorFlow and ran into a problem with GPU memory allocation which I don't understand: I encounter an OOM when there actually seems to be just about enough VRAM still available. Some background: I'm working on Ubuntu Gnome 16.04, using a GTX 1060 6GB, an Intel Xeon E3-1231 v3, and 8GB RAM. First, the part of the error message which I can understand; I will add the whole error message at the end again for anyone who might ask for
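As a generic diagnostic for this kind of mismatch between reported and usable VRAM, the sketch below queries what the driver actually considers free via cudaMemGetInfo. This is not taken from the question; note also that TensorFlow pre-allocates most of the GPU memory by default, so the driver's number can differ from what the framework has left to hand out.

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Ask the CUDA runtime how much device memory is free right now.
    size_t free_bytes = 0, total_bytes = 0;
    cudaMemGetInfo(&free_bytes, &total_bytes);
    printf("free: %zu MiB / total: %zu MiB\n",
           free_bytes >> 20, total_bytes >> 20);
    return 0;
}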

Modifying the registry to increase GPU timeout, Windows 7

廉价感情. submitted on 2019-12-27 11:45:42
Question: I'm trying to increase the timeout on the GPU from its default setting of 2 seconds to something a little longer. I found the following link, but it appears to be slightly different in Windows 7, as I can't see anything mentioned in the webpage. Has anyone done this before? If so, could you fill in the gaps please? Thanks @RoBik, so as follows if I want 6 days (a bit excessive, I know, but just for example)? Thanks again for your help, +1. EDIT This is the error I'm currently getting. An error has occurred
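For reference, the timeout being discussed is controlled by Windows TDR (Timeout Detection and Recovery). A minimal sketch of a .reg file using the documented TdrDelay value (a REG_DWORD in seconds under the GraphicsDrivers key), with the question's 6-day example, 518400 seconds:

Windows Registry Editor Version 5.00

; TdrDelay is in seconds; 518400 (0x7E900) = 6 days, the example from the question.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0007e900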

Some problems when I try to enable the L1 cache on the GPU [closed]

非 Y 不嫁゛ submitted on 2019-12-26 08:21:08
Question: I am a beginner in CUDA. I already executed the command for enabling the L1 cache at the command prompt on Windows. nvcc is located at C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin on my device. I wrote the command as follows:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin>nvcc
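For context, on most CUDA architectures global loads are only cached in L1 when the code is compiled with -Xptxas -dlcm=ca (whether global loads can use L1 at all depends on the GPU generation). A minimal sketch; the file name and kernel are hypothetical:

// example.cu -- toy kernel for observing L1-cached global loads.
// Compile with global loads cached in L1:
//   nvcc -Xptxas -dlcm=ca -o example example.cu
__global__ void copyKernel(const float* in, float* out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = in[idx];  // global load; cached in L1 under -dlcm=ca
}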

Parallel implementation of the sum of contiguous subsequences in an array using CUDA

一世执手 submitted on 2019-12-25 18:25:10
Question: Let's consider the following array:

tab = [80,12,14,5,70,9,26,30,8,12,16,15]

I want to compute the sums of all contiguous subsequences of size 4 using CUDA, for example:

S1 = 80+12+14+5 = 111
S2 = 12+14+5+70 = 101
S3 = 14+5+70+9  = 98
...

Do you have an efficient idea for parallelising this task using CUDA? The previous array is just an example; in my case I will use a huge one.

Answer 1: We can do this in a single operation (thrust::transform) using thrust. In CUDA, this can be considered to be a fairly simple 1-D
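A minimal, self-contained sketch of the thrust::transform approach the answer describes: each output index gets the sum of the 4-element window starting there. The functor name and layout are illustrative, not the answer's exact code.

#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>

// Functor: given window start i, return tab[i] + tab[i+1] + tab[i+2] + tab[i+3].
struct window_sum
{
    const int* data;
    window_sum(const int* d) : data(d) {}
    __host__ __device__ int operator()(int i) const
    {
        return data[i] + data[i + 1] + data[i + 2] + data[i + 3];
    }
};

int main()
{
    int h_tab[] = {80, 12, 14, 5, 70, 9, 26, 30, 8, 12, 16, 15};
    int n = sizeof(h_tab) / sizeof(h_tab[0]);
    thrust::device_vector<int> tab(h_tab, h_tab + n);
    thrust::device_vector<int> sums(n - 3);   // one sum per window of size 4
    thrust::transform(thrust::counting_iterator<int>(0),
                      thrust::counting_iterator<int>(n - 3),
                      sums.begin(),
                      window_sum(thrust::raw_pointer_cast(tab.data())));
    // sums now holds 111, 101, 98, ... on the device.
    return 0;
}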

Install dlib with CUDA support on Ubuntu 18.04

我是研究僧i submitted on 2019-12-25 18:23:33
Question: I have CUDA 9.0 and cuDNN 7.1 installed on Ubuntu 18.04 (Linux Mint 19). tensorflow-gpu works fine on the GPU (GTX 1080 Ti). Now I am trying to build dlib with CUDA support:

sudo python3 setup.py install --yes USE_AVX_INSTRUCTIONS --yes DLIB_USE_CUDA --clean

I got this error:

user@user-pc:~/Downloads/dlib$ sudo python3 setup.py install --yes USE_AVX_INSTRUCTIONS --yes DLIB_USE_CUDA --clean
running install
running bdist_egg
running egg_info
writing dlib.egg-info/PKG-INFO
writing dependency_links to dlib

How to convert a nested loop into a parfor loop

℡╲_俬逩灬. submitted on 2019-12-25 18:12:46
Question: This is from my MATLAB script.

function [ Im ] = findBorders( I )
    Im = false(size(I));
    I = padarray(I, [1, 1], 1);
    [h w] = size(Im);
    bkgFound = false;
    for row = 1 : h
        for col = 1 : w
            if I(row + 1, col + 1)
                bkgFound = false;
                for i = 0:2
                    for j = 0:2
                        if ~I(row + i, col + j)
                            Im(row, col) = 1;
                            bkgFound = true;
                            break;
                        end;
                    end;
                    if bkgFound
                        break;
                    end;
                end;
            end;
        end;
    end;
end

So, I need to convert it to a parfor loop to run it on the GPU. I need help. I read some articles, but have no idea about how to
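Since the end goal is to run this on a GPU, the sketch below shows the same per-pixel test as a CUDA kernel rather than a parfor loop; it is a hypothetical translation, not the MATLAB-side answer. I_pad is the (h+2) x (w+2) image padded with ones (as padarray does above), stored row-major.

// One thread per output pixel: Im(row, col) = 1 when the pixel is foreground
// and any of its 3x3 neighbours in the padded image is background.
__global__ void findBorders(const unsigned char* I_pad, unsigned char* Im,
                            int h, int w)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // 0-based
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= h || col >= w) return;

    int pw = w + 2;                                   // padded width
    unsigned char out = 0;
    if (I_pad[(row + 1) * pw + (col + 1)]) {          // foreground pixel?
        for (int i = 0; i < 3 && !out; ++i)
            for (int j = 0; j < 3; ++j)
                if (!I_pad[(row + i) * pw + (col + j)]) {  // background neighbour
                    out = 1;
                    break;
                }
    }
    Im[row * w + col] = out;
}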

CUDA kernel function seems to show race conditions despite racecheck showing 0 race conditions

若如初见. submitted on 2019-12-25 16:54:11
Question: My CUDA kernel function is not returning the intended result (the sum of all elements in vector b) but is instead returning a single value from vector b. I tried memcheck and racecheck, but nothing came up:

[breecej@compute-0-32 newsum]$ cuda-memcheck mystock
========= CUDA-MEMCHECK
========= ERROR SUMMARY: 0 errors
[breecej@compute-0-32 newsum]$ cuda-memcheck --tool racecheck mystock
========= CUDA-MEMCHECK
========= RACECHECK SUMMARY: 0 hazards displayed (0 errors, 0 warnings)
[breecej
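The symptom described (one element of b instead of the sum) is what an unsynchronized accumulation into a single global word typically produces, and racecheck does not flag it because it only checks shared-memory hazards. The question's kernel is not shown here, so as a generic sketch, a safe version of that accumulation using atomicAdd:

// Each thread adds its element into *out atomically. Without the atomic,
// "*out += b[i]" from many threads is a read-modify-write race on global
// memory, which the racecheck tool does not report.
__global__ void sumKernel(const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(out, b[i]);  // correct but serialized; a tree reduction is faster
}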

Enable GPU resources (CUDA) on DC/OS

99封情书 submitted on 2019-12-25 14:12:37
Question: I have a cluster with GPU nodes (NVIDIA) and have deployed DC/OS 1.8. I'd like to enable scheduling jobs (batch and Spark) on the GPU nodes using GPU isolation. DC/OS is based on Mesos 1.0.1, which supports GPU isolation.

Answer 1: Unfortunately, DC/OS doesn't officially support GPUs in 1.8 (experimental support for GPUs will be coming in the next release, as mentioned here: https://github.com/dcos/dcos/pull/766). In this next release, only Marathon will officially be able to launch GPU services
