gpu

unroll loops in an AMD OpenCL kernel

断了今生、忘了曾经 submitted on 2019-12-24 14:18:31
Question: I'm trying to assess performance differences with OpenCL for AMD. I have a kernel for the Hough transform; in the kernel I have two #pragma unroll statements, but when run the kernel does not produce any speedup.

kernel void hough_circle(read_only image2d_t imageIn, global int* in,
                         const int w_hough, __global int* circle)
{
    sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;
    int gid0 = get_global_id(0);
    int gid1 = get_global_id(1);
    uint4 pixel;
    int x0 = 0, y0 = 0, r;

Slurm oversubscribe GPUs

爷,独闯天下 submitted on 2019-12-24 13:35:39
Question: Is there a way to oversubscribe GPUs on Slurm, i.e. run multiple jobs/job steps that share one GPU? We've only found ways to oversubscribe CPUs and memory, but not GPUs. We want to run multiple job steps on the same GPU in parallel and optionally specify the GPU memory used for each step. Answer 1: The easiest way of doing that is to have the GPU defined as a feature rather than as a gres, so Slurm will not manage the GPUs; just make sure that jobs that need one land on nodes that offer one. Source:
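A minimal configuration sketch of the feature-instead-of-gres approach described in the answer above; the node name, resource counts, and step commands are placeholders, and the exact slurm.conf layout depends on the cluster. Because Slurm no longer tracks the GPU as a consumable resource, any steps landing on that node simply share it, and per-step GPU memory limits would have to be enforced by the application rather than by Slurm.

# slurm.conf (sketch): expose the GPU as a node feature, not a gres
NodeName=gpunode01 CPUs=32 RealMemory=128000 Feature=gpu State=UNKNOWN

# job script (sketch): land on a node that offers the feature and run two
# steps in parallel; both processes share the same physical GPU
#SBATCH --constraint=gpu
#SBATCH --ntasks=2
srun --ntasks=1 ./train_step_a &
srun --ntasks=1 ./train_step_b &
wait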

Error in implementing realtime camera based GPU_SURF in opencv

ⅰ亾dé卋堺 submitted on 2019-12-24 12:18:40
Question: As the CPU-based SURF in OpenCV was very slow for a realtime application, we decided to use GPU_SURF. After setting up opencv_gpu we wrote the following code:

#include <iostream>
#include <iomanip>
#include <windows.h>
#include "opencv2/contrib/contrib.hpp"
#include "opencv2/objdetect/objdetect.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/gpu/gpu.hpp"
#include "opencv2/core/core.hpp"
#include "opencv2/features2d/features2d.hpp"
#include

Tensorflow 1.9 for CPU, without GPU still requires cudNN - Windows

大憨熊 submitted on 2019-12-24 11:59:26
Question: I am working on a Win10 machine with Python 3.6.3, TensorFlow 1.9 and pip 18.0. I did not install TensorFlow with GPU support; i.e., following this link1, I used pip install tensorflow and did not choose any GPU option. However, when trying to import tensorflow, I am faced with the following error: ModuleNotFoundError: No module named '_pywrap_tensorflow_internal'. After following various links (link2, link3), I installed the Visual Studio update 3 and also used
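A minimal diagnostic sketch for this situation, assuming the package is at least installed: tf.test.is_built_with_cuda() reports whether the installed wheel was compiled against CUDA, which would explain why a supposedly CPU-only install appears to go looking for cuDNN. On Windows, the ImportError around _pywrap_tensorflow_internal is also commonly caused by a missing MSVC runtime, which is what the Visual Studio update mentioned above addresses.

# Diagnostic sketch: report the installed TensorFlow build, or surface the
# ImportError with a hint about the usual Windows culprit.
try:
    import tensorflow as tf
    print("TensorFlow version:", tf.__version__)
    print("Built with CUDA support:", tf.test.is_built_with_cuda())
except ImportError as err:
    print("TensorFlow failed to import:", err)
    print("On Windows this often indicates a missing MSVC runtime rather than a CUDA/cuDNN problem.")
    raise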

Why it is not possible to overlap memHtoD with GPU kernel with GTX 590

送分小仙女□ submitted on 2019-12-24 10:24:03
Question: I tested my GTX 590 and GTX 680 with the CUDA SDK "simpleStreams" sample. The timeline results are shown in the attached pictures. Can anyone explain why, on the GTX 590, memcpyDtoH cannot overlap with the previous kernel computation, while it does on the GTX 680? Answer 1: I get similar behavior with my GTX 480. I suspect something is wrong with Fermi, maybe related to WDDM? (using Windows 7 x64 here) I have tried many, many different drivers and all of them show the same wrong behavior. You have now tested a GK104 and proven it right, and I

Metal - Namespace variable that is local to a thread?

与世无争的帅哥 submitted on 2019-12-24 10:16:30
Question: I'm trying to create a Pseudo Random Number Generator (PRNG) in Metal, akin to thrust's RNG, where every time you call the RNG within a thread it produces a different random number given a particular seed, which in this case will be the thread_position_in_grid. I have it set up perfectly, and I get a nice uniformly random picture right now using the code I have. However, my code only works once per thread. I want to implement a next_rng() function that returns a new rng using the last
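The pattern being described, deriving a fresh generator from the previous one rather than mutating thread-local state, is language-agnostic, so here is a minimal sketch of the idea in Python rather than Metal; the LCG constants and the seed value are placeholders for illustration only, not the asker's shader code.

# Sketch of a stateless next_rng pattern: each call consumes a state and
# returns (value, new_state), so a thread can chain calls without needing
# any mutable namespace- or thread-scoped variable.
def next_rng(state):
    # One linear-congruential step (Numerical Recipes constants), modulo 2^32.
    new_state = (1664525 * state + 1013904223) % (1 << 32)
    value = new_state / float(1 << 32)  # uniform value in [0, 1)
    return value, new_state

# Usage: seed from something like thread_position_in_grid, then keep chaining.
state = 12345  # placeholder seed; in Metal this would come from the thread id
v1, state = next_rng(state)
v2, state = next_rng(state)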

GPU Chipset Detection

徘徊边缘 submitted on 2019-12-24 08:22:52
Question: Seeking the most efficient method for retrieving the GPU model in Objective-C or Carbon. I want to avoid using system_profiler because it is slow; if it comes down to that I am willing to use it, but I want to exhaust other options first. Answer 1: You could try glGetString(GL_RENDERER) from the OpenGL library. Source: https://stackoverflow.com/questions/3171529/gpu-chipset-detection

How does one have TensorFlow not run the script unless the GPU was loaded successfully?

心不动则不痛 submitted on 2019-12-24 07:44:24
Question: I have been trying to run some TensorFlow training on a machine with GPUs; however, whenever I try to do so I get some type of error that seems to say it wasn't able to use the GPU for some reason (usually a memory issue, or a CUDA or cuDNN issue, etc.). However, since TensorFlow automatically falls back to running on the CPU if it can't use the GPU, it has been hard for me to tell whether it was actually able to leverage the GPU or not. Thus, I wanted to have my script just fail/halt unless the GPU
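A minimal sketch of one way to do that with the TensorFlow 1.x API, assuming the goal is simply to abort before any training starts when no GPU is visible.

# Sketch: refuse to run the rest of the script unless TensorFlow sees a GPU.
import sys
import tensorflow as tf

if not tf.test.is_gpu_available():
    sys.exit("No usable GPU detected by TensorFlow; refusing to fall back to the CPU.")

# ...build the graph and start training only after the check has passed.

Another option in TF 1.x is to pin the ops with tf.device('/gpu:0') and create the session with allow_soft_placement=False, so anything that cannot run on the GPU raises an error instead of silently moving to the CPU.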

How can I accelerate a sparse matrix by dense vector product, currently implemented via scipy.sparse.csc_matrix.dot, using CUDA?

旧时模样 submitted on 2019-12-24 07:38:52
Question: My ultimate goal is to accelerate the computation of a matrix-vector product in Python, potentially by using a CUDA-enabled GPU. The matrix A is about 15k x 15k and sparse (density ~ 0.05), the vector x is 15k elements and dense, and I am computing Ax. I have to perform this computation many times, so making it as fast as possible would be ideal. My current non-GPU "optimization" is to represent A as a scipy.sparse.csc_matrix object and then simply compute A.dot(x), but I was hoping to
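One route that goes beyond the excerpt above (assuming a CUDA-capable GPU and a CuPy build matching the local CUDA toolkit) is to copy both operands to the GPU once with CuPy's scipy-compatible sparse module and reuse them across the repeated products; a rough sketch:

# Sketch: sparse matrix by dense vector product on the GPU with CuPy.
import numpy as np
import scipy.sparse as sp
import cupy as cp
from cupyx.scipy import sparse as cusparse

# Placeholder problem roughly the size described in the question.
A_cpu = sp.random(15000, 15000, density=0.05, format="csr", dtype=np.float32)
x_cpu = np.random.rand(15000).astype(np.float32)

# Copy to the GPU once, then reuse across the many repeated products.
A_gpu = cusparse.csr_matrix(A_cpu)
x_gpu = cp.asarray(x_cpu)

y_gpu = A_gpu.dot(x_gpu)    # matrix-vector product executed on the GPU
y_cpu = cp.asnumpy(y_gpu)   # copy back only when the result is actually needed

Whether this beats the scipy version depends on transfer costs; keeping A, x, and the result on the GPU between iterations is what makes the repeated products pay off.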