gpu

unroll loops in an AMD OpenCL kernel

断了今生、忘了曾经 submitted on 2019-12-24 14:18:31
Question: I'm trying to assess performance differences with OpenCL for AMD. I have a kernel for the Hough transform; in the kernel I have two #pragma unroll statements, but when run the kernel does not produce any speedup.

kernel void hough_circle(read_only image2d_t imageIn, global int* in,
                         const int w_hough, __global int* circle)
{
    sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;
    int gid0 = get_global_id(0);
    int gid1 = get_global_id(1);
    uint4 pixel;
    int x0 = 0, y0 = 0, r;

Slurm oversubscribe GPUs

爷,独闯天下 submitted on 2019-12-24 13:35:39
Question: Is there a way to oversubscribe GPUs on Slurm, i.e. run multiple jobs/job steps that share one GPU? We've only found ways to oversubscribe CPUs and memory, but not GPUs. We want to run multiple job steps on the same GPU in parallel and optionally specify the GPU memory used for each step. Answer 1: The easiest way of doing that is to have the GPU defined as a feature rather than as a gres, so Slurm will not manage the GPUs; just make sure that jobs that need one land on nodes that offer one. Source:
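A minimal configuration sketch of the feature-instead-of-gres approach described in the answer above; the node name, resource counts, and step commands are placeholders, and the exact slurm.conf layout depends on the cluster. Because Slurm no longer tracks the GPU as a consumable resource, any steps landing on that node simply share it, and per-step GPU memory limits would have to be enforced by the application rather than by Slurm.

# slurm.conf (sketch): expose the GPU as a node feature, not a gres
NodeName=gpunode01 CPUs=32 RealMemory=128000 Feature=gpu State=UNKNOWN

# job script (sketch): land on a node that offers the feature and run two
# steps in parallel; both processes share the same physical GPU
#SBATCH --constraint=gpu
#SBATCH --ntasks=2
srun --ntasks=1 ./train_step_a &
srun --ntasks=1 ./train_step_b &
wait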

Error in implementing realtime camera based GPU_SURF in opencv

ⅰ亾dé卋堺 submitted on 2019-12-24 12:18:40
Question: As the CPU-based SURF in OpenCV was very slow for a realtime application, we decided to use GPU_SURF. After setting up opencv_gpu we wrote the following code:

#include <iostream>
#include <iomanip>
#include <windows.h>
#include "opencv2/contrib/contrib.hpp"
#include "opencv2/objdetect/objdetect.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/gpu/gpu.hpp"
#include "opencv2/core/core.hpp"
#include "opencv2/features2d/features2d.hpp"
#include

Tensorflow 1.9 for CPU, without GPU still requires cudNN - Windows

大憨熊 submitted on 2019-12-24 11:59:26
Question: I am working on a Win10 machine with Python 3.6.3, TensorFlow 1.9 and pip 18.0. I did not install TensorFlow with GPU support; i.e., following this link1, I used pip install tensorflow and did not choose any GPU option. However, when trying to import tensorflow, I am faced with the following error: ModuleNotFoundError: No module named '_pywrap_tensorflow_internal'. After following various links (link2, link3), I installed the Visual Studio update 3 and also used
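A minimal diagnostic sketch for this situation, assuming the package is at least installed: tf.test.is_built_with_cuda() reports whether the installed wheel was compiled against CUDA, which would explain why a supposedly CPU-only install appears to go looking for cuDNN. On Windows, the ImportError around _pywrap_tensorflow_internal is also commonly caused by a missing MSVC runtime, which is what the Visual Studio update mentioned above addresses.

# Diagnostic sketch: report the installed TensorFlow build, or surface the
# ImportError with a hint about the usual Windows culprit.
try:
    import tensorflow as tf
    print("TensorFlow version:", tf.__version__)
    print("Built with CUDA support:", tf.test.is_built_with_cuda())
except ImportError as err:
    print("TensorFlow failed to import:", err)
    print("On Windows this often indicates a missing MSVC runtime rather than a CUDA/cuDNN problem.")
    raise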

Why it is not possible to overlap memHtoD with GPU kernel with GTX 590

送分小仙女□ submitted on 2019-12-24 10:24:03
Question: I tested my GTX 590 and GTX 680 with the CUDA SDK "simpleStreams" sample. The timeline results are shown in the attached pictures. Can anyone explain why, on the GTX 590, memcpyDtoH cannot overlap with the previous kernel computation, while it does on the GTX 680? Answer 1: I get similar behavior with my GTX 480. I suspect something is wrong with Fermi, maybe related to WDDM? (using Windows 7 x64 here) I have tried many, many different drivers and all of them show the same wrong behavior. You have now tested a GK104 and proven it right, and I

Metal - Namespace variable that is local to a thread?

与世无争的帅哥 submitted on 2019-12-24 10:16:30
Question: I'm trying to create a Pseudo Random Number Generator (PRNG) in Metal, akin to thrust's RNG, where every time you call the RNG within a thread it produces a different random number given a particular seed, which in this case will be the thread_position_in_grid. I have it set up perfectly, and I get a nice uniformly random picture right now using the code I have. However, my code only works once per thread. I want to implement a next_rng() function that returns a new rng using the last
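The pattern being described, deriving a fresh generator from the previous one rather than mutating thread-local state, is language-agnostic, so here is a minimal sketch of the idea in Python rather than Metal; the LCG constants and the seed value are placeholders for illustration only, not the asker's shader code.

# Sketch of a stateless next_rng pattern: each call consumes a state and
# returns (value, new_state), so a thread can chain calls without needing
# any mutable namespace- or thread-scoped variable.
def next_rng(state):
    # One linear-congruential step (Numerical Recipes constants), modulo 2^32.
    new_state = (1664525 * state + 1013904223) % (1 << 32)
    value = new_state / float(1 << 32)  # uniform value in [0, 1)
    return value, new_state

# Usage: seed from something like thread_position_in_grid, then keep chaining.
state = 12345  # placeholder seed; in Metal this would come from the thread id
v1, state = next_rng(state)
v2, state = next_rng(state)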

GPU Chipset Detection

徘徊边缘 submitted on 2019-12-24 08:22:52
Question: Seeking the most efficient method for retrieving the GPU model in Objective-C or Carbon. I want to avoid using system_profiler because it is slow; if it comes down to that I am willing to use it, but I want to exhaust other options first. Answer 1: You could try glGetString(GL_RENDERER) from the OpenGL library. Source: https://stackoverflow.com/questions/3171529/gpu-chipset-detection

How does one have TensorFlow not run the script unless the GPU was loaded successfully?

心不动则不痛 submitted on 2019-12-24 07:44:24
Question: I have been trying to run some TensorFlow training on a machine with GPUs; however, whenever I try to do so I get some type of error that seems to say it wasn't able to use the GPU for some reason (usually a memory issue, or a CUDA or cuDNN issue, etc.). However, since TensorFlow automatically falls back to running on the CPU if it can't use the GPU, it has been hard for me to tell whether it was actually able to leverage the GPU or not. Thus, I wanted to have my script just fail/halt unless the GPU
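A minimal sketch of one way to do that with the TensorFlow 1.x API, assuming the goal is simply to abort before any training starts when no GPU is visible.

# Sketch: refuse to run the rest of the script unless TensorFlow sees a GPU.
import sys
import tensorflow as tf

if not tf.test.is_gpu_available():
    sys.exit("No usable GPU detected by TensorFlow; refusing to fall back to the CPU.")

# ...build the graph and start training only after the check has passed.

Another option in TF 1.x is to pin the ops with tf.device('/gpu:0') and create the session with allow_soft_placement=False, so anything that cannot run on the GPU raises an error instead of silently moving to the CPU.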

How can I accelerate a sparse matrix by dense vector product, currently implemented via scipy.sparse.csc_matrix.dot, using CUDA?

旧时模样 submitted on 2019-12-24 07:38:52
Question: My ultimate goal is to accelerate the computation of a matrix-vector product in Python, potentially by using a CUDA-enabled GPU. The matrix A is about 15k x 15k and sparse (density ~ 0.05), the vector x is 15k elements and dense, and I am computing Ax. I have to perform this computation many times, so making it as fast as possible would be ideal. My current non-GPU "optimization" is to represent A as a scipy.sparse.csc_matrix object and then simply compute A.dot(x), but I was hoping to
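One route that goes beyond the excerpt above (assuming a CUDA-capable GPU and a CuPy build matching the local CUDA toolkit) is to copy both operands to the GPU once with CuPy's scipy-compatible sparse module and reuse them across the repeated products; a rough sketch:

# Sketch: sparse matrix by dense vector product on the GPU with CuPy.
import numpy as np
import scipy.sparse as sp
import cupy as cp
from cupyx.scipy import sparse as cusparse

# Placeholder problem roughly the size described in the question.
A_cpu = sp.random(15000, 15000, density=0.05, format="csr", dtype=np.float32)
x_cpu = np.random.rand(15000).astype(np.float32)

# Copy to the GPU once, then reuse across the many repeated products.
A_gpu = cusparse.csr_matrix(A_cpu)
x_gpu = cp.asarray(x_cpu)

y_gpu = A_gpu.dot(x_gpu)    # matrix-vector product executed on the GPU
y_cpu = cp.asnumpy(y_gpu)   # copy back only when the result is actually needed

Whether this beats the scipy version depends on transfer costs; keeping A, x, and the result on the GPU between iterations is what makes the repeated products pay off.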