opencl | 易学教程

OpenCL: Minimal configuration to work with AMD GPU

Question: Suppose we have an AMD GPU (for example, a Radeon HD 7970) and a minimal Linux system without X. What needs to be installed, and what needs to be launched and how, to get a proper OpenCL environment? Ideally it should be a headless environment. Requirements for the environment: (1) the GPU is visible to OpenCL programs (clinfo, for example); (2) it is possible to monitor the temperature and set the fan speed (for example, using aticonfig). P.S. Simply installing the X server and Catalyst and running X :0 won't work

Arranging memory for OpenCL

Question: I have about 10 NumPy arrays of n items each. The OpenCL work-item with global id i only looks at the i-th element of each array. How should I arrange the memory? I was thinking of interleaving the arrays on the graphics card, but I'm not sure whether this will give any performance gain, since I don't understand the work-group memory access pattern. Answer 1: I'm not familiar with NumPy; however, if: the thread with global id i looks at the i-th element (as you mentioned) the data type has a proper memory alignment (4,
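As a rough illustration of the layout question above, the sketch below compares the two candidate layouts by computing the flat index each work-item would read. The helper names `soa_index`, `aos_index`, `soa_stride` and `aos_stride` are hypothetical, and the idea that neighbouring work-items benefit from reading neighbouring addresses is a general GPU rule of thumb, not something measured for this case.

```c
#include <stddef.h>

/* Separate arrays stored back to back ("structure of arrays"):
 * element i of array k lives at k * n + i. */
size_t soa_index(size_t k, size_t i, size_t n)
{
    return k * n + i;
}

/* Interleaved layout ("array of structures"):
 * element i of array k lives at i * num_arrays + k. */
size_t aos_index(size_t k, size_t i, size_t num_arrays)
{
    return i * num_arrays + k;
}

/* Distance, in elements, between the locations read by work-items
 * i and i + 1 when both read the same array k. */
size_t soa_stride(size_t k, size_t i, size_t n)
{
    return soa_index(k, i + 1, n) - soa_index(k, i, n);
}

size_t aos_stride(size_t k, size_t i, size_t num_arrays)
{
    return aos_index(k, i + 1, num_arrays) - aos_index(k, i, num_arrays);
}
```

With 10 arrays, `soa_stride` is 1 while `aos_stride` is 10, so with separate arrays adjacent work-items touch adjacent elements on every read.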

Hough Transform: improving algorithm efficiency over OpenCL

Question: I am trying to detect a circle in a binary image using the Hough transform. When I use OpenCV's built-in function for the circular Hough transform, it works and I can find the circle. Now I am trying to write my own kernel code for the Hough transform, but it is very slow: kernel void hough_circle(read_only image2d_t imageIn, global int* in,const int w_hough,__global int * circle) { sampler_t sampler=CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST; int gid0 = get_global

Add small image to large image for oclMat in OpenCV

Question: I have a frame and want to place it inside a bigger image in OpenCV using the OpenCL type oclMat. But the code below gives me a black frame as the result: capture.read(fMat); // frame from camera or video oclMat f; f.upload(fMat); oclMat bf(f.rows*2, f.cols*2, f.ocltype()); // "bf"-big frame oclMat bfRoi = bf(Rect(0, 0, f.cols, f.rows)); f.copyTo(bfRoi); // something wrong here // label 1 bf.download(fMat); Mat bf2; bf.convertTo(bf2, fMat.type()); // this convert affects to nothing imshow("big frame", bf2); So

clEnqueueNDRangeKernel blocks execution

Question: Another question from me. I've been trying to analyze the results of my kernel in parallel with its execution, with the work broken up into multiple calls. However, while clEnqueueReadBuffer has a boolean that determines whether it blocks or not, clEnqueueNDRangeKernel has none, and I had assumed it was always asynchronous (it is being "enqueued", after all, which makes me assume it would act like a task queue). However, when I run this block of code, the outer code doesn't get executed until the kernel has

OpenCL: Determine the best local_item_size

Question: My code works like 2D matrix multiplication ( http://gpgpu-computing4.blogspot.de/2009/09/matrix-multiplication-2-opencl.html). The dimensions of the matrices are (1000*1000 and 10000*10000 and 100000*100000). My hardware is: NVIDIA Corporation GM204 [GeForce GTX 980] (MAX_WORK_GROUP_SIZES: 1024 1024 64). The question is: what is the best local_item_size I can use? size_t local_item_size[2], global_item_size[2]; global_item_size[0] = number_of_points; global_item_size[1] = number_of_points;
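One possible starting point, sketched here under stated assumptions, is to pick for each dimension the largest work-group edge that divides the global size evenly and does not exceed a chosen cap. The function name `largest_divisor_at_most` and the cap of 32 per dimension (32 * 32 = 1024 work-items per group) are illustrative assumptions, not a tuned recommendation; the best value usually has to be found by timing the kernel.

```c
#include <stddef.h>

/* Largest d with 1 <= d <= cap such that n % d == 0.
 * Returns 0 when n == 0 or cap == 0. */
size_t largest_divisor_at_most(size_t n, size_t cap)
{
    if (n == 0 || cap == 0)
        return 0;

    /* Start from the smaller of cap and n and walk down;
     * the loop always stops at d == 1 at the latest. */
    size_t d = cap < n ? cap : n;
    while (n % d != 0)
        --d;
    return d;
}
```

For the sizes in the question and a cap of 32, this yields 25 for 1000 and 10000, and 32 for 100000.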

Linker error using OpenCL 2.0 C++ bindings header file

Question: I'm getting a linker error from the OpenCL 2.0 C++ bindings header file cl2.hpp. All my header files come directly from the Khronos OpenCL registry, and I built the OpenCL.lib file myself. I don't get an error using the OpenCL 1.2 C++ bindings header file. I am using Qt 5.5.0 and Visual Studio C++ 2013 on 64-bit Windows 7. The error is related to multiply defined symbols in multiple source files. mainwindow.cpp.obj:-1: error: LNK2005: "enum cl::QueueProperties __cdecl cl::operator|(enum cl:

OpenCL ND-Range boundaries?

Question: Consider a kernel which performs vector addition: __kernel void vecAdd(__global double *a, __global double *b, __global double *c, const unsigned int n) { //Get our global thread ID int id = get_global_id(0); //Make sure we do not go out of bounds if (id < n) c[id] = a[id] + b[id]; } Is it really necessary to pass the size n to the kernel and check the boundaries? I have seen the same version without the check on n. Which one is correct? More generally, I wonder what happens if
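A common reason for the `if (id < n)` check is that the host may round the global size up to a multiple of the work-group size, so some work-items get ids at or beyond n. The sketch below, in plain C with no OpenCL calls, emulates that situation; `round_up` and `emulate_vec_add` are hypothetical names, and the loop stands in for the work-items of the ND-range.

```c
#include <stddef.h>

/* Smallest multiple of `local` that is >= n (local must be non-zero). */
size_t round_up(size_t n, size_t local)
{
    return (n + local - 1) / local * local;
}

/* Host-side emulation of the vecAdd kernel: one loop iteration per
 * work-item. Returns how many work-items the bounds check skipped. */
size_t emulate_vec_add(const double *a, const double *b, double *c,
                       size_t n, size_t local)
{
    size_t global = round_up(n, local);
    size_t skipped = 0;

    for (size_t id = 0; id < global; ++id) {
        if (id < n)
            c[id] = a[id] + b[id];  /* in range: do the work */
        else
            ++skipped;              /* padded work-item: must not touch memory */
    }
    return skipped;
}
```

With n = 1000 and a work-group size of 64, the global size becomes 1024 and 24 padded work-items are skipped; without the check they would read and write past the end of the buffers.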

OpenCL errors on long running tasks

Question: I'm running a long-running kernel on an NVIDIA Quadro 6000 device. The kernel involves a loop with tens of thousands of iterations. When I ran the kernel, after 2 seconds the screen went black, Windows restarted the GPU drivers and clFinish returned an error. So I got myself a second GPU card just for display, and now the 2-second timeout does not apply. The kernel computed for 50 seconds and then there were these errors (lines prefixed by "GPU ERROR" are errors printed by clCreateContext

Unexpected CPU utilization with OpenCL

Question: I've written a simple OpenCL kernel to calculate the cross-correlation of two images on the GPU. However, when I execute the kernel with enqueueNDRangeKernel, the CPU usage of one core rises to 100%, even though the host code does nothing except wait for the enqueued command to finish. Is this normal behavior for an OpenCL program? What is going on there? OpenCL kernel (if relevant): kernel void cross_correlation(global double *f, global double *g, global double *res) { // This work item will