opencl

How to launch custom OpenCL kernel in OpenCV (3.0.0) OCL?

Submitted by 怎甘沉沦 on 2019-12-03 18:44:08
Question: I'm probably misusing OpenCV by using it as a wrapper to the official OpenCL C++ bindings so that I can launch my own kernels. However, OpenCV does have classes like Program, ProgramSource, Kernel, Queue, etc. that seem to suggest I can launch my own (even non-image-based) kernels with OpenCV. I am having trouble finding documentation for these classes, let alone examples. So I took a stab at it so far: #include <fstream> #include <iostream> #include "opencv2/opencv.hpp"
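The cv::ocl classes mentioned above can indeed launch arbitrary kernels in OpenCV 3.x. A minimal sketch follows; the kernel name "square" and its body are illustrative, not part of OpenCV, and the code needs an OpenCL-capable device at runtime:

```cpp
#include <opencv2/core.hpp>
#include <opencv2/core/ocl.hpp>

int main() {
    // Our own (hypothetical) kernel source, compiled by OpenCV at runtime.
    cv::ocl::ProgramSource source(
        "__kernel void square(__global float* buf) {"
        "  int i = get_global_id(0);"
        "  buf[i] = buf[i] * buf[i];"
        "}");

    cv::String errmsg;
    // Build the program and look up the kernel by name (no extra build options).
    cv::ocl::Kernel k("square", source, "", &errmsg);

    cv::UMat u(1, 256, CV_32F, cv::Scalar(2));     // device-backed buffer
    k.args(cv::ocl::KernelArg::ReadWrite(u));      // bind the argument
    size_t globalsize = 256;
    k.run(1, &globalsize, nullptr, true);          // 1-D launch, blocking
}
```

On failure, `errmsg` carries the OpenCL build log, which is the main debugging aid here.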

OpenCL dynamic parallelism / GPU-spawned threads?

Submitted by 笑着哭i on 2019-12-03 17:33:20
CUDA 5 has just been released, and with it the ability to spawn GPU threads from within another GPU (main?) thread, minimising the call-outs between CPU and GPU that we've seen thus far. What plans are there to support GPU-spawned threads in the OpenCL arena? As I cannot afford to opt for a closed standard (my user base is "everygamer"), I need to know when OpenCL is ready for prime time in this regard. The OpenCL standard usually lags behind CUDA (except for the device-partitioning feature), and I guess this feature will be added to OpenCL within a year. EDIT on Aug 8, 2013: This feature has been
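For context: OpenCL 2.0 did add device-side enqueue, the OpenCL counterpart of CUDA dynamic parallelism. A kernel-language sketch (requires building with -cl-std=CL2.0 and a 2.0-capable runtime; the kernel names here are illustrative):

```c
/* Child kernel: plain data-parallel work. */
__kernel void child(__global int *data) {
    data[get_global_id(0)] *= 2;
}

/* Parent kernel: enqueues the child grid from the device itself,
 * with no round-trip through the host. */
__kernel void parent(__global int *data) {
    if (get_global_id(0) == 0) {
        ndrange_t r = ndrange_1D(64);
        enqueue_kernel(get_default_queue(),
                       CLK_ENQUEUE_FLAGS_WAIT_KERNEL, r,
                       ^{ child(data); });   /* block syntax per OpenCL C 2.0 */
    }
}
```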

how to profile sequential launched multiple OpenCL kernels by one clFinish?

Submitted by *爱你&永不变心* on 2019-12-03 16:46:19
I have multiple kernels, and they are launched sequentially like this: clEnqueueNDRangeKernel(..., kernel1, ...); clEnqueueNDRangeKernel(..., kernel2, ...); clEnqueueNDRangeKernel(..., kernel3, ...); and the kernels share one global buffer. Currently, I profile each kernel execution and sum the results to get the total execution time, by adding this code block after each clEnqueueNDRangeKernel: clFinish(cmdQueue); status = clGetEventProfilingInfo(...,&starttime,...); clGetEventProfilingInfo(...,&endtime,...); time_spent = endtime - starttime; My question is how to profile three kernels all
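One common way to time the whole sequence with a single clFinish is to attach an event to each enqueue and take the span from the first kernel's start to the last kernel's end. A host-side sketch (assumes the queue was created with CL_QUEUE_PROFILING_ENABLE; variable names are illustrative):

```cpp
cl_event ev[3];
clEnqueueNDRangeKernel(queue, kernel1, 1, NULL, &gsz, NULL, 0, NULL, &ev[0]);
clEnqueueNDRangeKernel(queue, kernel2, 1, NULL, &gsz, NULL, 0, NULL, &ev[1]);
clEnqueueNDRangeKernel(queue, kernel3, 1, NULL, &gsz, NULL, 0, NULL, &ev[2]);
clFinish(queue);  // one finish; all three events are complete afterwards

cl_ulong first_start, last_end;
clGetEventProfilingInfo(ev[0], CL_PROFILING_COMMAND_START,
                        sizeof(first_start), &first_start, NULL);
clGetEventProfilingInfo(ev[2], CL_PROFILING_COMMAND_END,
                        sizeof(last_end), &last_end, NULL);
// Wall-clock span of the whole sequence, in nanoseconds.
cl_ulong total_ns = last_end - first_start;
```

Note this span includes any gaps between kernels; summing the three per-event (end - start) durations instead gives pure execution time.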

How to use 2 OpenCL runtimes

Submitted by 巧了我就是萌 on 2019-12-03 15:20:51
I want to use 2 OpenCL runtimes in one system together (in my case AMD and Nvidia, but the question is pretty generic). I know that I can compile my program with either SDK. But when running the program, I need to provide libOpenCL.so. How can I provide the libs of both runtimes so that I see 3 devices (AMD CPU, AMD GPU, Nvidia GPU) in my OpenCL program? I know that it must be possible somehow, but I haven't found a description of how to do it on Linux yet. Thanks a lot, Tomas The answers from Smith and Thomas are correct; this is just expanding on that information: When you enumerate the OpenCL
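The mechanism behind this is the ICD (Installable Client Driver) loader: the libOpenCL.so you link against is a vendor-neutral dispatcher that discovers every installed runtime from .icd files on Linux. A sketch of what this looks like (the exact filenames vary by driver version and are illustrative here):

```shell
# The ICD loader reads /etc/OpenCL/vendors/*.icd at startup;
# each file contains the name of one vendor's runtime library.
$ ls /etc/OpenCL/vendors
amdocl64.icd  nvidia.icd
$ cat /etc/OpenCL/vendors/nvidia.icd
libnvidia-opencl.so.1
```

With both .icd files present, clGetPlatformIDs reports both platforms, and the application picks devices from each without being linked against either vendor's library directly.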

Create local array dynamic inside OpenCL kernel

Submitted by 天大地大妈咪最大 on 2019-12-03 15:05:50
I have an OpenCL kernel that needs to process an array as multiple sub-arrays, where each sub-array sum is saved in a local cache array. For example, imagine the following array: [[1, 2, 3, 4], [10, 30, 1, 23]] Each work-group gets an array (in the example we have 2 work-groups); each work-item processes two array indices (for example, multiplying the value at the index by the local_id), and the work-item result is saved in a work-group shared array. __kernel void test(__global int **values, __global int *result, const int array_size){ __local int cache[array_size]; // initialise if (get_local_id(0) == 0){ for (int i
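Two notes on the snippet above: a __local array size must be a compile-time constant, so `__local int cache[array_size]` cannot work with a runtime argument, and OpenCL kernel arguments cannot be pointer-to-pointer (`__global int **`). The usual fix is to declare the cache as a __local pointer parameter and let the host size it per launch via clSetKernelArg with a NULL pointer. A sketch (argument indices and buffer names are illustrative):

```cpp
// Kernel side (signature only):
//   __kernel void test(__global const int *values,
//                      __global int *result,
//                      __local int *cache)     // sized by the host

// Host side: arg 2 reserves array_size ints of __local memory
// per work-group; the data pointer must be NULL for __local args.
clSetKernelArg(kernel, 0, sizeof(cl_mem), &values_buf);
clSetKernelArg(kernel, 1, sizeof(cl_mem), &result_buf);
clSetKernelArg(kernel, 2, array_size * sizeof(cl_int), NULL);
```

The 2-D input would then be flattened into one __global buffer, with each work-group indexing its own sub-array by group ID.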

Transfer data from Mat/oclMat to cl_mem (OpenCV + OpenCL)

Submitted by 房东的猫 on 2019-12-03 14:38:24
I am working on a project that needs a lot of OpenCL code. I am using OpenCV's ocl module to develop my project faster, but there are some functions not implemented and I will have to write my own OpenCL code. My question is this: what is the quickest and cheapest way to transfer data from a Mat and/or oclMat to a cl_mem array? Rephrasing: is there a good way to transfer or enqueue (clEnqueueWriteBuffer) data from an oclMat or Mat? Currently, I am using a for-loop to read data from the Mat (or download from the oclMat and then use for-loops) and then enqueuing it. This is turning out to be costly,
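For the Mat case, no per-element loop is needed: a Mat with continuous storage exposes its pixels as one flat block via `mat.data`, which can be handed to clEnqueueWriteBuffer in a single call. A sketch (queue and device_buf are assumed to exist):

```cpp
// Copy an entire continuous cv::Mat into an OpenCL buffer in one call.
CV_Assert(mat.isContinuous());                 // no row padding
size_t bytes = mat.total() * mat.elemSize();   // rows*cols*channels*depth
cl_int err = clEnqueueWriteBuffer(queue, device_buf,
                                  CL_TRUE /* blocking */,
                                  0, bytes, mat.data, 0, NULL, NULL);
```

If the Mat is a ROI view (not continuous), either clone() it first or issue one write per row using `mat.ptr(r)` and the row pitch.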

What do work items execute when conditionals are used in GPU programming?

Submitted by 六眼飞鱼酱① on 2019-12-03 14:02:50
Question: If you have work-items executing in a wavefront and there is a conditional such as: if(x){ ... } else{ .... } what do the work-items execute? Is it the case that all work-items in the wavefront execute the first branch (i.e. x == true), and if there are no work-items for which x is false, the rest of the conditional is skipped? What happens if one work-item takes the alternative path? Am I right that all work-items will execute the alternate path as well (therefore executing both paths

include headers to OpenCL .cl file

Submitted by 泄露秘密 on 2019-12-03 12:43:21
I've written an OpenCL kernel in a .cl file. It attempts to #include several headers. Its compilation fails, since the included header files are "not found". I am aware that clBuildProgram can take the -I dir option, which adds the directory dir to the list of directories to be searched for header files. On the Khronos forums, this post http://www.khronos.org/message_boards/viewtopic.php?f=37&t=2535 discusses the issue. They propose using clCreateProgramWithSource, which specifies all sources (including the .h files). I have a question regarding this issue: Which option is better? (
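The -I route is a one-line change at build time. A sketch, where "kernels/include" is an illustrative path and program/device are assumed to exist:

```cpp
// Pass include directories to the OpenCL runtime compiler, exactly as
// you would pass -I to a host C compiler.
const char *options = "-I kernels/include";
cl_int err = clBuildProgram(program, 1, &device, options, NULL, NULL);
// On CL_BUILD_PROGRAM_FAILURE, fetch the log via clGetProgramBuildInfo
// with CL_PROGRAM_BUILD_LOG to see which include still failed.
```

The clCreateProgramWithSource alternative instead concatenates the header text ahead of the kernel source (or passes them as multiple source strings), which avoids any dependence on the runtime compiler finding files on disk.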

How to debug OpenCL on Nvidia GPUs?

Submitted by 做~自己de王妃 on 2019-12-03 12:39:48
Is there any way to debug OpenCL kernels on an Nvidia GPU, i.e. set breakpoints and inspect variables? My understanding is that Nvidia's tool does not allow OpenCL debugging, and AMD's and Intel's tools only allow it on their own devices. gDEBugger might help you somewhat (I've never used it, though), but other than that there isn't any tool I know of that can set breakpoints or inspect variables inside a kernel. Perhaps try to save intermediate outputs from your kernel if it is a long kernel. Sorry I can't give you a magic solution; debugging OpenCL is just hard. Source: https://stackoverflow.com
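The "save intermediate outputs" suggestion can be done with an extra debug buffer that mirrors values of interest for host-side inspection. A kernel-side sketch (the names and the tapped value are illustrative):

```c
/* Debug-buffer technique: write intermediates to an extra __global
 * buffer, read it back on the host, and inspect it there. */
__kernel void compute(__global const float *in,
                      __global float *out,
                      __global float *dbg /* debug tap */) {
    int i = get_global_id(0);
    float tmp = in[i] * 0.5f;   /* intermediate we want to observe */
    dbg[i] = tmp;               /* record it, one slot per work-item */
    out[i] = tmp + 1.0f;
}
```

Once the kernel behaves, the dbg argument and its writes can be removed without touching the rest of the logic.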

Running OpenCL on hardware from mixed vendors

Submitted by 早过忘川 on 2019-12-03 12:35:19
I've been playing with the ATI OpenCL implementation in their Stream 2.0 beta. OpenCL in the current beta only uses the CPU for now; the next version is supposed to support GPU kernels. I downloaded Stream because I have an ATI GPU in my work machine. I write software that would benefit hugely from using the GPU. However, this software runs on customer machines; I don't have the luxury (as many scientific-computing environments have) of choosing the exact hardware to develop for and optimizing for that. So my question is: if I distribute the ATI OpenCL implementation with my