opencl

How to launch custom OpenCL kernel in OpenCV (3.0.0) OCL?

对着背影说爱祢 submitted on 2019-11-30 01:01:19
I'm probably misusing OpenCV by using it as a wrapper for the official OpenCL C++ bindings so that I can launch my own kernels. However, OpenCV does have classes like Program, ProgramSource, Kernel, Queue, etc. that suggest I can launch my own (even non-image-based) kernels with OpenCV. I am having trouble finding documentation for these classes, let alone examples. So, here is my attempt so far: #include <fstream> #include <iostream> #include "opencv2/opencv.hpp" #include "opencv2/core/ocl.hpp" #define ARRAY_SIZE 128 using namespace std; using namespace cv; int main(int,

Adreno OpenCL Application Development(1)

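耗尽温柔 submitted on 2019-11-29 23:48:59
1. Introduction OpenCL is an open, royalty-free standard for cross-platform parallel programming of heterogeneous systems, developed and maintained by the Khronos Group. Its design helps developers exploit the large computing power of modern heterogeneous systems and greatly facilitates cross-platform application development. The Qualcomm® Adreno™ GPU series on the Snapdragon platform was among the first mobile GPUs to fully support OpenCL. The figure below shows the OpenCL framework on a heterogeneous system: 2. OpenCL on Snapdragon Snapdragon is one of the most powerful and widely used mobile platforms in today's Android and Internet of Things (IoT) markets. The Snapdragon mobile platform brings best-in-class mobile components together on a single chip, so that Snapdragon-based devices deliver the latest mobile user experience in a highly power-efficient integrated solution. Snapdragon is a multiprocessor system that includes a multimode modem, CPU, GPU, DSP, location/GPS, multimedia, power management, RF, software and operating-system optimizations, memory, connectivity (Wi-Fi, Bluetooth), and other components. 1. OpenCL is fully supported on Adreno A3x, A4x, and A5x GPUs and is fully conformant with the OpenCL standard. OpenCL comes in different versions and profiles, and different Adreno GPUs may support different OpenCL versions, as shown in the figure below: Adreno GPUs with OpenCL support Besides the differences in OpenCL version and profile, Adreno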

How to use C++ templates in OpenCL kernels?

て烟熏妆下的殇ゞ submitted on 2019-11-29 22:38:34
I'm a novice in OpenCL. I have an algorithm which uses templates. It worked well with OpenMP parallelization, but now the amount of data has grown and the only way to process it is to rewrite it to use OpenCL. I can easily use MPI to build it for a cluster, but a Tesla-like GPU is much cheaper than a cluster :) Is there any way to use C++ templates in an OpenCL kernel? Is it possible to expand the templates with a C++ compiler or some other tool and then use the resulting kernel function? EDIT. The idea of a workaround is to somehow generate C99-compatible code from the C++ template code. I found a
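One way to picture the "generate C99-compatible code" workaround is plain text substitution on the host: keep the kernel as a string with placeholders and emit one concrete kernel per element type. The sketch below is only an illustration under that assumption; `replace_all`, `instantiate_scale_kernel`, and the `{{T}}` / `{{NAME}}` placeholders are hypothetical names, not part of OpenCL or of the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: replace every occurrence of `placeholder` in `text`.
std::string replace_all(std::string text, const std::string& placeholder,
                        const std::string& value) {
    assert(!placeholder.empty());
    std::size_t pos = 0;
    while ((pos = text.find(placeholder, pos)) != std::string::npos) {
        text.replace(pos, placeholder.size(), value);
        pos += value.size();  // skip past the inserted text
    }
    return text;
}

// Hypothetical kernel "template": {{T}} is the element type, {{NAME}} the kernel name.
const char* const kScaleTemplate =
    "__kernel void {{NAME}}(__global {{T}}* data, const {{T}} factor) {\n"
    "    size_t i = get_global_id(0);\n"
    "    data[i] = data[i] * factor;\n"
    "}\n";

// Produce plain OpenCL C source for one concrete element type.
std::string instantiate_scale_kernel(const std::string& type_name,
                                     const std::string& kernel_name) {
    std::string src = replace_all(kScaleTemplate, "{{T}}", type_name);
    return replace_all(src, "{{NAME}}", kernel_name);
}
```

Calling `instantiate_scale_kernel("float", "scale_float")` yields ordinary OpenCL C source that could then be passed to `clCreateProgramWithSource`. This covers only simple type parameterization, not general C++ template semantics.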

OpenCL global memory fetches

一笑奈何 submitted on 2019-11-29 22:18:37
Question: I am thinking about reworking my GPU OpenCL kernel to speed things up. The problem is that a lot of the global memory accesses are not coalesced, and the fetches are really bringing down the performance. So I am planning to copy as much of the global memory as possible into local memory, but I have to pick what to copy. Now my question is: do many fetches of small chunks of memory hurt more than fewer fetches of larger chunks? Answer 1: You can use clGetDeviceInfo to find out what the cacheline size is for a device.
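To make the cache-line point concrete, here is a rough host-side sketch that counts how many distinct cache lines a group of work-items touches for a given access stride. It is a simplified model, not a description of any particular GPU: `cache_lines_touched` is a hypothetical helper, and `line_bytes` is assumed to be the value reported by `clGetDeviceInfo` with `CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE`.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical model: work-item i reads one element at index i * stride_elems.
// Returns the number of distinct cache lines covered by all those reads.
std::size_t cache_lines_touched(std::size_t num_items, std::size_t stride_elems,
                                std::size_t elem_bytes, std::size_t line_bytes) {
    assert(line_bytes > 0 && elem_bytes > 0 && elem_bytes <= line_bytes);
    std::set<std::size_t> lines;
    for (std::size_t i = 0; i < num_items; ++i) {
        std::size_t first = i * stride_elems * elem_bytes;  // first byte read
        std::size_t last = first + elem_bytes - 1;          // last byte read
        lines.insert(first / line_bytes);
        lines.insert(last / line_bytes);  // an element may straddle two lines
    }
    return lines.size();
}
```

With 32 work-items reading 4-byte values and an assumed 64-byte line, a stride of 1 touches 2 lines while a stride of 16 touches 32. In this model, many small scattered fetches cost far more line transfers than a few contiguous ones.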

When will OpenCL 1.2 for NVIDIA hardware be available?

余生长醉 submitted on 2019-11-29 17:04:08
Question: I would have asked this question on the NVIDIA developer forum, but since it's still down maybe someone here can tell me something. Does anybody know if there is already OpenCL 1.2 support in NVIDIA's driver? If not, is it coming soon? I don't have a GeForce 600 series card to check myself. According to Wikipedia, there are already some cards that could support it, though. It seems like NVIDIA does not mention OpenCL much anymore and just focuses on CUDA C/C++ (see StreamComputing

clEnqueueNDRange blocking on Nvidia hardware? (Also Multi-GPU)

好久不见. submitted on 2019-11-29 16:18:38
On Nvidia GPUs, when I call clEnqueueNDRangeKernel, the program waits for it to finish before continuing. More precisely, I'm calling its equivalent C++ binding, CommandQueue::enqueueNDRangeKernel, but this shouldn't make a difference. This only happens on remote Nvidia hardware (3 Tesla M2090s); on our office workstations with AMD GPUs, the call is nonblocking and returns immediately. I don't have local Nvidia hardware to test on. We used to, and I remember similar behavior then, too, but it's a bit hazy. This makes spreading the work across multiple GPUs harder. I've tried starting a new thread for

Cannot compile OpenCL application using 1.2 headers in 1.1 version

主宰稳场 submitted on 2019-11-29 14:47:28
Question: I'm writing a small hello world OpenCL program using Khronos Group's cl.hpp for OpenCL 1.2 and NVIDIA's OpenCL libraries. The drivers and ICD I have support OpenCL 1.1. Since the NVIDIA side doesn't support 1.2 yet, I get errors on functions that require OpenCL 1.2. On the other hand, cl.hpp for OpenCL 1.2 has a flag, CL_VERSION_1_1 to be exact, to run the header in 1.1 mode, but it's not working. Has anybody had a similar experience or found a solution? Note: cl.hpp for version 1.1 works but,

CL_OUT_OF_RESOURCES for 2 millions floats with 1GB VRAM?

ぐ巨炮叔叔 submitted on 2019-11-29 14:36:49
It seems like 2 million floats should be no big deal: only 8 MB out of 1 GB of GPU RAM. I am able to allocate that much at times, and sometimes more than that, with no trouble. I get CL_OUT_OF_RESOURCES when I call clEnqueueReadBuffer, which seems odd. Is there a way to find out where the trouble really started? OpenCL shouldn't be failing like this at clEnqueueReadBuffer, right? It should fail when I allocate the data, right? Is there some way to get more details than just the error code? It would be cool if I could see how much VRAM was allocated when OpenCL reported CL_OUT_OF_RESOURCES. Eric Towers Not
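The size estimate in the question can be checked with a few lines of arithmetic. This is only a sketch of the calculation; `buffer_bytes` and `percent_of` are hypothetical helper names, and it assumes a 4-byte `float` (matching `cl_float`) and treats 1 GB as 1024^3 bytes.

```cpp
#include <cassert>
#include <cstddef>

static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float, like cl_float");

// Bytes needed for `count` elements of `elem_bytes` each.
std::size_t buffer_bytes(std::size_t count, std::size_t elem_bytes) {
    return count * elem_bytes;
}

// Share of `total_bytes` taken up by `bytes`, in percent.
double percent_of(std::size_t bytes, std::size_t total_bytes) {
    assert(total_bytes > 0);
    return 100.0 * static_cast<double>(bytes) / static_cast<double>(total_bytes);
}
```

For 2,000,000 floats, `buffer_bytes` gives 8,000,000 bytes, which is under 1% of 1 GB. This supports the question's premise that the buffer size alone is unlikely to exhaust device memory.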

Can I run Cuda or opencl on intel iris?

你说的曾经没有我的故事 submitted on 2019-11-29 14:20:45
Question: I have a mid-2014 MacBook Pro with Intel Iris graphics, an Intel Core i5 processor, and 16 GB of RAM. I am planning to learn some ray-traced 3D, but I am not sure whether my laptop can render fast without any NVIDIA hardware. So I would appreciate it if someone could tell me whether I can use CUDA. If not, could you please show me in a very easy way how to enable OpenCL in After Effects? I am also looking for a beginner's tutorial on how to create or build OpenCL programs. Answer 1: CUDA works only on NVIDIA hardware

How to use clCreateProgramWithBinary in OpenCL?

房东的猫 submitted on 2019-11-29 14:03:17
I'm trying to just get a basic program to work using clCreateProgramWithBinary. This is so I know how to use it rather than a "true" application. I see that one of the parameters is a list of binaries. How exactly would I go about creating a binary to test with? I have some test code which creates a program from source, builds and enqueues it. Is there a binary created at some point during this process which I can feed into clCreateProgramWithBinary? Here is some of my code, just to give an idea of my overall flow. I've omitted comments and error checks for simplicity. program =