OpenCL

Is it possible to run the sum computation in parallel in OpenCL?

点点圈 submitted on 2019-12-06 00:44:07
Question: I am a newbie in OpenCL. However, I understand the C/C++ basics and OOP. My question is as follows: is it somehow possible to run the sum computation task in parallel? Is it theoretically possible? Below I will describe what I've tried to do. The task is, for example:
double* values = new double[1000]; // let's pretend it has some random values inside
double sum = 0.0;
for (int i = 0; i < 1000; i++) {
    sum += values[i];
}
What I tried to do in the OpenCL kernel (and I feel it is wrong because
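A serial sum can indeed be parallelized on a GPU with a tree reduction: pairs of partial sums are added in parallel, halving the work each step. Below is a minimal single-threaded CPU sketch of the pattern an OpenCL kernel would perform in local memory; `tree_reduce` is an illustrative name, not the poster's code, and in a real kernel each inner-loop iteration would run on a separate work-item with a `barrier(CLK_LOCAL_MEM_FENCE)` between strides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU sketch of a work-group tree reduction: at each step the first half
// of the (simulated) work-items adds the second half's partial sums,
// halving the active range until one value remains.
double tree_reduce(std::vector<double> data) {
    // Pad to a power of two with zeros, as a kernel pads the work-group.
    std::size_t n = 1;
    while (n < data.size()) n *= 2;
    data.resize(n, 0.0);

    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        // In a kernel, each of the first `stride` work-items does one add,
        // followed by a local-memory barrier before the next stride.
        for (std::size_t i = 0; i < stride; ++i)
            data[i] += data[i + stride];
    }
    return data[0];
}
```

On the device, each work-group reduces its chunk this way and writes one partial sum; the host (or a second kernel launch) then sums the per-group results.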

Variable in OpenCL kernel 'for-loop' reduces performance

我是研究僧i submitted on 2019-12-05 23:08:03
I have a for-loop in my kernel that I had hard-coded to iterate for a fixed number of loops:
for (int kk = 0; kk < 50000; kk++) { <... my code here ...> }
I don't think the code in the loop is relevant to my question; it's some pretty simple table look-ups and integer math. I wanted to make my kernel code a little more flexible, so I modified the loop so that the number of iterations (50000) is replaced with a kernel input parameter 'num_loops':
for (int kk = 0; kk < num_loops; kk++) { <... more code here ...> }
The thing I found is that even when my host program calls the
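A likely cause is that a hard-coded bound is a compile-time constant the OpenCL compiler can unroll, while a kernel argument is a runtime value that it cannot. One common workaround, sketched below with illustrative names, is to keep the flexibility on the host side but still give the compiler a constant: pass the trip count as a preprocessor define in the build options instead of as a kernel argument.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build an options string like "-Dnum_loops=50000" for clBuildProgram,
// so the kernel's `for (int kk = 0; kk < num_loops; kk++)` loop bound
// becomes a compile-time constant again.
std::string build_options(int num_loops) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "-Dnum_loops=%d", num_loops);
    return std::string(buf);
}

// The host would then build with something like:
//   clBuildProgram(program, 1, &device,
//                  build_options(50000).c_str(), NULL, NULL);
// at the cost of recompiling the kernel when the count changes.
```

The trade-off is a program rebuild per loop count, which is usually cheap relative to a 50000-iteration kernel.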

CL_INVALID_WORK_GROUP_SIZE error

两盒软妹~` submitted on 2019-12-05 21:56:09
Question: I have this code, for which I already posted something some time ago. Today I got my kernel running with a typedef struct in a little test program, but clEnqueueNDRangeKernel gives an invalid work group size error. According to the Khronos website, this can have 3 causes:
1. The global work size is not divisible by the local work size. In my code, it is divisible.
2. The local work size is bigger than the GPU can handle. My local work size is 128, way under the reported maximum of 1024.
3. Something to do with

Installed beignet to use OpenCL on Intel, but OpenCL programs only work when run as root

蹲街弑〆低调 submitted on 2019-12-05 21:42:58
I have an Intel HD Graphics 4000 (3rd Gen processor), and my OS is Linux Mint 17.1 64-bit. I installed beignet to be able to use OpenCL and thus run programs on the GPU. I had been having lots of problems using the pyOpenCL bindings, so I decided to uninstall my current beignet version and install the latest one (you can see the previous question I asked and answered myself about it here). Upgrading beignet worked, and I can now run OpenCL code on my GPU through Python and C/C++ bindings. However, I can only run the programs as root; otherwise they don't detect my GPU as a valid device. The
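Root-only device detection with beignet usually comes down to permissions on the DRI device nodes it uses to reach the GPU. A hedged sketch of the usual diagnosis and fix (group name and node paths vary by distribution):

```shell
# beignet talks to the GPU through /dev/dri device nodes. If those are
# only accessible to root, OpenCL enumeration fails for normal users.
# Inspect the owner and group of the nodes:
ls -l /dev/dri/
# A common fix is adding your user to the group that owns the nodes
# (often "video"), then logging out and back in:
sudo usermod -a -G video "$USER"
```

If the nodes belong to a different group or have overly strict modes, a udev rule adjusting them is the more permanent variant of the same fix.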

libOpenCL.so uses VFP register arguments, output does not

时光怂恿深爱的人放手 submitted on 2019-12-05 21:17:07
Currently I am trying to build Buddhabrot for the ARM architecture, but I am stuck at one point where I get the following error; I hope somebody can help:
libOpenCL.so uses VFP register arguments, output does not
libGAL.so uses VFP register arguments, output does not
Here's my makefile:
LIBS = -lm -lOpenCL -lGAL -lGL -lGLEW -lglut -lpthread
CFLAGS = -Wall -g
OBJECTS = main.o environment.o input.o animate.o buddhabrot.o buddhacl.o cmodules/timer.o
all: prog
prog: $(OBJECTS)
	c++ $(CFLAGS) -o prog $(OBJECTS) $(LIBS)
%.o: %.cpp $(LIBS)
clean:
	rm -f *.o prog cmodules/*.o
c++ -v output: Using built-in specs
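This linker message means an ARM float-ABI mismatch: the prebuilt libOpenCL.so and libGAL.so were compiled for the hard-float ABI (floats passed in VFP registers), while the project's objects are being compiled for a different float ABI. A hedged sketch of the usual makefile fix; the exact flag values depend on the toolchain and on how the vendor libraries were built:

```makefile
# Compile (and link) with a float ABI matching the prebuilt libraries.
# Hard-float libraries typically need something like:
CFLAGS = -Wall -g -mfloat-abi=hard -mfpu=vfp
# If the libraries were instead built soft-float with VFP instructions,
# -mfloat-abi=softfp is the value to try.
```

All objects and libraries in one link must agree on the ABI, so a clean rebuild after changing the flag is required.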

Speed up OpenCV image processing with OpenCL

非 Y 不嫁゛ submitted on 2019-12-05 19:05:51
This article was first published on my personal blog at https://kezunlin.me/post/59afd8b3/ ; the latest version is available there. Guide: OpenCL is a framework for writing programs that execute on heterogeneous platforms. The developers of an OpenCL library utilize all OpenCL-compatible devices (CPUs, GPUs, DSPs, FPGAs, etc.) they find on a computer/device and assign the right tasks to the right processor. Keep in mind that as a user of the OpenCV library you are not developing any OpenCL library. In fact, you are not even a user of the OpenCL library, because all the details are hidden behind the transparent API
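The "transparent API" (T-API) mentioned above means that switching cv::Mat to cv::UMat is often the only code change needed: OpenCV then dispatches supported operations to an OpenCL device when one is available. A minimal sketch, assuming an OpenCV build with OpenCL support and illustrative file names:

```cpp
#include <opencv2/opencv.hpp>

int main() {
    // UMat instead of Mat is the whole opt-in to the transparent API.
    cv::UMat src, gray, blurred;
    cv::imread("input.jpg", cv::IMREAD_COLOR).copyTo(src); // Mat -> UMat
    cv::cvtColor(src, gray, cv::COLOR_BGR2GRAY);           // may run via OpenCL
    cv::GaussianBlur(gray, blurred, cv::Size(7, 7), 1.5);  // may run via OpenCL
    cv::imwrite("output.jpg", blurred);
    return 0;
}
```

Whether a given call actually runs on the GPU depends on the OpenCV build and the operation; cv::ocl::haveOpenCL() reports whether an OpenCL device was found at all.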

What is the relationship between NVIDIA GPUs' CUDA cores and OpenCL computing units?

我是研究僧i submitted on 2019-12-05 18:27:08
My computer has a GeForce GTX 960M, which is claimed by NVIDIA to have 640 CUDA cores. However, when I run clGetDeviceInfo to find out the number of compute units on my device, it prints out 5 (see the figure below). It sounds like CUDA cores are somewhat different from what OpenCL considers compute units? Or maybe a group of CUDA cores forms an OpenCL compute unit? Can you explain this to me? Robert Crovella answers: Your GTX 960M is a Maxwell device with 5 Streaming Multiprocessors, each with 128 CUDA cores,
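The arithmetic behind the answer: OpenCL reports one compute unit per streaming multiprocessor (SM), and NVIDIA's advertised CUDA-core count is the number of SMs times the cores per SM for that architecture (128 on Maxwell). A trivial sketch with an illustrative helper name:

```cpp
#include <cassert>

// Marketing "CUDA cores" = OpenCL compute units (SMs) * cores per SM.
int cuda_cores(int compute_units, int cores_per_sm) {
    return compute_units * cores_per_sm;
}
```

For the GTX 960M: 5 compute units times 128 cores per Maxwell SM gives the advertised 640.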

DirectCompute versus OpenCL for GPU programming?

。_饼干妹妹 submitted on 2019-12-05 17:31:27
I have some (financial) tasks which should map well to GPU computing, but I'm not really sure if I should go with OpenCL or DirectCompute. I did some GPU computing, but it was a long time ago (3 years). I did it through OpenGL since there was not really any alternative back then. I've seen some OpenCL presentations and it looks really nice. I haven't seen anything about DirectCompute yet, but I expect it to also be good. I'm not interested at the moment in cross-platform compatibility, and besides, I expect the two models to be similar enough to not cause a big headache when trying to go from

Best GPU algorithm for calculating lists of neighbours

我们两清 submitted on 2019-12-05 17:00:29
Given a collection of thousands of points in 3D, I need to get the list of neighbours for each particle that fall inside some cutoff value (in terms of Euclidean distance) and, if possible, sorted from nearest to farthest. What is the fastest GPU algorithm for this purpose in CUDA or OpenCL? One of the fastest GPU MD codes I'm aware of, HALMD, uses a (highly tuned) version of the same sort of approach that is used in the CUDA SDK examples, "Particles". Both the HALMD paper and the Particles whitepaper are very clearly written. The underlying algorithm is to assign particles
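The cell-list idea behind the "Particles" demo can be sketched on the CPU: bin points into a uniform grid with cell edge at least the cutoff, then for each point examine only the 27 surrounding cells instead of all N points. This is a single-threaded, illustrative stand-in for the GPU version (which sorts particles by cell index instead of using a hash map):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

struct P { double x, y, z; };

// Neighbour lists within `cutoff`, via a uniform grid ("cell list").
std::vector<std::vector<int>> neighbour_lists(const std::vector<P>& pts,
                                              double cutoff) {
    using Cell = std::tuple<int64_t, int64_t, int64_t>;
    std::map<Cell, std::vector<int>> grid;
    auto cell_of = [&](const P& p) {
        return Cell{(int64_t)std::floor(p.x / cutoff),
                    (int64_t)std::floor(p.y / cutoff),
                    (int64_t)std::floor(p.z / cutoff)};
    };
    for (int i = 0; i < (int)pts.size(); ++i)
        grid[cell_of(pts[i])].push_back(i);

    std::vector<std::vector<int>> result(pts.size());
    const double c2 = cutoff * cutoff;
    for (int i = 0; i < (int)pts.size(); ++i) {
        auto [cx, cy, cz] = cell_of(pts[i]);
        for (int dx = -1; dx <= 1; ++dx)            // scan the 27 cells
        for (int dy = -1; dy <= 1; ++dy)            // around this point's
        for (int dz = -1; dz <= 1; ++dz) {          // own cell
            auto it = grid.find(Cell{cx + dx, cy + dy, cz + dz});
            if (it == grid.end()) continue;
            for (int j : it->second) {
                if (j == i) continue;
                double ddx = pts[i].x - pts[j].x;
                double ddy = pts[i].y - pts[j].y;
                double ddz = pts[i].z - pts[j].z;
                if (ddx * ddx + ddy * ddy + ddz * ddz <= c2)
                    result[i].push_back(j);
            }
        }
    }
    return result; // sort each list by distance afterwards if needed
}
```

This turns the naive O(N^2) all-pairs search into roughly O(N) work for uniformly distributed points, which is what makes the approach map well to GPUs.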

OpenCL not finding platforms?

自作多情 submitted on 2019-12-05 16:42:18
I am trying to utilize the C++ API for OpenCL. I have installed my NVIDIA drivers and I have tested that I can run the simple vector addition program provided here. I can compile this program with the following gcc call, and it runs without problems:
gcc main.c -o vectorAddition -lOpenCL -I/usr/local/cuda-6.5/include
However, I would very much prefer to use the C++ API, as opposed to the very verbose host code needed for C. I downloaded the C++ bindings from Khronos from here and placed the cl.hpp file in the same location as my other cl.h file. The code uses some C++11, so I can compile the
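Since cl.hpp is header-only, the C++ bindings need no extra libraries; the same link line works once a C++ compiler is used and C++11 is enabled. A hedged sketch, reusing the include path from the working C build above (the paths match the poster's CUDA 6.5 install and are illustrative):

```shell
# Same -lOpenCL and -I as the C version; g++ plus -std=c++11 is the only
# change the C++ bindings require.
g++ -std=c++11 main.cpp -o vectorAddition -lOpenCL -I/usr/local/cuda-6.5/include
```

If platforms are then found from C but not from C++, the usual suspects are a cl.hpp version mismatched to the installed OpenCL headers, or a missing -std=c++11 silently breaking the bindings' feature detection.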