opencl

What is the difference between creating a buffer object with clCreateBuffer + CL_MEM_COPY_HOST_PTR vs. clCreateBuffer + clEnqueueWriteBuffer?

我与影子孤独终老i submitted on 2019-11-29 05:37:37
I have seen both versions in tutorials, but I could not find out what their advantages and disadvantages are. Which one is the proper one? cl_mem input = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(float) * DATA_SIZE, NULL, NULL); clEnqueueWriteBuffer(command_queue, input, CL_TRUE, 0, sizeof(float) * DATA_SIZE, inputdata, 0, NULL, NULL); vs. cl_mem input = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(float) * DATA_SIZE, inputdata, NULL); Thanks. [Update] I added CL_MEM_COPY_HOST_PTR to the second example to make it correct. I assume that inputdata is not NULL.

How can I change the device on which OpenCL code will be executed with UMat in OpenCV?

邮差的信 submitted on 2019-11-29 05:22:33
As is well known, OpenCV 3.0 supports the new class cv::UMat, which provides the Transparent API (TAPI) to use OpenCL automatically when it can: http://code.opencv.org/projects/opencv/wiki/Opencv3#tapi There are two introductions to cv::UMat and TAPI: Intel: https://software.intel.com/en-us/articles/opencv-30-architecture-guide-for-intel-inde-opencv AMD: http://developer.amd.com/community/blog/2014/10/15/opencv-3-0-transparent-api-opencl-acceleration/ But if I have: Intel CPU Core i5 (Haswell) 4xCores (OpenCL Intel CPUs with SSE 4.1, SSE 4.2 or AVX support ) Intel Integrated HD Graphics which supports OpenCL

Error code (-11): what are all the possible reasons for getting the error “CL_BUILD_PROGRAM_FAILURE” in OpenCL?

非 Y 不嫁゛ submitted on 2019-11-29 03:30:47
I am using an ATI RV770 graphics card, OpenCL 1.0 and ati-stream-sdk-v2.3-lnx64 on Linux. While running my host code, which includes the following two sections to build the kernel program, I am getting error code (-11), i.e. CL_BUILD_PROGRAM_FAILURE. Does it mean that the kernel program was compiled? If not, how can it be compiled and debugged? const char* KernelPath = "abc_kernel.cl"; // the kernel program is in a separate file but in the same directory as the host code /* Create Program object from the kernel source */ char* sProgramSource = readKernelSource(KernelPath); size_t sourceSize = strlen

Persistent threads in OpenCL and CUDA

血红的双手。 submitted on 2019-11-29 03:21:44
Question: I have read some papers talking about "persistent threads" for GPGPU, but I don't really understand the idea. Can anyone give me an example or show me how this programming style is used? What I took away after reading and googling "persistent threads": persistent threads are no more than a while loop that keeps a thread running and computing a large batch of work items. Is this correct? Thanks in advance. Reference: http://www.idav.ucdavis.edu/publications/print_pub?pub_id=1089 http://developer
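
The loop described in the question can be sketched as an OpenCL C kernel. This is only a minimal illustration, not code from the cited papers: the kernel name persistent_scale, the shared counter next_item, and the doubling workload are assumptions made for the example, and atomic_inc on a global int assumes OpenCL 1.1 or later. Each work-item stays alive in a loop and claims the next unprocessed index until none remain. The block at the top is a host-only shim so that the same source also compiles as plain C; the atomic_inc defined there is a sequential stand-in, not a real atomic.

```c
/* Host-only shim: OpenCL C compilers predefine __OPENCL_VERSION__ and skip it. */
#ifndef __OPENCL_VERSION__
#include <assert.h>   /* for host-side checks */
#define kernel
#define global
/* Sequential stand-in for OpenCL's atomic_inc: returns the old value. */
static int atomic_inc(volatile int *p) { int old = *p; *p = old + 1; return old; }
#endif

/* Hypothetical persistent-threads kernel: a work-item does not handle one
   fixed element; it keeps pulling indices from a shared counter. */
kernel void persistent_scale(global volatile int *next_item,
                             global const float *in,
                             global float *out,
                             int n_items)
{
    for (;;) {
        int i = atomic_inc(next_item);   /* claim the next index */
        if (i >= n_items)
            break;                       /* queue drained: work-item retires */
        out[i] = 2.0f * in[i];           /* placeholder workload */
    }
}
```

In this style the host would typically launch only about as many work-items as the device can keep resident, and set *next_item to 0 before the launch. Real implementations often fetch work per wavefront or warp rather than per work-item; the per-item fetch here is a simplification.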

How to pass and access C++ vectors to OpenCL kernel?

我只是一个虾纸丫 submitted on 2019-11-29 02:54:22
Question: I'm new to C, C++ and OpenCL and doing my best to learn them at the moment. Here's a preexisting C++ function that I'm trying to figure out how to port to OpenCL using either the C or C++ bindings. #include <vector> using namespace std; class Test { private: double a; vector<double> b; vector<long> c; vector<vector<double> > d; public: double foo(long x, double y) { // mathematical operations // using x, y, a, b, c, d // and also b.size() // to calculate return value return 0.0; } }; Broadly

OpenCL: Work items, Processing elements, NDRange

自作多情 submitted on 2019-11-29 02:50:11
Question: My classmates and I are being confronted with OpenCL for the first time. As expected, we ran into some issues. Below I have summarized the issues we had and the answers we found. However, we're not sure that we got it all right, so it would be great if you could take a look at both our answers and the questions below them. Why didn't we split that up into single questions? They partly relate to each other. We think these are typical beginner's questions. Those fellow students who we

ERROR: clGetPlatformIDs -1001 when running OpenCL code (Linux)

百般思念 submitted on 2019-11-29 02:28:26
After finally managing to get my code to compile with OpenCL, I cannot seem to get the output binary to run! This is on my Linux laptop running Kubuntu 13.10 x64. The error I get is (printed from cl::Error): ERROR: clGetPlatformIDs -1001 I found this post, but there does not seem to be a clear solution. I added myself to the video group, but this does not seem to work. With regard to the ICD profile... I am not sure what I need to do. Shouldn't this be included with the CUDA toolkit? If not, where could I download one? EDIT: It seems I have an ICD file in my system under /usr/share/nvidia-331

Is private memory slower than local memory?

放肆的年华 submitted on 2019-11-29 01:35:33
I was working on a kernel which had many global memory accesses per thread, so I copied the data to local memory, which gave a speed-up of 40%. I wanted still more speed-up, so I copied from local to private memory, which degraded the performance. So is it correct to think that we must not use too much private memory, because it may degrade the performance?
Ashwin's answer is in the right direction but a little misleading. OpenCL abstracts the address space of variables away from their physical storage, and there is not necessarily a 1:1 mapping between the two. Consider OpenCL variables declared in the __private address

OpenCL user defined inline functions

試著忘記壹切 submitted on 2019-11-29 01:00:27
Question: Is it possible to define my own functions in OpenCL code, so that the kernels can call them? If yes, where can I see a simple example? Answer 1: The function used to create a program is ... cl_program clCreateProgramWithSource ( cl_context context, cl_uint count, const char **strings, const size_t *lengths, cl_int *errcode_ret) You can place functions inside the strings parameter like this, float AddVector(float a, float b) { return a + b; } kernel void VectorAdd( global read_only float* a,
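
Since the quoted answer is cut off, here is a small complete sketch of the same idea: a helper function and a kernel that calls it, both living in one program source. The names add_vector and vector_add and the three-buffer signature are assumptions for illustration, not the answer's original code. The block at the top is a host-only shim (with a hypothetical host_gid variable standing in for the work-item id) so the source can also be compiled and checked as plain C.

```c
/* Host-only shim: OpenCL C compilers predefine __OPENCL_VERSION__ and skip it. */
#ifndef __OPENCL_VERSION__
#include <assert.h>   /* for host-side checks */
#include <stddef.h>
#define kernel
#define global
static size_t host_gid;   /* hypothetical stand-in for the work-item id */
static size_t get_global_id(unsigned dim) { (void)dim; return host_gid; }
#endif

/* Ordinary helper function: any kernel in the same program can call it. */
float add_vector(float a, float b)
{
    return a + b;
}

/* Kernel entry point: one work-item adds one pair of elements. */
kernel void vector_add(global const float *a,
                       global const float *b,
                       global float *c)
{
    size_t i = get_global_id(0);
    c[i] = add_vector(a[i], b[i]);
}
```

When this text is passed to clCreateProgramWithSource, only vector_add is visible to the host as a kernel; add_vector is simply a function the kernel calls.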

Access vector type OpenCL

泪湿孤枕 submitted on 2019-11-28 23:59:40
I have a variable within a kernel like: int16 element; I would like to know if there is a way to address the third int in element like element[2], so that it would be the same as writing element.s2. So how can I do something like: int16 element; int vector[100] = rand() % 16; for ( int i=0; i<100; i++ ) element[ vector[i] ]++; The way I did it was: int temp[16] = {0}; int16 element; int vector[100] = rand() % 16; for ( int i=0; i<100; i++ ) temp[ vector[i] ]++; element = (int16)(temp[0],temp[1],temp[2],temp[3],temp[4],temp[5],temp[6],temp[7],temp[8],temp[9],temp[10],temp[11],temp[12],temp[13],temp[14
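
One possible variation on the temp-array approach in the question, offered as a sketch rather than a definitive answer: keep counting in a plain array, then build the int16 with the built-in vload16 instead of spelling out sixteen constructor arguments. The function name histogram16 and the masking with 15 are assumptions for illustration. The shim at the top provides host stand-ins for int16 and vload16 so the source also compiles as plain C; on the host the components are read as h.s[2], whereas in OpenCL C they are h.s2.

```c
/* Host-only shim: OpenCL C compilers predefine __OPENCL_VERSION__ and skip it. */
#ifndef __OPENCL_VERSION__
#include <assert.h>   /* for host-side checks */
#include <stddef.h>
typedef struct { int s[16]; } int16;   /* host stand-in for OpenCL's int16 */
/* Host stand-in for vload16: reads 16 ints starting at p + 16 * offset. */
static int16 vload16(size_t offset, const int *p)
{
    int16 v;
    for (int k = 0; k < 16; ++k)
        v.s[k] = p[16 * offset + k];
    return v;
}
#endif

/* Hypothetical helper: counts how often each value 0..15 occurs in
   values[0..n-1] and returns the counts packed into an int16. */
int16 histogram16(const int *values, int n)
{
    int temp[16] = {0};               /* scalar array allows dynamic indexing */
    for (int i = 0; i < n; ++i)
        temp[values[i] & 15]++;       /* mask keeps the index within 0..15 */
    return vload16(0, temp);          /* pack the 16 counters into a vector */
}
```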