opencl

clBuildProgram failed with error code -11 and without build log

Submitted by ♀尐吖头ヾ on 2019-12-01 03:45:56
I have worked a bit with OpenCL, but recently clBuildProgram failed in one of my programs. My code excerpt is below:

    cl_program program;
    program = clCreateProgramWithSource(context, 1, (const char **)&kernel_string, NULL, &err);
    if (err != CL_SUCCESS) {
        cout << "Unable to create Program Object. Error code = " << err << endl;
        exit(1);
    }
    if (clBuildProgram(program, 0, NULL, NULL, NULL, NULL) != CL_SUCCESS) {
        cout << "Program Build failed\n";
        size_t length;
        char buffer[2048];
        clGetProgramBuildInfo(program, device_id[0], CL_PROGRAM_BUILD_LOG, sizeof(buffer), buffer, &length);
        cout << "--- Build log ---…
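The excerpt cuts off at the log print, but two details are worth noting: error -11 is CL_BUILD_PROGRAM_FAILURE, and an empty build log usually means the log was requested for a device other than the one that failed, or a fixed-size buffer was too small. A more robust pattern is to ask for the log's length first; a minimal sketch (the helper name is mine, not from the question):

    #include <stdio.h>
    #include <stdlib.h>
    #include <CL/cl.h>

    /* Print the build log for one device; query the length first instead
       of guessing with a fixed-size buffer. */
    static void print_build_log(cl_program program, cl_device_id device)
    {
        size_t length = 0;
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                              0, NULL, &length);
        char *log = (char *)malloc(length + 1);
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                              length, log, NULL);
        log[length] = '\0';
        printf("--- Build log ---\n%s\n", log);
        free(log);
    }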

getting “pygpu was configured but could not be imported” error while trying with OpenCL+Theano on AMD Radeon

Submitted by 时光毁灭记忆、已成空白 on 2019-12-01 03:32:32
I have followed the instructions from this: https://gist.github.com/jarutis/ff28bca8cfb9ce0c8b1a But when I then ran THEANO_FLAGS=device=opencl0:0 python test.py on the test file, I got this error:

    ERROR (theano.sandbox.gpuarray): pygpu was configured but could not be imported
    Traceback (most recent call last):
      File "/home/mesayantan/.local/lib/python2.7/site-packages/theano/sandbox/gpuarray/__init__.py", line 20, in <module>
        import pygpu
      File "/usr/src/gtest/clBLAS/build/libgpuarray/pygpu/__init__.py", line 7, in <module>
        from . import gpuarray, elemwise, reduction
      File "/usr/src/gtest/clBLAS/build…
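The traceback is cut off before the underlying ImportError, so the first diagnostic step is to reproduce the import outside Theano and read the real failure (commonly a libgpuarray shared library that is not on the loader path):

    # Import pygpu directly; the full ImportError this prints is the
    # actual cause that Theano's one-line ERROR is hiding.
    python -c "import pygpu"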

variable length array declaration not allowed in OpenCL - why?

Submitted by 北城余情 on 2019-12-01 02:22:14
Question: I want to create a local array inside my OpenCL kernel whose size depends on a parameter of the kernel. That seems not to be allowed - at least with AMD APP. Is your experience different? Perhaps it's just APP? Or is there some rationale here? Edit: I would now suggest variable-length arrays should be allowed in CPU-side code too, and that it was an unfortunate call by the C standard committee; but the question stands.

Answer 1: You can dynamically allocate the size of a local block. You need to…
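The answer is truncated, but the pattern it begins to describe is standard OpenCL: declare the local buffer as a __local kernel argument and size it from the host, so the "variable length" is decided at enqueue time rather than inside the kernel body. A minimal sketch (kernel and names are illustrative):

    // OpenCL C: the dynamically sized local array arrives as an argument.
    __kernel void stage(__global const float *in, __local float *scratch)
    {
        scratch[get_local_id(0)] = in[get_global_id(0)];
        barrier(CLK_LOCAL_MEM_FENCE);
        /* ... work on scratch ... */
    }

On the host, a NULL argument value with a nonzero size allocates that many bytes of local memory per work-group:

    clSetKernelArg(kernel, 1, work_group_size * sizeof(float), NULL);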

Passing struct with pointer members to OpenCL kernel using PyOpenCL

Submitted by 拥有回忆 on 2019-11-30 23:38:57
Let's suppose I have a kernel to compute the element-wise sum of two arrays. Rather than passing a, b, and c as three parameters, I make them structure members as follows:

    typedef struct {
        __global uint *a;
        __global uint *b;
        __global uint *c;
    } SumParameters;

    __kernel void compute_sum(__global SumParameters *params)
    {
        uint id = get_global_id(0);
        params->c[id] = params->a[id] + params->b[id];
        return;
    }

There is information on structures if you RTFM of PyOpenCL [1], and others have addressed this question too [2] [3] [4]. But none of the OpenCL struct examples I've been able to find have…
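The excerpt stops before the actual question, but it is worth flagging why this design is problematic in the first place: in OpenCL 1.x, a struct copied from the host cannot carry usable pointers, because a host pointer value means nothing in the device's address space. The conventional fix is to pass each buffer as its own kernel argument; a minimal sketch:

    // Same computation with the buffers passed individually; each
    // argument is backed by a cl_mem set via clSetKernelArg.
    __kernel void compute_sum(__global const uint *a,
                              __global const uint *b,
                              __global uint *c)
    {
        uint id = get_global_id(0);
        c[id] = a[id] + b[id];
    }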

Advice for real time image processing

Submitted by 烂漫一生 on 2019-11-30 20:57:58
I really need some help and advice, as I'm new to real-time image processing. I am trying to implement an algorithm for a system in which the camera captures 1000 fps, and I need to read the value of every pixel in all images and compute, for each pixel, how pixel[i][j] evolves across N images. I have the data as an (unsigned char *ptr); I want to transfer it to the GPU, run the algorithm using CUDA, and return the results to the CPU. But I am not sure what the best option for real-time processing would be. My system: CPU Intel Xeon X5660 2.8 GHz (2…
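The excerpt is cut off before any answer, but for a pipeline at this frame rate the usual CUDA approach is page-locked (pinned) host memory plus streams, so that uploading frame N+1 overlaps computing frame N. A rough double-buffered sketch (frame size, kernel, and all names are placeholders, not from the question):

    #include <cuda_runtime.h>

    void run_pipeline(size_t frame_bytes, int n_frames)
    {
        unsigned char *h_frame[2], *d_frame[2];
        cudaStream_t stream[2];
        for (int i = 0; i < 2; ++i) {
            cudaMallocHost((void **)&h_frame[i], frame_bytes); /* pinned: enables async copies */
            cudaMalloc((void **)&d_frame[i], frame_bytes);
            cudaStreamCreate(&stream[i]);
        }
        for (int f = 0; f < n_frames; ++f) {
            int s = f & 1;                        /* alternate buffer/stream */
            cudaStreamSynchronize(stream[s]);     /* wait until buffer s is free */
            /* ... copy the camera's unsigned char *ptr into h_frame[s] ... */
            cudaMemcpyAsync(d_frame[s], h_frame[s], frame_bytes,
                            cudaMemcpyHostToDevice, stream[s]);
            /* process_frame<<<grid, block, 0, stream[s]>>>(d_frame[s], ...); */
        }
        for (int i = 0; i < 2; ++i) {
            cudaStreamSynchronize(stream[i]);
            cudaFreeHost(h_frame[i]);
            cudaFree(d_frame[i]);
            cudaStreamDestroy(stream[i]);
        }
    }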

Better way to load vectors from memory. (clang)

Submitted by 筅森魡賤 on 2019-11-30 20:03:39
I'm writing a test program to get used to Clang's language extensions for OpenCL-style vectors. I can get the code to work, but I'm having issues getting one aspect of it down. I can't seem to figure out how to get Clang to just load a vector from a scalar array nicely. At the moment I have to do something like:

    byte16 va = (byte16){ argv[1][start],      argv[1][start + 1],
                          argv[1][start + 2],  argv[1][start + 3],
                          argv[1][start + 4],  argv[1][start + 5],
                          argv[1][start + 6],  argv[1][start + 7],
                          argv[1][start + 8],  argv[1][start + 9],
                          argv[1][start + 10], argv[1][start + 11],
                          argv[1][start + 12],…
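The excerpt ends before any answer, but with Clang's vector extensions the usual way to avoid the element-by-element initializer is to let memcpy perform the load: there is no alignment UB, and the optimizer typically folds the call into a single 16-byte vector move. A small sketch, assuming byte16 is declared as an ext_vector_type of char:

    #include <string.h>

    typedef char byte16 __attribute__((ext_vector_type(16)));

    /* Clang typically compiles this to one vector load at -O1 and above. */
    static inline byte16 load_byte16(const char *p)
    {
        byte16 v;
        memcpy(&v, p, sizeof v);
        return v;
    }

    /* usage: byte16 va = load_byte16(&argv[1][start]); */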

Position of compiler flag -l

Submitted by 江枫思渺然 on 2019-11-30 18:57:57
Question: I'm currently learning OpenCL. When I try to compile my program with this command, I get errors:

    g++ -Wall -l OpenCL main.cpp -o main

The errors are mostly undefined references, because the library is not linked, I think (nevertheless I will post the error output at the end). But with this command everything works fine:

    g++ -Wall main.cpp -o main -l OpenCL

So my question is: what do I have to do to be able to put the -l flag at the front of the command? (The background is: I want to use Netbeans to…
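The question is cut off, but the behavior it describes is ordinary GNU linker semantics rather than anything OpenCL-specific: libraries are resolved left to right, and on distributions that pass --as-needed by default, a shared library named before the objects that use it is dropped because nothing needs its symbols yet. So the library has to come after the code that references it:

    # Fails: when libOpenCL is considered, no object file has yet
    # produced undefined references for it to satisfy.
    g++ -Wall -l OpenCL main.cpp -o main

    # Works: main.cpp's unresolved OpenCL symbols are already known
    # when the library is scanned.
    g++ -Wall main.cpp -o main -l OpenCL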

OpenCL - Multiple GPU Buffer Synchronization

Submitted by 孤人 on 2019-11-30 18:41:33
Question: I have an OpenCL kernel that calculates the total force exerted on a particle by the other particles in the system, and another one that integrates the particle position/velocity. I would like to parallelize these kernels across multiple GPUs, basically assigning some number of particles to each GPU. However, I have to run these kernels multiple times, and the result from each GPU is used by every other. Let me explain that a little further: say you have particle 0 on GPU 0, and particle 1 on GPU…
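The question is truncated before any answer; on OpenCL 1.x the simplest (if not fastest) way to give every GPU the others' results is to round-trip each device's slice through the host after each integration step, so the next force pass on either device sees every particle's updated position. A rough per-step sketch for two devices (all names and sizes are illustrative, not from the question):

    #include <CL/cl.h>

    /* One simulation step across two GPUs. pos_buf[d] holds ALL particle
       positions on device d; the kernel on device d updates only its own
       n_half-particle slice. host_pos is a char staging buffer of
       2 * slice_bytes. */
    static void step_two_gpus(cl_command_queue queue[2], cl_kernel integrate,
                              cl_mem pos_buf[2], size_t n_half,
                              size_t slice_bytes, char *host_pos)
    {
        for (int d = 0; d < 2; ++d)
            clEnqueueNDRangeKernel(queue[d], integrate, 1, NULL,
                                   &n_half, NULL, 0, NULL, NULL);

        /* Gather each device's updated slice into the host array (blocking). */
        for (int d = 0; d < 2; ++d)
            clEnqueueReadBuffer(queue[d], pos_buf[d], CL_TRUE,
                                d * slice_bytes, slice_bytes,
                                host_pos + d * slice_bytes, 0, NULL, NULL);

        /* Broadcast the merged array back to both devices. */
        for (int d = 0; d < 2; ++d)
            clEnqueueWriteBuffer(queue[d], pos_buf[d], CL_TRUE,
                                 0, 2 * slice_bytes, host_pos, 0, NULL, NULL);
    }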