opencl

terminate called after throwing an instance of 'cl::sycl::detail::exception_implementation<(cl::sycl::detail::exception_types)9>'

ε祈祈猫儿з submitted on 2019-12-11 05:06:19
Question: I am a newbie in SYCL/OpenCL/GPGPU. I am trying to build and run the sample code of a constant-addition program:

    #include <iostream>
    #include <array>
    #include <algorithm>
    #include <CL/sycl.hpp>

    namespace sycl = cl::sycl;

    //<<Define ConstantAdder>>
    template<typename T, typename Acc, size_t N>
    class ConstantAdder {
    public:
        ConstantAdder(Acc accessor, T val)
            : accessor(accessor)
            , val(val) {}
        void operator() () {
            for (size_t i = 0; i < N; i++) {
                accessor[i] += val;
            }
        }
    private:
        Acc accessor;
        const T

Passing arguments through __local memory in OpenCL

青春壹個敷衍的年華 submitted on 2019-12-11 04:02:28
Question: I am confused about __local memory in OpenCL. I read in a spec that data has to flow from the host to __global memory, and then to __local memory. But I also see kernel functions like this: __kernel void foo(__local float * a) I was wondering how the data is transferred directly into __local memory in this way. Thanks.
Answer 1: It is not possible to fill a local buffer from the host side. Therefore you have to follow the flow host -> __global -> __local. Local buffer can be either
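The flow described in the answer (host -> __global -> __local) might look like the sketch below. It is an illustration under assumptions, not the original poster's code: the extra `in` parameter, the helper `local_arg_bytes`, and the argument index 1 are all hypothetical. The one API fact it relies on is that calling `clSetKernelArg` with a NULL value pointer and a non-zero size reserves that many bytes of local memory for a `__local` kernel argument; the host never writes into that buffer.

```c
#include <stddef.h>

/* OpenCL C kernel source, kept as a host-side string.
 * Hypothetical variant of foo(): the __global parameter `in` is added so
 * each work-item can stage one element into the __local buffer `a`. */
static const char *kernel_src =
    "__kernel void foo(__global const float *in, __local float *a)\n"
    "{\n"
    "    size_t lid = get_local_id(0);\n"
    "    a[lid] = in[get_global_id(0)];   /* __global -> __local */\n"
    "    barrier(CLK_LOCAL_MEM_FENCE);    /* wait until the group has filled a[] */\n"
    "    /* ... work on a[] here ... */\n"
    "}\n";

/* Bytes of local memory to request for one float per work-item.
 * Hypothetical helper, not part of the OpenCL API. */
static size_t local_arg_bytes(size_t work_group_size)
{
    return work_group_size * sizeof(float);
}

/* On the host, with a real cl_kernel named `kernel`, argument 1 would be set as:
 *     clSetKernelArg(kernel, 1, local_arg_bytes(64), NULL);
 */
```

The size passed for the `__local` argument has to cover the largest index the kernel touches, which in this sketch is one float per work-item in the work-group.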

OpenCL compiler preprocessing definitions?

旧街凉风 submitted on 2019-12-11 03:52:09
Question: I am developing OpenCL code on Snow Leopard and understand that OpenCL just-in-time compilation is done by Clang/LLVM. Is the C preprocessor used at all? Is there a way to set preprocessor definitions with the compiler? What definitions exist? I would like the code to know whether it is being compiled for the CPU or the GPU so that I can, for instance, use printf statements for debugging.
Answer 1: The clBuildProgram API takes compiler arguments (the const char * options parameter). -D MYMACRO is
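As the answer notes, the `options` string passed to `clBuildProgram` accepts `-D` definitions. Below is a minimal sketch of how host code might compose such a string so that the kernel can tell CPU builds from GPU builds. The macro name `TARGET_IS_GPU` and the helper `make_build_options` are hypothetical; in a real program the `is_gpu` flag would typically come from querying `CL_DEVICE_TYPE` with `clGetDeviceInfo`.

```c
#include <stdio.h>
#include <stddef.h>

/* Write a clBuildProgram options string into buf.
 * TARGET_IS_GPU is a made-up macro name used for illustration only.
 * Returns 0 on success, -1 if buf is too small or formatting fails. */
static int make_build_options(char *buf, size_t len, int is_gpu)
{
    int n = snprintf(buf, len, "-D TARGET_IS_GPU=%d", is_gpu ? 1 : 0);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Host-side usage, assuming an existing cl_program `program` and
 * cl_device_id `device`:
 *     char opts[64];
 *     make_build_options(opts, sizeof opts, 0);
 *     clBuildProgram(program, 1, &device, opts, NULL, NULL);
 *
 * Kernel-side usage:
 *     #if !TARGET_IS_GPU
 *         printf("debug: gid=%d\n", (int)get_global_id(0));
 *     #endif
 */
```

Building the program once per device type with a different options string keeps a single kernel source while letting the preprocessor strip debug output from the GPU build.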

OpenCL: Additional directories for header files

可紊 submitted on 2019-12-11 03:47:56
Question: The OpenCL specification says in 5.6.3 Build Options: 5.6.3.1 Preprocessor options ... -I dir Add the directory dir to the list of directories to be searched for header files. Can someone please explain what that means? As far as I know, you cannot include header files in your OpenCL kernels. So what could this option be for? EDIT: Link to the OpenCL spec: http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf EDIT2: I was under the wrong assumption that it is not allowed to include

failing to initialize opencl vector literal

ε祈祈猫儿з submitted on 2019-12-11 03:39:51
Question: So I'm trying to initialize a variable in my OpenCL host code like this:

    cl_float2 es = (cl_float2)(0.0f,0.0f);

Which, using Clang 2.9, fails with:

    source/solveEikonalEq.c:75:38: warning: expression result unused [-Wunused-value]
    cl_float2 es = (cl_float2)(0.0f,0.0f);
    ^~~~
    source/solveEikonalEq.c:75:26: error: cast to union type from type 'float' not present in union
    cl_float2 es = (cl_float2)(0.0f,0.0f); //ray's tangent vector
    ^ ~~~~~~~~~~~

And, when using GCC 4.6.1, fails with: source
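The Clang message hints at the cause: in host code `cl_float2` is a union type declared by the OpenCL headers, not a built-in vector, so a C compiler parses the OpenCL C literal syntax `(cl_float2)(0.0f, 0.0f)` as a cast applied to a comma expression. The sketch below shows one possible host-side alternative. To stay compilable without the OpenCL headers it uses a stand-in union, `host_float2`, which is hypothetical and modelled on the `s[]` array member of `cl_float2`; with the real headers one would include `<CL/cl.h>` and use `cl_float2` itself.

```c
/* Stand-in for the host-side cl_float2 union (hypothetical; the real type
 * comes from <CL/cl.h> and also exposes an s[] array member). */
typedef union {
    float s[2];
} host_float2;

/* Build a two-component value by filling the array member element by element. */
static host_float2 make_float2(float x, float y)
{
    host_float2 v;
    v.s[0] = x;
    v.s[1] = y;
    return v;
}

/* Brace initialization also works for a declaration with constant values:
 *     host_float2 es = {{0.0f, 0.0f}};
 */
```

The vector literal form remains valid inside kernel source, where `float2` is a genuine built-in type; only the host side needs the union-style initialization.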

ATI OpenCL SDK on OSX

拥有回忆 submitted on 2019-12-11 03:28:24
Question: I own a new MBP with an ATI graphics card. I am curious whether I can download the SDK, especially the example collection and profiler, for OS X, or whether I have to run Windows/Linux natively, because I have found only versions for Windows and Linux. Thanks in advance.
Answer 1: As long as you have Mac OS X 10.6 or above (which you do if you have a new MacBook Pro), you already have OpenCL installed, under something like /Developer/GPU Computing/OpenCL.
Source: https://stackoverflow.com/questions/5794627/ati-opencl-sdk-on

OpenCL multiple command queue for Concurrent NDKernal Launch

筅森魡賤 submitted on 2019-12-11 03:13:12
Question: I'm trying to run a vector-addition application in which I need to launch multiple kernels concurrently. In my last question someone advised me to use multiple command queues for concurrent kernel launches, which I am defining as an array:

    context = clCreateContext(NULL, 1, &device_id, NULL, NULL, &err);
    for(i=0;i<num_ker;++i)
    {
        queue[i] = clCreateCommandQueue(context, device_id, 0, &err);
    }

I'm getting the error "command terminated by signal 11" somewhere around the above code. I'm using

Limitations of work-item load in GPU? CUDA/OpenCL

只谈情不闲聊 submitted on 2019-12-11 02:57:08
Question: I have a compute-intensive image algorithm that, for each pixel, needs to read many distant pixels. The distance depends on a constant defined at compile time. My OpenCL algorithm performs well, but beyond a certain maximum distance, which results in heavier for loops, the driver seems to bail out. The screen goes black for a couple of seconds and then the command queue never finishes. A balloon message reveals that the driver is unhappy: "Display driver AMD driver stopped responding and

OpenCL producing incorrect calculations

て烟熏妆下的殇ゞ submitted on 2019-12-11 02:54:30
Question: I've been trying to use OpenCL to do some calculations, but the results are incorrect. I input three float3s that look like this: [300000,0,0] [300000,300000,0] [300000,300000,300000] into this kernel:

    __kernel void gravitate(__global const float3 *position,
                            __global const float3 *momentum,
                            __global const float3 *mass,
                            __global float3 *newPosition,
                            __global float3 *newMomentum,
                            unsigned int numBodies,
                            unsigned int seconds)
    {
        int gid=get_global_id(0);
        newPosition[gid]=position[gid]*2;
        newMomentum

OpenCL: Correct results on CPU not on GPU: how to manage memory correctly?

你。 submitted on 2019-12-11 02:54:17
Question:

    __kernel void CKmix(__global short* MCL, __global short* MPCL, __global short *C, int S, int B)
    {
        unsigned int i=get_global_id(0);
        unsigned int ii=get_global_id(1);
        MCL[i]+=MPCL[B*ii+i+C[ii]+S];
    }

The kernel seems OK: it compiles successfully, and I have obtained the correct results using the CPU as a device, but that was when I had the program release and recreate my memory objects each time the kernel is called, which for my testing purposes is about 16000 times. The code I am posting is