opencl

Atomic operations with double, OpenCL

拈花ヽ惹草 submitted on 2020-01-06 16:22:12
Question: I would like to know if there's a way to implement atomic operations (particularly atomic_add) with the double type. For floats this code works, but atomic_xchg doesn't support double: while ((value = atomic_xchg(addr, atomic_xchg(addr, 0.0f)+value))!=0.0f); Answer 1: I was looking for the same in the past and I found this: https://github.com/ddemidov/vexcl-experiments/blob/master/sort-by-key-atomic.cpp. In the end I figured out a different approach to my problem, so I did not use it. Here is the
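
A minimal sketch of the usual compare-and-swap (CAS) approach, written as host-side C11 so it can be compiled and checked without a device. The double is kept as its raw 64-bit pattern, the new sum is computed on a copy, and an integer CAS publishes it only if no other thread changed the cell in between. In a kernel the same loop would typically use atom_cmpxchg on a __global ulong * together with as_double()/as_ulong(), which assumes the device supports cl_khr_int64_base_atomics (and cl_khr_fp64 for double). The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* The CAS trick relies on a double and a 64-bit integer having the same size. */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Bit-cast helpers: memcpy avoids strict-aliasing problems. */
static uint64_t double_to_bits(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

static double bits_to_double(uint64_t u) {
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* Atomically performs *cell += value and returns the previous value.
 * The cell stores the bit pattern of a double. */
static double atomic_add_double(_Atomic uint64_t *cell, double value) {
    uint64_t expected = atomic_load(cell);
    uint64_t desired;
    do {
        desired = double_to_bits(bits_to_double(expected) + value);
        /* On failure, 'expected' is refreshed with the current contents,
         * so the sum is recomputed from the latest value before retrying. */
    } while (!atomic_compare_exchange_weak(cell, &expected, desired));
    return bits_to_double(expected);
}
```

The loop retries until no other thread has modified the cell between the load and the CAS, so under heavy contention it can spin for a while; that is the usual cost of emulating floating-point atomics with integer CAS.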

Is there any way of making a particular thread to wait for other threads upon some condition in OpenCL kernel

断了今生、忘了曾经 submitted on 2020-01-06 14:33:49
Question: __kernel void example(__global int *a, __global int *dependency, uint cols) { int j = get_global_id(0); int i = get_global_id(1); if(i > 0 && j > 0) { while(1) { test = 1; } //Wait for the dependents ----------------------------- -------------------------- } } In the kernel code above, why is the while loop simply skipped in all the threads instead of looping infinitely? Any ideas on this? I'm working on an interesting problem that requires a thread to wait for some other threads
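
For context: OpenCL only synchronizes work-items inside one work-group (via barrier()), and it does not guarantee that work-items from different work-groups run concurrently, so a work-item that spins while waiting for another one's result can hang. A common alternative is a wavefront schedule, sketched here under the assumption that cell (i, j) depends only on its upper and left neighbours (the real dependency pattern in the question is not shown). All cells on one anti-diagonal are independent, so each diagonal can be one kernel launch, and the launch boundary acts as the global barrier. The sketch simulates those launches sequentially in host C; cell_update and its update rule are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-work-item body: in a kernel this would be the code for
 * one (i, j). Assumes cell (i, j) needs only (i-1, j) and (i, j-1). */
static void cell_update(int *a, size_t cols, size_t i, size_t j) {
    a[i * cols + j] = a[(i - 1) * cols + j] + a[i * cols + (j - 1)];
}

/* Wavefront schedule over a rows x cols grid whose row 0 and column 0 are
 * already initialized. Cells on anti-diagonal d = i + j read only cells
 * from diagonal d - 1, so each outer iteration could be one launch. */
static void wavefront(int *a, size_t rows, size_t cols) {
    assert(rows >= 1 && cols >= 1);
    for (size_t d = 2; d <= (rows - 1) + (cols - 1); ++d) {
        /* Inner loop = the work-items of one hypothetical launch. */
        for (size_t i = 1; i < rows; ++i) {
            if (i >= d) break;          /* j = d - i would be < 1 */
            size_t j = d - i;
            if (j >= cols) continue;    /* outside the grid */
            cell_update(a, cols, i, j);
        }
    }
}
```

On a device, the host would enqueue one NDRange per diagonal (with the global size equal to that diagonal's length) instead of running the inner loop, so no work-item ever has to wait on another one.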

Struct Alignment with PyOpenCL

狂风中的少年 submitted on 2020-01-06 08:13:24
Question: Update: the int4 in my kernel was wrong. I am using PyOpenCL but am unable to get struct alignment to work correctly. In the code below, which calls the kernel twice, the b value is returned correctly (as 1), but the c value has some "random" value. In other words, I am trying to read two members of a struct: I can read the first but not the second. Why? The same issue occurs whether I use NumPy structured arrays or pack with struct. And the __attribute__ settings in the comments don't help
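
A frequent cause of this symptom is that the host and the kernel disagree on member offsets: in OpenCL C a vector member such as int4 is aligned to 16 bytes, so padding appears before it, while a tightly packed NumPy record or struct.pack layout has none. The hypothetical struct below (not the one from the question, which is not shown in full) uses C11 _Alignas to reproduce that device-side layout and checks the resulting offsets at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct mirroring an OpenCL C declaration such as
 * struct { int b; int4 v; int c; }. The int4 member is 16-byte aligned,
 * so 12 bytes of padding follow 'b'. */
typedef struct {
    int32_t b;                 /* offset 0 */
    _Alignas(16) int32_t v[4]; /* offset 16, after 12 bytes of padding */
    int32_t c;                 /* offset 32 */
} example_t;                   /* size rounds up to a multiple of 16: 48 */

static_assert(offsetof(example_t, v) == 16, "int4 member is 16-byte aligned");
static_assert(offsetof(example_t, c) == 32, "c follows the 16-byte vector");
static_assert(sizeof(example_t) == 48, "struct size is a multiple of 16");
```

A host buffer that places c at offset 20 (packed) would make the kernel read c from padding, which matches the "random" value described. On the Python side, pyopencl.tools.match_dtype_to_c_struct can build a NumPy dtype whose offsets match the compiled struct, which may be simpler than padding the record by hand.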

Can I compile OpenCL code into ordinary, OpenCL-free binaries?

痴心易碎 submitted on 2020-01-05 15:19:14
Question: I am evaluating OpenCL for my purposes. It occurred to me that you can't assume it works out of the box on either Windows or Mac because: Windows needs an OpenCL driver (which, of course, can be installed), and MacOS supports OpenCL only on MacOS >= 10.6. So I'd have to write FPU/SSE/AVX code and OpenCL code separately to produce two binaries: one without and one with OpenCL support. It would be much better if I could compile OpenCL at compile time into SSE/AVX and then ship a binary without OpenCL in
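
One way to get part of the way there, sketched here rather than taken from an answer, is to write each kernel body as an ordinary C function of the global id and drive it from a plain loop in the OpenCL-free build. The OpenCL build would wrap the same function in a __kernel that passes get_global_id(0); in OpenCL 1.x its pointer parameters would also need __global qualifiers, typically hidden behind a macro that expands to nothing in the C build. Whether the CPU loop becomes SSE/AVX code is up to the compiler's auto-vectorizer, not guaranteed. The saxpy example and its names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel body written once as plain C. In the OpenCL build a
 * __kernel wrapper would call it with gid = get_global_id(0). */
static void saxpy_item(size_t gid, float alpha, const float *x, float *y) {
    y[gid] = alpha * x[gid] + y[gid];
}

/* CPU fallback: the loop plays the role of the NDRange, so this path has
 * no dependency on an OpenCL runtime or driver. */
static void saxpy_cpu(size_t n, float alpha, const float *x, float *y) {
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t gid = 0; gid < n; ++gid)
        saxpy_item(gid, alpha, x, y);
}
```

The trade-off is that anything relying on work-group features (local memory, barrier()) has no direct equivalent in this loop and would need a separate CPU formulation.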

Local workgroup size = NULL OpenCL

▼魔方 西西 submitted on 2020-01-05 10:33:34
Question: Is there a way to get the local work-group size that was actually used when you pass NULL as the local work-group size to enqueueNDRangeKernel()? Answer 1: There is no standard runtime API for doing this in OpenCL. If you really need to know, you could have the kernel retrieve the work-group size with the get_local_size() function and store the value(s) to a buffer. The vendor profilers (AMD's CodeXL, Intel's VTune, NVIDIA's command-line profiler) should also tell you what they picked.

OpenCL command queue (CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) not working (MacOS)

三世轮回 submitted on 2020-01-05 08:13:31
Question: I'm working through the examples and source code from Fixstars. Specifically, I'm trying the last bit of code in chapter 5 (two moving averages, aka Golden Cross): http://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/opencl-programming-practice/ The code is available here: http://www.fixstars.com/en/opencl/book/sample/ I'll post the specific example below, but the short of it is that by setting up the command queue as follows: command_queue = clCreateCommandQueue(context, device_id, CL

Using cl_float3 in parallel reduction example opencl

对着背影说爱祢 submitted on 2020-01-05 07:57:31
Question: I adapted the parallel reduction example for OpenCL to a bunch of floats. Now I want to extend the code to cl_float3, i.e. find the minimum among an array of cl_float3. I thought it was a straightforward extension from float to float3 in the kernel, but I am receiving garbage values when I return from the kernel. Below is the kernel: __kernel void pmin3(__global float3 *src, __global float3 *gmin, __local float3 *lmin, __global float *dbg, uint nitems, uint dev) { uint count =
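
One likely source of the garbage values: in OpenCL a float3 has the same size and alignment as a float4 (16 bytes), and the host type cl_float3 is likewise 16 bytes, so a host array of tightly packed {x, y, z} triples (12 bytes each) is misread by a kernel that takes __global float3 *. Either pad the host data to four floats or read packed data in the kernel with vload3(). The sketch below is a host-side C simulation, not the question's kernel: it shows the padded layout and the local-memory tree reduction applied component-wise. Treating "minimum" as component-wise is an assumption, and float3_padded is a hypothetical stand-in for cl_float3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for cl_float3: 16 bytes, fourth lane is padding. */
typedef struct { _Alignas(16) float s[4]; } float3_padded;

static_assert(sizeof(float3_padded) == 16, "float3 occupies 16 bytes");

static float min_f(float a, float b) { return a < b ? a : b; }

/* Component-wise minimum of two float3 values (padding lane zeroed). */
static float3_padded min3(float3_padded a, float3_padded b) {
    float3_padded r = {{min_f(a.s[0], b.s[0]), min_f(a.s[1], b.s[1]),
                        min_f(a.s[2], b.s[2]), 0.0f}};
    return r;
}

/* Sequential simulation of the local-memory tree reduction: at each step
 * element i combines with element i + stride and the active range halves.
 * In the kernel, each i is one work-item and a barrier separates steps.
 * Assumes n is a power of two, as the work-group size usually is. */
static float3_padded reduce_min3(float3_padded *lmin, size_t n) {
    assert(n > 0 && (n & (n - 1)) == 0);
    for (size_t stride = n / 2; stride > 0; stride /= 2)
        for (size_t i = 0; i < stride; ++i)
            lmin[i] = min3(lmin[i], lmin[i + stride]);
    return lmin[0];
}
```

The reduction logic carries over unchanged from the scalar version; what changes with float3 is mainly the 16-byte element stride, which the host-side buffer size and layout have to respect.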
