opencl | 易学教程

Atomic operations with double, OpenCL

Question: I would like to know if there is a way to implement atomic operations (particularly atomic_add) for the double type. For floats the following code works, but atomic_xchg doesn't support double: while ((value = atomic_xchg(addr, atomic_xchg(addr, 0.0f)+value))!=0.0f);

Answer 1: I was looking for the same thing in the past and found this: https://github.com/ddemidov/vexcl-experiments/blob/master/sort-by-key-atomic.cpp. In the end I worked out a different approach to my problem, so I did not use it. Here is the
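The excerpt stops before any code is shown. As a rough illustration only, a common way to get an atomic add on a double is a compare-and-swap loop over the value's 64-bit integer representation. The sketch below is a host-side C11 analogue, not the linked code and not a kernel; the names atomic_add_double, double_to_bits and bits_to_double are hypothetical. In an OpenCL kernel the same loop is usually built on atom_cmpxchg over a ulong view of the double, which assumes the device supports the cl_khr_int64_base_atomics and cl_khr_fp64 extensions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its raw 64-bit pattern (and back) without UB. */
static uint64_t double_to_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

static double bits_to_double(uint64_t u)
{
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* Hypothetical helper: atomically adds `value` to the double stored in
 * `*addr` (kept as raw bits) and returns the previous value.
 * The loop retries until no other thread changed the cell between
 * our read and our compare-and-swap. */
static double atomic_add_double(_Atomic uint64_t *addr, double value)
{
    assert(sizeof(double) == sizeof(uint64_t));

    uint64_t expected = atomic_load(addr);
    uint64_t desired;
    do {
        desired = double_to_bits(bits_to_double(expected) + value);
        /* On failure, `expected` is refreshed with the current contents. */
    } while (!atomic_compare_exchange_weak(addr, &expected, desired));

    return bits_to_double(expected);
}
```

Unlike the double-atomic_xchg trick quoted in the question, this variant never publishes a temporary 0.0 to other threads; the trade-off is that it may spin under heavy contention.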

Is there any way of making a particular thread to wait for other threads upon some condition in OpenCL kernel

Question:

    __kernel void example(__global int *a, __global int *dependency, uint cols)
    {
        int j = get_global_id(0);
        int i = get_global_id(1);
        if (i > 0 && j > 0) {
            while (1) { test = 1; }
            // Wait for the dependents ----------------------------- --------------------------
        }
    }

In the kernel above, why is the while loop simply skipped in all threads instead of looping forever? Any ideas? I'm working on an interesting problem which requires a thread to wait for some other threads

Struct Alignment with PyOpenCL

Question: Update: the int4 in my kernel was wrong. I am using PyOpenCL but am unable to get struct alignment to work correctly. In the code below, which calls the kernel twice, the b value is returned correctly (as 1), but the c value has some "random" value. In other words, I am trying to read two members of a struct: I can read the first but not the second. Why? The same issue occurs whether I use numpy structured arrays or pack with struct. And the __attribute__ settings in the comments don't help

Can I compile OpenCL code into ordinary, OpenCL-free binaries?

Question: I am evaluating OpenCL for my purposes. It occurred to me that you can't assume it works out of the box on either Windows or Mac, because:

Windows needs an OpenCL driver (which, of course, can be installed)
MacOS supports OpenCL only on MacOS >= 10.6

So I'd have to write FPU/SSE/AVX code and OpenCL code separately to produce two binaries: one without and one with OpenCL support. It would be much better if I could compile OpenCL at compile time into SSE/AVX and then ship a binary without OpenCL in

Local workgroup size = NULL OpenCL

Question: Is there a way to get the local workgroup size that was actually used when you pass NULL as the local workgroup size to the enqueueNDRangeKernel() function?

Answer 1: There is no standard runtime API for doing this in OpenCL. If you really need to know, you could have the kernel retrieve the work-group size with the get_local_size() function and store the value(s) to a buffer. The vendor profilers (AMD's CodeXL, Intel's VTune, NVIDIA's command-line profiler) should also tell you what they picked.

OpenCL command queue (CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) not working (MacOS)

Question: I am working through the examples and source code from Fixstars. Specifically, I'm trying the last bit of code in chapter 5 (two moving averages, a.k.a. Golden Cross): http://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/opencl-programming-practice/ The code is available here: http://www.fixstars.com/en/opencl/book/sample/ I'll post the specific example below. The short of it is that by setting up the command queue as follows: command_queue = clCreateCommandQueue(context, device_id, CL

Using cl_float3 in parallel reduction example opencl

Question: I adapted the OpenCL parallel reduction example to work on a bunch of floats. Now I want to extend the code to cl_float3, that is, to find the minimum among an array of cl_float3. I thought it would be a straightforward extension from float to float3 in the kernel, but I am receiving garbage values when I return from the kernel. Below is the kernel: __kernel void pmin3(__global float3 *src, __global float3 *gmin, __local float3 *lmin, __global float *dbg, uint nitems, uint dev) { uint count =
