opencl | 易学教程

Optimal workgroup size for sum reduction in OpenCL

Question: I am using the following kernel for sum reduction.

    __kernel void reduce(__global float* input, __global float* output, __local float* sdata)
    {
        // load shared mem
        unsigned int tid = get_local_id(0);
        unsigned int bid = get_group_id(0);
        unsigned int gid = get_global_id(0);
        unsigned int localSize = get_local_size(0);
        unsigned int stride = gid * 2;
        sdata[tid] = input[stride] + input[stride + 1];
        barrier(CLK_LOCAL_MEM_FENCE);
        // do reduction in shared mem
        for(unsigned int s = localSize >> 2; s > 0;
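The snippet is cut off, but the two-phase pattern it starts can be modelled on the CPU. The sketch below is a plain C simulation of one work-group (reduce_group is a hypothetical name, not from the question): each simulated work-item first adds its pair of inputs, then a halving tree combines the partial sums. Assuming the usual loop body `if (tid < s) sdata[tid] += sdata[tid + s];`, a tree over localSize partial sums has to start at s = localSize >> 1; starting at localSize >> 2 would leave the upper half of sdata out of the result.

```c
#include <assert.h>

/* Hypothetical CPU model of one work-group running the reduction.
 * input holds 2 * local_size floats; sdata plays the role of __local memory
 * and must have room for local_size floats. local_size must be a power of two. */
float reduce_group(const float *input, float *sdata, unsigned int local_size)
{
    assert(local_size > 0 && (local_size & (local_size - 1)) == 0);

    /* Phase 1: work-item tid loads and adds its pair, as in
     * sdata[tid] = input[stride] + input[stride + 1] with stride = 2 * tid
     * (gid == tid here because only one group is modelled). */
    for (unsigned int tid = 0; tid < local_size; ++tid)
        sdata[tid] = input[2 * tid] + input[2 * tid + 1];
    /* barrier(CLK_LOCAL_MEM_FENCE) would go here in the kernel. */

    /* Phase 2: halving tree over the local_size partial sums. */
    for (unsigned int s = local_size >> 1; s > 0; s >>= 1) {
        for (unsigned int tid = 0; tid < s; ++tid)
            sdata[tid] += sdata[tid + s];
        /* ...and another barrier after every step. */
    }

    /* Work-item 0 would write this value to output[bid]. */
    return sdata[0];
}
```

Each inner loop over tid stands in for one parallel step followed by a barrier. The work-group size decides how many such steps each group runs and how many per-group results (one per group, in output[bid]) remain to be combined afterwards.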

Atomic max for floats in OpenCL

Question: I need an atomic max function for floats in OpenCL. This is my current naive code using atomic_xchg:

    float value = data[index];
    if ( value > *max_value )
    {
        atomic_xchg(max_value, value);
    }

This code gives the correct result on an Intel CPU, but not on an Nvidia GPU. Is this code correct, or can anyone help me?

Answer 1: You can do it like this:

    //Function to perform the atomic max
    inline void AtomicMax(volatile __global float *source, const float operand) {
        union {
            unsigned int intVal;
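The naive version fails because the comparison and the exchange are two separate steps: between them, another work-item can store a larger value, which atomic_xchg then overwrites with a smaller one. The answer's truncated approach (a union over the float's bits) points to the usual fix, a compare-and-swap loop on the integer view of the float. The sketch below models that loop with C11 atomics on the host, not in OpenCL C; the helper names are hypothetical, and in a kernel the OpenCL atomic_cmpxchg on a __global unsigned int would take the place of atomic_compare_exchange_weak.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reinterpret a float's bits as uint32_t (and back) without aliasing
 * problems; this plays the role of the union in the answer. */
static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical host-side atomic max: retry until the stored value is no
 * longer less than operand, or until our compare-and-swap succeeds. */
void atomic_max_float(_Atomic uint32_t *target, float operand)
{
    uint32_t expected = atomic_load(target);
    while (bits_float(expected) < operand) {
        /* On failure, expected is refreshed with the current stored bits,
         * and the loop condition re-checks them against operand. */
        if (atomic_compare_exchange_weak(target, &expected, float_bits(operand)))
            break;
    }
}
```

Because the swap only succeeds if the stored bits are still the ones that were compared, a larger value written by another thread in the meantime can never be overwritten. A NaN operand is never stored, since every comparison with NaN is false.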

OpenCL kernel arguments

Question: I've just started fiddling around with OpenCL and I've come across a problem: I don't know how to pass complex data structures as arguments. I'm using LWJGL's OpenCL binding and the example provided in the wiki, http://lwjgl.org/wiki/index.php?title=Sum_Example. In that example, two float buffers are created and passed as arguments (LWJGL provides methods in a class named BufferUtils for creating these buffers). Now, how would I create a buffer of points, typedef struct {int x; int y;} tpoint,
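One common way to pass such structs, sketched here as an assumption rather than taken from an answer, is to declare the kernel parameter as __global tpoint* with the same struct definition in the kernel source, and to fill a flat buffer of ints on the host in x0, y0, x1, y1, ... order (with LWJGL that would be an IntBuffer, for example from BufferUtils.createIntBuffer). The C sketch below shows that layout; pack_points is a hypothetical helper, and int32_t stands in for OpenCL's 32-bit int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side mirror of the kernel's struct; OpenCL's int is 32 bits wide. */
typedef struct { int32_t x; int32_t y; } tpoint;

/* Two 32-bit members: no padding is expected, so an array of tpoint in
 * device memory is simply x0, y0, x1, y1, ... */
static_assert(sizeof(tpoint) == 2 * sizeof(int32_t), "unexpected padding");

/* Hypothetical helper: flatten points into the interleaved int layout
 * that a plain int buffer (e.g. a Java IntBuffer) would have to hold. */
void pack_points(const tpoint *pts, size_t n, int32_t *out)
{
    for (size_t i = 0; i < n; ++i) {
        out[2 * i] = pts[i].x;
        out[2 * i + 1] = pts[i].y;
    }
}
```

Keeping the struct to same-sized 32-bit members avoids having to reason about padding differences between the host language and the OpenCL compiler.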

OpenCL clGetPlatformIDs exception

Question: I use the HelloWorld example from the samples that came with the installation of this package: AMD PACKAGE. The problem is that I can't run any example because of an error.

    cl_uint numPlatforms;           //the number of platforms
    cl_platform_id platform = NULL; //the chosen platform
    cl_int status = clGetPlatformIDs(0, NULL, &numPlatforms);

This block of code produces an error: status is set to -858993460 at the end of the statement. An exception is thrown saying "Unhandled exception at 0x7429C9F5
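The status value itself is a useful clue. Read as an unsigned 32-bit integer, -858993460 is 0xCCCCCCCC, the byte pattern that MSVC debug builds use to fill uninitialized stack memory. That suggests status was never assigned, i.e. the exception was raised inside clGetPlatformIDs before it returned. The small C check below (looks_uninitialized is a hypothetical helper) makes the conversion explicit.

```c
#include <assert.h>
#include <stdint.h>

/* -858993460 + 2^32 == 3435973836 == 0xCCCCCCCC */
static_assert((uint32_t)-858993460 == 0xCCCCCCCCu, "0xCC fill pattern");

/* Hypothetical helper: MSVC debug builds fill uninitialized stack variables
 * with 0xCC bytes, so an int that was never written reads back as 0xCCCCCCCC. */
int looks_uninitialized(int32_t status)
{
    /* Converting a negative int32_t to uint32_t is well defined (modulo 2^32). */
    return (uint32_t)status == 0xCCCCCCCCu;
}
```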

OpenCL built-in function 'select'

Question: It's not clear to me what the purpose of the built-in OpenCL function select is. Can somebody please clarify? From the OpenCL specification: function select(gentype a, gentype b, igentype c) returns: for each component of a vector type, result[i] = if MSB of c[i] is set ? b[i] : a[i]. What is the MSB in this case? I know that MSB stands for most significant bit, but I have no idea how it relates to this case.

Answer 1: OpenCL select selects elements from a pair of vectors (a, b) based on the
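The MSB matters because OpenCL relational operators on vector types return -1 (all bits set) for true and 0 for false in each component, so a comparison result such as a > b can be passed directly as c. The scalar C model below (select_model is a hypothetical name, not the OpenCL built-in) applies the rule from the specification to each component: take b[i] when the top bit of c[i] is set, otherwise a[i].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the vector rule:
 * result[i] = MSB of c[i] set ? b[i] : a[i]. */
void select_model(const float *a, const float *b, const int32_t *c,
                  float *result, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* Bit 31 is the most significant bit of a 32-bit int;
         * it is set for every negative value, including -1 ("true"). */
        int msb_set = ((uint32_t)c[i] >> 31) != 0;
        result[i] = msb_set ? b[i] : a[i];
    }
}
```

For scalar arguments the specification uses c ? b : a instead, so there any nonzero c picks b.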

Threading opencl compiling

Question: [Update: I'm now spawning multiple processes and it works fairly well, though the basic threading problem still exists.] I'm trying to thread a C++ (g++ 4.6.1) program that compiles a bunch of OpenCL kernels. Most of the time is spent inside clBuildProgram. (It's genetic programming, and actually running the code and evaluating fitness is much faster.) I'm trying to thread the compilation of these kernels and haven't had any luck so far. At this point, there's no shared data
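For reference, the shape of a threaded build is sketched below with POSIX threads: one thread per job, each writing only to its own job record, followed by a join. The build_job type and the placeholder work inside build_worker are hypothetical; in the real program each worker would call clBuildProgram on its own cl_program. Whether the builds then actually run in parallel depends on the OpenCL implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical job record; nothing in it is shared between threads. */
typedef struct { int id; int result; } build_job;

static void *build_worker(void *arg)
{
    build_job *job = arg;
    /* Placeholder for the expensive step (clBuildProgram in the real code). */
    job->result = job->id * job->id;
    return NULL;
}

/* Start one thread per job, then join every thread that was started.
 * Returns 0 on success, -1 if a thread could not be created. */
int build_all(build_job *jobs, size_t n, pthread_t *threads)
{
    assert(jobs != NULL && threads != NULL);
    size_t started = 0;
    int rc = 0;
    for (; started < n; ++started) {
        if (pthread_create(&threads[started], NULL, build_worker, &jobs[started]) != 0) {
            rc = -1;
            break;
        }
    }
    for (size_t i = 0; i < started; ++i)
        pthread_join(threads[i], NULL);
    return rc;
}
```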

OpenCL synchronization between work-groups

Question: Is it possible to synchronize OpenCL work-groups? For example, I have 100 work-groups, each with only one work-item (don't ask me why, this is just an example), and I need a barrier in every work-item that ensures no work-group continues until every work-item in these 100 work-groups has reached the barrier point.

Answer 1: No, you can't. You can synchronize threads inside a group, and you can synchronize kernel executions inside a command queue. You may be able to synchronize a
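The usual workaround implied by the answer is to split the kernel at the point where the global barrier would go and enqueue the two halves as separate kernels on the same in-order command queue, so the kernel boundary acts as the barrier. The C sketch below only models that ordering on the CPU; phase_one, phase_two and the placeholder per-group work are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* "Kernel A": work-group g (one work-item each, as in the question)
 * writes only its own slot; the doubling is placeholder work. */
void phase_one(const float *input, float *partial, size_t num_groups)
{
    for (size_t g = 0; g < num_groups; ++g)
        partial[g] = 2.0f * input[g];
}

/* "Kernel B": enqueued after kernel A on the same in-order queue, so every
 * slot written in phase one is visible here; this is the global barrier. */
float phase_two(const float *partial, size_t num_groups)
{
    assert(partial != NULL || num_groups == 0);
    float total = 0.0f;
    for (size_t g = 0; g < num_groups; ++g)
        total += partial[g];
    return total;
}
```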

Does any OpenCL host have more than one platform?

Question: The definition of a platform in Khronos' OpenCL 1.0 and 1.1 specifications: Platform: The host plus a collection of devices managed by the OpenCL framework that allow an application to share resources and execute kernels on devices in the platform. The OpenCL function clGetPlatformIDs returns an array of platforms, implying that multiple platforms are possible. Is it safe to assume that a given OpenCL host has only one platform? In other words, will I lose anything on any host by doing this:

How to read a UMat from a file in OpenCV 3.0 Beta?

Question: I want to use UMat so my code can run on both GPU and CPU using OpenCL (OpenCV 3.0.0 Beta), but I cannot find a way to read an image file into a UMat or convert a Mat to a UMat. How can I read an image into a UMat?

Answer 1: A sample of Mat to UMat conversion is below. I couldn't find documentation for this, so the only option was to read the source.

    UMat img = imread( "lena.jpg", IMREAD_COLOR ).getUMat( ACCESS_READ );

The different access flags, as defined in the source, are ACCESS_READ, ACCESS_WRITE, ACCESS_RW,