opencl

Optimal workgroup size for sum reduction in OpenCL

╄→гoц情女王★ submitted on 2019-12-24 00:38:25
Question: I am using the following kernel for sum reduction. __kernel void reduce(__global float* input, __global float* output, __local float* sdata) { // load shared mem unsigned int tid = get_local_id(0); unsigned int bid = get_group_id(0); unsigned int gid = get_global_id(0); unsigned int localSize = get_local_size(0); unsigned int stride = gid * 2; sdata[tid] = input[stride] + input[stride + 1]; barrier(CLK_LOCAL_MEM_FENCE); // do reduction in shared mem for(unsigned int s = localSize >> 2; s > 0;
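The preview above cuts off inside the reduction loop, so the rest of the kernel is unknown. As a rough illustration only, here is a host-side C emulation of a tree reduction over one work-group's local buffer. The function names `reduce_group` and `reduce_input` are hypothetical, and the loop here starts at `local_size >> 1` (half of the partial sums held in `sdata`), which is an assumption of this sketch and differs from the `localSize >> 2` visible in the snippet.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side emulation of one work-group's tree reduction (hypothetical helper).
 * sdata stands in for the __local buffer and holds local_size partial sums.
 * local_size must be a power of two. */
float reduce_group(float *sdata, size_t local_size)
{
    assert(local_size > 0 && (local_size & (local_size - 1)) == 0);

    /* Each pass of the outer loop is one barrier-separated step on a device. */
    for (size_t s = local_size >> 1; s > 0; s >>= 1) {
        /* On a device, every work-item with tid < s runs this body in
         * parallel, followed by barrier(CLK_LOCAL_MEM_FENCE). */
        for (size_t tid = 0; tid < s; ++tid) {
            sdata[tid] += sdata[tid + s];
        }
    }
    return sdata[0];
}

/* Emulates the load step from the question: work-item tid adds the two
 * adjacent inputs at 2*tid and 2*tid + 1, then the group is reduced. */
float reduce_input(const float *input, float *sdata, size_t local_size)
{
    for (size_t tid = 0; tid < local_size; ++tid) {
        sdata[tid] = input[2 * tid] + input[2 * tid + 1];
    }
    return reduce_group(sdata, local_size);
}
```

Calling `reduce_input` with 8 inputs and a `local_size` of 4 models a single work-group; on a device, each group would write its `sdata[0]` to `output[bid]`.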

Atomic max for floats in OpenCL

不打扰是莪最后的温柔 submitted on 2019-12-23 23:54:04
Question: I need an atomic max function for floats in OpenCL. This is my current naive code using atomic_xchg: float value = data[index]; if ( value > *max_value ) { atomic_xchg(max_value, value); } This code gives the correct result on an Intel CPU, but not on an Nvidia GPU. Is this code correct, or can anyone help me? Answer 1: You can do it like this: //Function to perform the atomic max inline void AtomicMax(volatile __global float *source, const float operand) { union { unsigned int intVal;
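The answer's code is cut off above, so its exact body is not known. In the naive version, the comparison and the exchange are two separate steps, so another thread can store a larger value in between and then be overwritten. A common way to close that gap is a compare-and-swap retry loop on the float's bit pattern. The sketch below shows that idea in host-side C11 with `<stdatomic.h>`, not in OpenCL C; the names `atomic_max_float`, `float_to_bits`, and `bits_to_float` are hypothetical. In OpenCL C the analogous built-in would be `atomic_cmpxchg` on a 32-bit integer view of the value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "this sketch assumes a 32-bit float");

/* Reinterpret a float as its 32-bit pattern and back (no value conversion). */
uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Raise *target to operand if operand is larger, atomically.
 * target stores the bit pattern of the current maximum. */
void atomic_max_float(_Atomic uint32_t *target, float operand)
{
    uint32_t observed = atomic_load(target);

    /* Retry until either the stored value is already >= operand,
     * or our compare-and-swap installs operand. */
    while (operand > bits_to_float(observed)) {
        if (atomic_compare_exchange_weak(target, &observed,
                                         float_to_bits(operand))) {
            break;
        }
        /* On failure, observed now holds the value another thread wrote,
         * and the loop condition re-checks it. */
    }
}
```

The comparison is done on float values, while only the exchange works on bits, so negative numbers are handled without any ordering tricks on the integer representation.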

OpenCL kernel arguments

半城伤御伤魂 submitted on 2019-12-23 22:18:54
Question: I've just started fiddling around with OpenCL and I've come across a problem: I do not know how to pass complex data structures as arguments. I'm using LWJGL's OpenCL binding, and the example provided in the wiki http://lwjgl.org/wiki/index.php?title=Sum_Example. In that example 2 float buffers are created and passed as arguments (LWJGL provides methods in a class named BufferUtils for creating these buffers). Now, how would I create a buffer of points, typedef struct {int x; int y;} tpoint ,
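The question is cut off above, and no answer is shown. One plausible approach, sketched here under assumptions, is to flatten the points into a single interleaved integer buffer (x0, y0, x1, y1, ...). A struct of two 32-bit ints has that same layout, and `int` in OpenCL C is always 32 bits, so a kernel could read such a buffer as an array of `tpoint`. The sketch is host-side C; `pack_points` is a hypothetical helper, and from Java the same ordering would have to be written into an int buffer by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side mirror of the kernel struct, using fixed-width fields. */
typedef struct {
    int32_t x;
    int32_t y;
} tpoint;

static_assert(sizeof(tpoint) == 2 * sizeof(int32_t),
              "tpoint is expected to have no padding");

/* Write count points into out as x0, y0, x1, y1, ...
 * out must have room for 2 * count elements. */
void pack_points(const tpoint *points, size_t count, int32_t *out)
{
    assert(points != NULL && out != NULL);
    for (size_t i = 0; i < count; ++i) {
        out[2 * i]     = points[i].x;
        out[2 * i + 1] = points[i].y;
    }
}
```

Structs that mix field sizes need more care, because host and device compilers may pad them differently.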

OpenCL clGetPlatformIDs exception

╄→гoц情女王★ submitted on 2019-12-23 22:00:06
Question: I use the HelloWorld example from the samples that came with the installation of this package: AMD PACKAGE. The problem is that I can't run any example because of an error. cl_uint numPlatforms; //the NO. of platforms cl_platform_id platform = NULL; //the chosen platform cl_int status = clGetPlatformIDs(0, NULL, &numPlatforms); This block of code produces an error. Status is set to -858993460 at the end of this statement. An exception is thrown saying "Unhandled exception at 0x7429C9F5
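One detail in the question can be checked with plain arithmetic: -858993460 is the signed 32-bit reading of 0xCCCCCCCC. That is the pattern MSVC debug builds use to fill uninitialized stack memory, so, assuming this is such a build, `status` was probably never written because the exception was raised before `clGetPlatformIDs` returned. The helper names in this small C sketch are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Compile-time check: the value from the question is exactly 0xCCCCCCCC. */
static_assert((uint32_t)-INT32_C(858993460) == UINT32_C(0xCCCCCCCC),
              "-858993460 should be 0xCCCCCCCC when viewed as unsigned");

/* Returns nonzero if status carries the 0xCCCCCCCC fill pattern,
 * which suggests the variable was never assigned. */
int looks_unwritten(int32_t status)
{
    return (uint32_t)status == UINT32_C(0xCCCCCCCC);
}

/* Print a status code in decimal and hex, flagging the fill pattern. */
void describe_status(int32_t status)
{
    printf("status = %" PRId32 " (0x%08" PRIX32 ")%s\n",
           status, (uint32_t)status,
           looks_unwritten(status) ? " <- looks uninitialized" : "");
}
```

A real OpenCL error code would be a small negative number, so a value like this points at the crash inside the call rather than at an error the call reported.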

OpenCL built-in function 'select'

雨燕双飞 submitted on 2019-12-23 15:05:59
Question: It's not clear to me what the purpose of the built-in OpenCL function select is. Can somebody please clarify? From the OpenCL specification: function select(gentype a, gentype b, igentype c) returns: for each component of a vector type, result[i] = if MSB of c[i] is set ? b[i] : a[i]. What is the MSB in this case? I know that MSB stands for most significant bit, but I have no idea how it's related to this case. Answer 1: OpenCL select is to select elements from a pair of vectors (a, b) , based on the
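To make the quoted rule concrete, here is a host-side C emulation of the vector form of `select` for four float components with a 32-bit integer mask. The function names are hypothetical, and this is a sketch of the per-component rule only. The most significant bit of a signed 32-bit integer is its sign bit, and vector comparisons in OpenCL C yield -1 (all bits set) for true, which is why comparison results work directly as the mask `c`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One component of select: pick b if the top bit of c is set, else a. */
float select_component(float a, float b, int32_t c)
{
    return (((uint32_t)c >> 31) != 0) ? b : a;
}

/* Emulates select(a, b, c) for float4 operands with an int4 mask. */
void select_float4(const float a[4], const float b[4],
                   const int32_t c[4], float out[4])
{
    assert(a != NULL && b != NULL && c != NULL && out != NULL);
    for (size_t i = 0; i < 4; ++i) {
        out[i] = select_component(a[i], b[i], c[i]);
    }
}
```

Note that a mask component of 1 selects `a`, not `b`, because its top bit is clear; only values with the sign bit set, such as -1, select `b` in the vector form.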

Threading opencl compiling

﹥>﹥吖頭↗ submitted on 2019-12-23 08:36:10
Question: [Update:] I'm spawning multiple processes now and it works fairly well, though the basic threading problem still exists. [/] I'm trying to thread a C++ (g++ 4.6.1) program that compiles a bunch of OpenCL kernels. Most of the time taken is spent inside clBuildProgram. (It's genetic programming, and actually running the code and evaluating fitness is much, much faster.) I'm trying to thread the compilation of these kernels and not having any luck so far. At this point, there's no shared data

OpenCL synchronization between work-groups

。_饼干妹妹 submitted on 2019-12-23 07:38:29
Question: Is it possible to synchronize OpenCL work-groups? For example, I have 100 work-groups, each with only one work-item (don't ask me why, this is an example), and I need a barrier that every work-item must reach before any of the 100 work-groups continues. Answer 1: No, you can't. You can synchronize threads inside a group, and you can synchronize kernel executions inside a command queue. You may be able to synchronize a

Does any OpenCL host have more than one platform?

流过昼夜 submitted on 2019-12-23 07:36:36
Question: The definition of a platform in Khronos' OpenCL 1.0 and 1.1 specifications: Platform: The host plus a collection of devices managed by the OpenCL framework that allow an application to share resources and execute kernels on devices in the platform. The OpenCL function clGetPlatformIDs creates an array of platforms, implying that multiple platforms are possible. Is it safe to assume that a given OpenCL host has only one platform? In other words, will I lose anything on any host by doing this:

How to read UMat from a file in opencv 3.0 Beta?

ぐ巨炮叔叔 submitted on 2019-12-23 07:08:20
Question: I want to use UMat so my code can run on both GPU and CPU using OpenCL (OpenCV 3.0.0 Beta), but I cannot find a way to read an image file into a UMat or convert a Mat to a UMat. How can I read an image into a UMat? Answer 1: A sample for Mat to UMat conversion is below. I couldn't find documentation for this, so the only option was to read the source. UMat img = imread( "lena.jpg", IMREAD_COLOR ).getUMat( ACCESS_READ ); The different access flags in the source are ACCESS_READ, ACCESS_WRITE, ACCESS_RW,