opencl | 易学教程

Strategy for doing final reduction

Question: I am trying to implement an OpenCL version for doing reduction of an array of floats. To achieve it, I took the following code snippet found on the web: __kernel void sumGPU ( __global const double *input, __global double *partialSums, __local double *localSums) { uint local_id = get_local_id(0); uint group_size = get_local_size(0); // Copy from global memory to local memory localSums[local_id] = input[get_global_id(0)]; // Loop for computing localSums for (uint stride = group_size/2; stride>0
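The kernel in the question follows a two-stage pattern: each work-group reduces its slice to one partial sum, and something still has to add up the partial sums. As a rough sketch only, the plain C code below simulates that pattern on the CPU so the arithmetic can be checked without a device. The names `GROUP_SIZE`, `reduce_group`, and `two_stage_sum` are hypothetical, the group size is assumed to be a power of two, and the input length is assumed to be a multiple of it. Doing the final pass on the host is just one possible strategy, not necessarily the one the original thread settled on.

```c
#include <stddef.h>

#define GROUP_SIZE 8  /* assumed work-group size; must be a power of two */

/* Mimics one work-group: copy its slice into "local" storage, then
   halve the stride until a single value remains in slot 0. */
static double reduce_group(const double *input, size_t group_id)
{
    double local_sums[GROUP_SIZE];
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid)
        local_sums[lid] = input[group_id * GROUP_SIZE + lid];

    for (size_t stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
        /* On a real device, barrier(CLK_LOCAL_MEM_FENCE) separates
           these passes; the sequential loop makes it unnecessary here. */
        for (size_t lid = 0; lid < stride; ++lid)
            local_sums[lid] += local_sums[lid + stride];
    }
    return local_sums[0];
}

/* Final reduction: add one partial sum per work-group on the host.
   n is assumed to be a multiple of GROUP_SIZE. */
double two_stage_sum(const double *input, size_t n)
{
    double total = 0.0;
    for (size_t g = 0; g < n / GROUP_SIZE; ++g)
        total += reduce_group(input, g);
    return total;
}
```

With many work-groups, the alternative is to launch the same kernel again on the partial sums until one value is left. For a small number of groups, reading the partial sums back and adding them on the host is usually the simpler choice.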

OpenCL reduction result wrong with large floats

Question: I used AMD's two-stage reduction example to compute the sum of all numbers from 0 to 65 536 using floating point precision. Unfortunately, the result is not correct. However, when I modify my code so that I compute the sum of 65 536 smaller numbers (for example 1), the result is correct. I couldn't find any error in the code. Is it possible that I am getting wrong results because of the float type? If this is the case, what is the best approach to solve the issue? Answer 1: There is probably no
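One possible explanation for this symptom, offered as an assumption and not as the conclusion of the truncated answer, is single-precision rounding. A `float` has a 24-bit significand, so once a running total exceeds 2^24 = 16 777 216, small addends lose their low-order bits. The exact sum of the integers 0 to 65 535 is 2 147 450 880, far above that limit. The sketch below compares three hypothetical helpers in plain C: naive `float` accumulation, Kahan (compensated) summation in `float`, and accumulation in `double`. It assumes the range 0 to 65 535 and a build without `-ffast-math`, which can optimize the compensation away.

```c
#include <stddef.h>

/* Naive single-precision accumulation: rounding error grows once the
   running total is large compared with the values being added. */
float sum_naive(size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i)
        total += (float)i;
    return total;
}

/* Kahan summation: comp carries the low-order bits that the previous
   addition rounded away and feeds them back into the next one. */
float sum_kahan(size_t n)
{
    float total = 0.0f;
    float comp = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float y = (float)i - comp;
        float t = total + y;
        comp = (t - total) - y;
        total = t;
    }
    return total;
}

/* Double-precision accumulation: exact for this range, because every
   intermediate total is an integer well below 2^53. */
double sum_double(size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += (double)i;
    return total;
}
```

Calling each helper with `n = 65536` and printing the results shows how far the naive version drifts on a given platform. Inside a kernel, the same two remedies apply: accumulate in `double` where the device supports it, or keep a compensation term next to each partial sum.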

OpenCL float sum reduction

Question: I would like to apply a reduce on this piece of my kernel code (1 dimensional data): __local float sum = 0; int i; for(i = 0; i < length; i++) sum += //some operation depending on i here; Instead of having just 1 thread that performs this operation, I would like to have n threads (with n = length) and at the end having 1 thread to make the total sum. In pseudo code, I would like to be able to write something like this: int i = get_global_id(0); __local float sum = 0; sum += //some operation

Passing struct with pointer members to OpenCL kernel using PyOpenCL

Question: Let's suppose I have a kernel to compute the element-wise sum of two arrays. Rather than passing a, b, and c as three parameters, I make them structure members as follows: typedef struct { __global uint *a; __global uint *b; __global uint *c; } SumParameters; __kernel void compute_sum(__global SumParameters *params) { uint id = get_global_id(0); params->c[id] = params->a[id] + params->b[id]; return; } There is information on structures if you RTFM of PyOpenCL [1], and others have addressed

Passing struct to GPU with OpenCL that contains an array of floats

Question: I currently have some data that I would like to pass to my GPU and then multiply by 2. I have created a struct which can be seen here: struct GPUPatternData { cl_int nInput,nOutput,patternCount, offest; cl_float* patterns; }; This struct should contain an array of floats. I will not know the array of floats until run time, as it is specified by the user. The host code: typedef struct GPUPatternDataContatiner { int nodeInput,nodeOutput,patternCount, offest; float* patterns; } GPUPatternData;
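A host pointer stored inside a struct is generally not meaningful on the device when ordinary OpenCL buffers are used, so a common workaround is to split the data. The fixed-size integer fields go into one small header, and the run-time-sized float array goes into a separate flat buffer. The plain C sketch below only illustrates that split; it makes no OpenCL calls. `PatternHeader`, `double_patterns`, and `n_floats` are hypothetical names, and how the float count relates to the header fields is left open because the question does not say.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size header: plain integers only, no pointers,
   so its bytes mean the same thing on the host and on the device. */
typedef struct {
    int32_t n_input;
    int32_t n_output;
    int32_t pattern_count;
    int32_t offset;
} PatternHeader;

/* Stand-in for a kernel body. The header and the float data arrive as
   two separate arguments, which would be two buffers on a real device. */
void double_patterns(const PatternHeader *header, float *patterns,
                     size_t n_floats)
{
    (void)header;  /* unused here; a real kernel would index with it */
    for (size_t i = 0; i < n_floats; ++i)
        patterns[i] *= 2.0f;
}
```

On the host side, this layout would correspond to one buffer for the header and one for the floats, each set as its own kernel argument.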

OpenCL: Store pointer to global memory in local memory?

Question: Any solutions? Is that even possible? __global *float abc; // pointer to global memory stored in private memory I want abc to be stored in local memory instead of private memory. Answer 1: I think this is clarified here, in List 5.2: __global int global_data[128]; // 128 integers allocated on global memory __local float *lf; // pointer placed on the private memory, which points to a single-precision float located on the local memory __global char * __local lgc[8]; // 8 pointers stored on the local

GCC: Compiling an OpenCL host on Windows

Question: I just wanted to try out using OpenCL under Windows. Abstract: I got an "undefined reference to" error when I tried to compile (using the command gcc my.o -o my.exe -L "C:\Program Files (x86)\AMD APP\lib\x86_64" -l OpenCL ). My Code #include <CL/cl.h> #include <stdio.h> int main(void) { cl_platform_id platform; int err; err = clGetPlatformIDs(1, &platform, NULL); if(err < 0) { perror("There's No Platform!"); exit(1); } /* Some more code... */ system("PAUSE"); } Makefile all: addition

C++ Template preprocessor tool

Question: Is there a compiler or standalone preprocessor which takes C++ files and runs a template expansion pass, generating new C++ code with expanded template instantiations? I remember such a tool in the mid-90s when templates were still new and experimental, and the preprocessor was a way to do template programming with compilers without native template support. This is a lot more complicated than a macro-processing step since it would likely require parsing and tokenizing the code to understand

OpenCL distribution

Question: I'm currently developing an OpenCL application for a very heterogeneous set of computers (using JavaCL to be specific). In order to maximize performance, I want to use a GPU if it's available; otherwise I want to fall back to the CPU and use SIMD instructions. My plan is to implement the OpenCL code using vector types, because my understanding is that this allows CPUs to vectorize the instructions and use SIMD instructions. My question, however, is regarding which OpenCL implementation to use. E.g