OpenCL | 易学教程

OpenCL struct values correct on CPU but not on GPU

Question: I have a struct in a file which is included by both the host code and the kernel: typedef struct { float x, y, z, dir_x, dir_y, dir_z; int radius; } WorklistStruct; I build this struct in my C++ host code and pass it via a buffer to the OpenCL kernel. If I choose a CPU device for computation, I get the following result: printf ( "item:[%f,%f,%f][%f,%f,%f]%d,%d\n", item.x, item.y, item.z, item.dir_x, item.dir_y, item.dir_z , item.radius ,sizeof(float)); Host: item:[20.169043,7
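
The excerpt is cut off before any cause is identified, so the following is not a diagnosis. It is a minimal sketch of one check that is often worth making when a struct is shared between host and device: confirming at compile time that the host compiler lays the struct out as seven packed 4-byte members, which is what OpenCL C's 32-bit `float` and `int` imply. Using `std::int32_t` in place of `int` and the helper `worklist_bytes` are assumptions made for illustration; neither appears in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host-side mirror of the struct from the question. std::int32_t is used
// (an assumption) so the host integer width is pinned to 32 bits, matching
// the 32-bit int of OpenCL C.
struct WorklistStruct {
    float x, y, z;
    float dir_x, dir_y, dir_z;
    std::int32_t radius;
};

// Compile-time layout checks: seven 4-byte members and no padding.
static_assert(sizeof(float) == 4, "float must be 4 bytes to match OpenCL C");
static_assert(sizeof(WorklistStruct) == 28, "unexpected padding in WorklistStruct");
static_assert(offsetof(WorklistStruct, dir_x) == 12, "dir_x offset mismatch");
static_assert(offsetof(WorklistStruct, radius) == 24, "radius offset mismatch");

// Hypothetical helper: bytes needed for a buffer holding n items.
std::size_t worklist_bytes(std::size_t n) {
    return n * sizeof(WorklistStruct);
}
```

For example, `assert(worklist_bytes(10) == 280);` holds on any host where the static assertions compile. Printing `sizeof(WorklistStruct)` from inside the kernel on each device and comparing it with the host value is a natural complementary check.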

CMake cannot find the OpenCL SDK by NVIDIA

Question: I just installed the NVIDIA CUDA toolkit to develop OpenCL applications on Windows 8.1. I came across some problems: 1- FindOpenCL.cmake doesn't work since opencl_dir is not set by the NVIDIA toolkit. The CMake file is: FIND_PACKAGE(OpenCL REQUIRED) INCLUDE_DIRECTORIES(${OPENCL_INCLUDE_DIR}) and the CMake error is: CMake Error at C:/Program Files (x86)/CMake/share/cmake-3.1/Modules/FindPackageHandleStandardArgs.cmake:138 (message): Could NOT find OpenCL (missing: OPENCL_LIBRARY OPENCL_INCLUDE

OpenCL timeout on Beignet doesn't raise error?

Question: I run the following (simplified) code, which runs a simplified kernel for a few seconds and then checks the results. The first 400,000 or so results are correct, and the rest are all zero. The kernel should put the same value (4228) into each element of the output array of 4.5 million elements. It looks like something is timing out somewhere, or not being synchronized, but I'm a bit puzzled, since I: even called clFinish, just to make sure; am checking all errors, and no errors

VexCL vector of structs?

Question: I know that it is possible to use custom types with OpenCL, but I haven't been able to use them with VexCL. Creating a device vector of structs works fine, but I can't perform any operations on it. As I haven't found any examples using custom types with VexCL, my question is: is that even possible? Thanks in advance. Answer 1: VexCL does not support operations with vectors of structs out of the box. You will need to help it a bit. First, you need to tell VexCL how to spell the type name of the struct.

GPU Memory bandwidth theoretical vs practical

Question: While profiling an algorithm running on a GPU, I feel that I'm hitting the memory bandwidth limit. I have several complex kernels performing complicated operations (sparse matrix multiplications, reductions, etc.) and some very simple ones, and it seems that all the significant ones hit a ~79 GB/s bandwidth wall when I calculate the total data read/written for each of them, regardless of their complexity, while the theoretical GPU bandwidth is 112 GB/s (NVIDIA GTX 960). The data set is very
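
The arithmetic behind the "~79 GB/s against 112 GB/s" comparison can be sketched as follows. This is an illustration only: the function names are hypothetical, and 1 GB is taken as 10^9 bytes, which is an assumption about how the question's figures were computed.

```cpp
#include <cassert>
#include <cstdint>

// Effective bandwidth in GB/s (1 GB = 1e9 bytes, an assumption) from the
// total bytes a kernel reads and writes and its elapsed time in seconds.
double effective_bandwidth_gbps(std::uint64_t bytes_read,
                                std::uint64_t bytes_written,
                                double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes_read + bytes_written) / seconds / 1e9;
}

// Fraction of the theoretical peak that the measured bandwidth reaches.
double bandwidth_efficiency(double measured_gbps, double theoretical_gbps) {
    assert(theoretical_gbps > 0.0);
    return measured_gbps / theoretical_gbps;
}
```

With the figures from the question, `bandwidth_efficiency(79.0, 112.0)` comes to roughly 0.705, that is, about 70% of the quoted theoretical peak.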

Why is preferred work group size multiple part of Kernel properties?

Question: From what I understand, the preferred work group size is roughly dependent on the SIMD width of a compute device (on NVIDIA this is the warp size; on AMD the term is wavefront). Logically, that would lead one to assume that the preferred work group size is device dependent, not kernel dependent. However, this property must be queried relative to a particular kernel, using CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE. Choosing a value which isn't a multiple of the underlying hardware
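
A common way to use the value returned for CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE is to pad the global work size up to the next multiple of it. The sketch below shows only that rounding step and makes no OpenCL calls; the function name is hypothetical, and the multiple is assumed to have been queried elsewhere (for example with `clGetKernelWorkGroupInfo`).

```cpp
#include <cassert>
#include <cstddef>

// Rounds n up to the nearest multiple of `multiple` (which must be non-zero).
// Intended for padding a global work size; `multiple` is assumed to come from
// a CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE query made elsewhere.
std::size_t round_up_to_multiple(std::size_t n, std::size_t multiple) {
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}
```

For instance, `round_up_to_multiple(1000, 64)` yields 1024. When the global size is padded like this, the kernel needs a bounds check so that the extra work-items beyond the real element count do nothing.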

OpenCL kernel not vectorized

Question: I am trying to build a kernel to do parallel string search. To this end I intend to use a finite state machine. The transition table of the FSM is in the kernel argument states. The code: __kernel void Find ( __constant char *text, const int offset, const int tlenght, __constant char *characters, const int clength, const int maxlength, __constant int *states, const int statesdim){ private char c; private int state; private const int id = get_global_id(0); if (id<(tlenght-maxlength)) { private

Using __constant qualifier in OpenCL kernels

Question: I am having trouble using the __constant qualifier in my OpenCL kernels. My platform is Snow Leopard. I have tried initializing a CL read-only memory object on the GPU and copying my constant array from the host into it. Then I set the kernel argument just as with __global memory arguments, but this does not work as it should, and I see no errors or warnings. I have also tried passing the data directly to the clSetKernelArg function, as with float and int types, but that does not work either. Am I making any mistakes

OpenCL: basic questions about SIMT execution model

Question: Some of the concepts and designs of the "SIMT" architecture are still unclear to me. From what I've seen and read, diverging code paths, and if() altogether, are a rather bad idea because many threads might execute in lockstep. What exactly does that mean? What about something like: kernel void foo(..., int flag) { if (flag) DO_STUFF else DO_SOMETHING_ELSE } The parameter "flag" is the same for all work units, and the same branch is taken for all work units. Now, is a GPU going to execute

Timing execution of OpenCL kernels

Question: Is this a correct way of timing kernel execution for OpenCL? I am quite keen on using the C++ wrapper (which unfortunately does not have many examples of timings). cl::CommandQueue queue(context, device, CL_QUEUE_PROFILING_ENABLE, &err); checkErr(err, "Cannot create the command queue"); /* Warm-up */ for (unsigned i = 0; i < NUMBER_OF_ITERATIONS; ++i) { err = queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(512), cl::NullRange, NULL, NULL); checkErr(err, "Cannot enqueue the
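
The excerpt stops before the timing code itself, so the following covers only the final arithmetic step, as a hedged sketch. With a queue created using CL_QUEUE_PROFILING_ENABLE, the start and end timestamps of an event (CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END) are reported in nanoseconds; the helper name `elapsed_ms` is hypothetical and the two timestamps are assumed to have been read from the event elsewhere.

```cpp
#include <cassert>
#include <cstdint>

// Converts a pair of profiling timestamps in nanoseconds into elapsed
// milliseconds. The timestamps are assumed to come from the event's
// CL_PROFILING_COMMAND_START / CL_PROFILING_COMMAND_END queries.
double elapsed_ms(std::uint64_t start_ns, std::uint64_t end_ns) {
    assert(end_ns >= start_ns);
    return static_cast<double>(end_ns - start_ns) / 1.0e6;
}
```

For example, timestamps of 1,000,000 ns and 3,500,000 ns give 2.5 ms. The event has to have completed (for instance after waiting on it) before its profiling timestamps are read.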