opencl

OpenCL struct values correct on CPU but not on GPU

白昼怎懂夜的黑 submitted on 2019-12-11 02:47:30
Question: I have a struct in a file which is included by both the host code and the kernel: typedef struct { float x, y, z, dir_x, dir_y, dir_z; int radius; } WorklistStruct; I'm building this struct in my C++ host code and passing it via a buffer to the OpenCL kernel. If I choose a CPU device for computation, I get the following result: printf ( "item:[%f,%f,%f][%f,%f,%f]%d,%d\n", item.x, item.y, item.z, item.dir_x, item.dir_y, item.dir_z , item.radius ,sizeof(float)); Host: item:[20.169043,7
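One thing worth ruling out when a struct reads correctly on one device but not on another is a layout mismatch between host and kernel. The following is a minimal host-side sketch, not the answer to the question: it assumes the kernel's int is 32 bits (std::int32_t stands in for it here; cl_int from CL/cl.h is the usual choice) and the helper worklist_bytes is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host-side mirror of the struct from the question.
// Assumption: std::int32_t matches the kernel's 32-bit int.
struct WorklistStruct {
    float x, y, z, dir_x, dir_y, dir_z;
    std::int32_t radius;
};

// Compile-time checks that the host layout is the tightly packed one
// the kernel is expected to see: six 4-byte floats, then a 4-byte int.
static_assert(sizeof(float) == 4, "expected a 32-bit float on the host");
static_assert(offsetof(WorklistStruct, radius) == 24, "unexpected padding before radius");
static_assert(sizeof(WorklistStruct) == 28, "unexpected struct size");

// Hypothetical helper: byte size of a buffer holding `count` items.
std::size_t worklist_bytes(std::size_t count) {
    return count * sizeof(WorklistStruct);
}
```

If any static_assert fires, the host compiler lays the struct out differently from what the sketch assumes, and the buffer contents would need to be checked field by field.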

CMake cannot find the OpenCL SDK from NVIDIA

Deadly submitted on 2019-12-11 02:39:08
Question: I just installed the NVIDIA CUDA toolkit to use it for developing OpenCL applications on Windows 8.1. I came across some problems: 1- FindOpenCL.cmake doesn't work, since opencl_dir is not set by the NVIDIA toolkit. The cmake file is: FIND_PACKAGE(OpenCL REQUIRED) INCLUDE_DIRECTORIES(${OPENCL_INCLUDE_DIR}) and the cmake error is: CMake Error at C:/Program Files (x86)/CMake/share/cmake-3.1/Modules/FindPackageHandleStandardArgs.cmake:138 (message): Could NOT find OpenCL (missing: OPENCL_LIBRARY OPENCL_INCLUDE

OpenCL timeout on beignet doesn't raise error?

浪子不回头ぞ submitted on 2019-12-11 02:19:47
Question: I run the following (simplified) code, which runs a simplified kernel for a few seconds and then checks the results. The first 400,000 or so results are correct, and then the rest are all zero. The kernel should put the same value (4228) into each element of the output array of 4.5 million elements. It looks like somehow, somewhere, something is timing out or not being synchronized, but I'm a bit puzzled, since I: even called clFinish, just to make sure; am checking all errors, and no errors

VexCL vector of structs?

可紊 submitted on 2019-12-11 01:53:36
Question: So I know that it is possible to use custom types with OpenCL, but I haven't been able to use them with VexCL. Creating a device vector of structs works fine, but I can't perform any operations on it. As I haven't found any examples using custom types with VexCL, my question is: is that even possible? Thanks in advance. Answer 1: VexCL does not support operations with vectors of structs out of the box. You will need to help it a bit. First, you need to tell VexCL how to spell the type name of the struct.

GPU Memory bandwidth theoretical vs practical

£可爱£侵袭症+ submitted on 2019-12-11 00:58:35
Question: While profiling an algorithm running on a GPU, I feel that I'm hitting the memory bandwidth limit. I have several complex kernels performing complicated operations (sparse matrix multiplications, reductions, etc.) and some very simple ones, and it seems that all of the significant ones hit a ~79GB/s bandwidth wall when I calculate the total data read/written for each of them, regardless of their complexity, while the theoretical GPU bandwidth is 112GB/s (NVIDIA GTX 960). The data set is very
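The bandwidth figure in the question comes from dividing the total bytes read and written by the kernel's run time. A small sketch of that arithmetic follows; the function names are hypothetical and 1 GB is taken as 10^9 bytes, which is an assumption about how the 112GB/s figure is quoted.

```cpp
#include <cassert>

// Effective bandwidth in GB/s (1 GB = 1e9 bytes, an assumption)
// from total bytes read plus written and the kernel time in seconds.
double effective_bandwidth_gbs(double bytes_moved, double seconds) {
    assert(seconds > 0.0);
    return bytes_moved / seconds / 1e9;
}

// Fraction of the theoretical peak that a measurement reaches.
double fraction_of_peak(double measured_gbs, double peak_gbs) {
    assert(peak_gbs > 0.0);
    return measured_gbs / peak_gbs;
}
```

With the figures from the question, fraction_of_peak(79.0, 112.0) comes to roughly 0.71, so the kernels reach about 71% of the quoted peak.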

Why is preferred work group size multiple part of Kernel properties?

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-11 00:17:46
Question: From what I understand, the preferred work group size is roughly dependent on the SIMD width of a compute device (for NVIDIA this is the warp size; on AMD the term is wavefront). Logically, that would lead one to assume that the preferred work group size is device dependent, not kernel dependent. However, this property must be queried relative to a particular kernel, using CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE. Choosing a value which isn't a multiple of the underlying hardware
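Whatever the answer to the question, the queried value is typically used to pad the global work size. A minimal sketch of that rounding step is below; round_up_to_multiple is a hypothetical helper, and the multiple is assumed to have been obtained already through clGetKernelWorkGroupInfo with CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE.

```cpp
#include <cassert>
#include <cstddef>

// Round `n` up to the next multiple of `multiple`.
// `multiple` stands for the value queried from the kernel (assumed > 0).
std::size_t round_up_to_multiple(std::size_t n, std::size_t multiple) {
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}
```

For example, round_up_to_multiple(1000, 32) gives 1024. The kernel then needs a bounds check so that the 24 extra work items do nothing.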

OpenCL kernel not vectorized

a 夏天 submitted on 2019-12-10 23:35:29
Question: I am trying to build a kernel to do parallel string search. To this end I intend to use a finite state machine. The transition table of the FSM is in the kernel argument states. The code: __kernel void Find ( __constant char *text, const int offset, const int tlenght, __constant char *characters, const int clength, const int maxlength, __constant int *states, const int statesdim){ private char c; private int state; private const int id = get_global_id(0); if (id<(tlenght-maxlength)) { private

Using __constant qualifier in OpenCL kernels

此生再无相见时 submitted on 2019-12-10 22:58:27
Question: I am having trouble using the __constant qualifier in my OpenCL kernels. My platform is Snow Leopard. I have tried initializing a CL read-only memory object on the GPU and copying my constant array from the host into it. Then I set the kernel argument just as with __global memory arguments, but this does not work as it should, yet I see no errors or warnings. I have also tried passing the data directly to the clSetKernelArg function, as with float and int types, but that does not work either. Do I make any mistakes

OpenCL: basic questions about SIMT execution model

微笑、不失礼 submitted on 2019-12-10 20:25:17
Question: Some of the concepts and designs of the "SIMT" architecture are still unclear to me. From what I've seen and read, diverging code paths and if() altogether are a rather bad idea, because many threads might execute in lockstep. Now what does that exactly mean? What about something like: kernel void foo(..., int flag) { if (flag) DO_STUFF else DO_SOMETHING_ELSE } The parameter "flag" is the same for all work units and the same branch is taken for all work units. Now, is a GPU going to execute
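The lockstep idea can be pictured with a toy host-side model. This is only a sketch under a simplifying assumption, namely that a lockstep group has to run every branch path that at least one of its lanes takes; it is not a description of any particular GPU, and paths_executed is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Toy model of one lockstep group: each entry says whether that lane
// takes the `if` side. Returns how many branch paths the group must
// run under the assumption stated above (0 for an empty group).
int paths_executed(const std::vector<bool>& lane_takes_if) {
    bool any_if = false;
    bool any_else = false;
    for (bool takes_if : lane_takes_if) {
        if (takes_if) {
            any_if = true;
        } else {
            any_else = true;
        }
    }
    return (any_if ? 1 : 0) + (any_else ? 1 : 0);
}
```

In this model a flag that is the same for every lane, as in the question's kernel, yields one path, while a group with mixed lanes yields two.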

Timing execution of OpenCL kernels

人盡茶涼 submitted on 2019-12-10 19:50:56
Question: Is this a correct way of timing kernel execution time for OpenCL? I am quite keen on using the C++ wrapper (which unfortunately does not have many examples of timings). cl::CommandQueue queue(context, device, CL_QUEUE_PROFILING_ENABLE, &err); checkErr(err, "Cannot create the command queue"); /* Warm-up */ for (unsigned i = 0; i < NUMBER_OF_ITERATIONS; ++i) { err = queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(512), cl::NullRange, NULL, NULL); checkErr(err, "Cannot enqueue the
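With a queue created using CL_QUEUE_PROFILING_ENABLE, the event profiling counters CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END are reported in nanoseconds. The sketch below covers only the conversion step and assumes the two timestamps have already been read from the kernel's event; elapsed_ms is a hypothetical helper, not part of the C++ wrapper.

```cpp
#include <cassert>
#include <cstdint>

// Convert a pair of profiling timestamps (nanoseconds) into elapsed
// milliseconds. The values are assumed to come from the event's
// CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END counters.
double elapsed_ms(std::uint64_t start_ns, std::uint64_t end_ns) {
    assert(end_ns >= start_ns);
    return static_cast<double>(end_ns - start_ns) / 1.0e6;
}
```

Summing elapsed_ms over the timed iterations and dividing by their count gives a mean kernel time that excludes host-side enqueue overhead.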