opencl

SIMD-8, SIMD-16 or SIMD-32 in OpenCL on GPGPU

笑着哭i posted on 2019-12-06 11:53:35
I read a couple of questions on SO on this topic (SIMD mode), but I still need some clarification/confirmation of how things work: "Why use SIMD if we have GPGPU?", "SIMD intrinsics - are they usable on gpus?", "CPU SIMD vs GPU SIMD?" Are the following points correct if I compile the code in SIMD-8 mode? 1) It means 8 instructions of different work items are executed in parallel. 2) Does it mean all work items are executing the same instruction only? 3) If each work item's code contains only a vload16 load, then float16 operations, and then vstore16 operations, SIMD-8 mode will still work. I mean

PyOpenCL Matrix multiplication

∥☆過路亽.° posted on 2019-12-06 11:37:45
Question: I have this code for matrix multiplication using PyOpenCL. My problem is that the result is wrong for some matrices, and I don't understand why. After some research I think it's related to the global size or something like that, but I don't understand how to set those values. For example: matrices using numpy dtype = float32 matrix 1: [[ 0.99114645 0.09327769 0.90075564 0.8913309 ] [ 0.59739089 0.13906649 0.94246316 0.65673178] [ 0.24535166 0.68942326 0.41361505 0.5789603 ] [ 0.31962237 0.17714553 0

How to add header file path in CMake file

限于喜欢 posted on 2019-12-06 11:28:36
I am new to OpenCL. I have written vector addition code in OpenCL with help from the Internet. I have included one header file, CL/cl.h, using #include <CL/cl.h>. I am using an NVIDIA graphics card and the OpenCL implementation is NVIDIA_GPU_Computing_SDK. My OpenCL header files reside at this path: /opt/NVIDIA_GPU_Computing_SDK/OpenCL/common/inc. I can run OpenCL programs from the Linux terminal by adding this path when compiling my code. But now I want to write a CMake file for this code. CMake files work fine for C programs, but not for OpenCL programs because of this path problem. In terminal, I

Is there a maximum limit to private memory in OpenCL?

烂漫一生 posted on 2019-12-06 10:48:45
Does the OpenCL specification set any maximum limit on the amount of private memory that can be used? If so, how do I get this number? I have a function which gives the correct result when run outside OpenCL, but when converted to a kernel, it spews out garbage. I checked the amount of private memory being used per work item using the CL_KERNEL_PRIVATE_MEM_SIZE flag and it is ~4000 bytes. I suspect that I am using too much private memory and this is somehow leading to junk computation. It's different for different architectures. For example, an HD 7870's private memory per compute unit is 256 kB and if

OpenCL to OpenGL texture problems

匆匆过客 posted on 2019-12-06 09:57:03
I'm trying to use OpenCL to draw to a cl_image that I got from an OpenGL texture and then render that texture. The problem is that when I run my code on CL_DEVICE_TYPE_CPU it works fine, but when I run it on CL_DEVICE_TYPE_GPU the texture appears to be random pixels. I'm new to OpenCL and not sure what's going on, so I'll post the code below. I'm also using OpenCL on OS X. Host Code: #import "GLView.h" #import <GLKit/GLKit.h> #import <OpenCL/OpenCL.h> #import "kernel.cl.h" #define WIDTH 500 #define HEIGHT 500 static GLfloat squareVertexData[] = { -0.5f, -0.5f, 0.0f, 0.0f, 0.0f, 0.5f, -0.5f, 0.0f, 1.0f, 0

Proper way of compiling OpenCL applications and using available compiler options

心已入冬 posted on 2019-12-06 09:35:06
I am a newbie to OpenCL. What is the best way to compile an OpenCL project? Using a supported compiler (GCC or Clang): When we use a compiler like gcc or clang, how do we control these options? Do they have to be set inside the source code, or can we pass them on the command line as in the normal compilation flow? Looking at the Khronos-Manual-1.2, there are a few options provided for cl_int clBuildProgram for optimizations: gcc|clang -O3 -I<INCLUDES> OpenCL_app.c -framework OpenCL OPTION -lm Actually, I tried this and received an error: gcc: error: unrecognized command

Passing an array of structs to the kernel in OpenCL

跟風遠走 posted on 2019-12-06 07:46:53
Question: Hi, I'm trying to implement the distance vector program in OpenCL. Basically, I'm having problems passing an array of structures into the kernel as an argument. My structure definition is this: typedef struct { int a[nodes][4]; }node; node * srcA; After allocating the memory for this, I have bundled it into a buffer object using this code: // allocate the buffer memory objects memobjs1 = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(node) * n, srcA, NULL); if

Is global synchronization in OpenCL possible?

喜你入骨 posted on 2019-12-06 07:17:59
Question: As is well known, the OpenCL barrier() function works only within a single work-group, and there is no direct way to synchronize across work-groups. If it is possible, what is the best approach for global synchronization today? Using atomics, OpenCL 2.0 features, etc.? GitHub links and examples are welcome! Thanks! Answer 1: Global synchronization within a kernel is not possible. This is because work-groups are not guaranteed to run at the same time. You can achieve a sort of global sync in the host application if you

I need help understanding data alignment in OpenCL's buffers

风流意气都作罢 posted on 2019-12-06 07:10:49
Question: Given the following structure: typedef struct { float3 position; float8 position1; } MyStruct; I'm creating a buffer to pass to the kernel as a pointer; the buffer will have the layout of the struct above. I understand that I have to add 4 bytes to the buffer after writing the three floats to reach the next power of two (16 bytes), but I don't understand why I have to add another 16 bytes before writing the bytes of position1. Otherwise I get wrong values in position1. Can someone explain why?
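A likely explanation, sketched here under stated assumptions: in OpenCL C every type is aligned to its own size, a float3 occupies the space of a float4 (16 bytes), and a float8 is 32 bytes, so position1 has to start at a multiple of 32. The Python sketch below (standard library only) computes the resulting offsets and packs a matching host buffer. The helper names `struct_layout` and `pack_my_struct` are hypothetical, and the sketch assumes the struct is not declared packed and that the device is little-endian.

```python
import struct

# (size, alignment) in bytes for the OpenCL C types used here.
# Assumption: standard OpenCL C rules, where a type is aligned to its size
# and a 3-component vector takes the space of a 4-component one.
CL_TYPES = {
    "float": (4, 4),
    "float3": (16, 16),
    "float8": (32, 32),
}


def struct_layout(fields):
    """Return ({field_name: offset}, total_size) for an unpacked struct."""
    offsets = {}
    offset = 0
    max_align = 1
    for name, type_name in fields:
        size, align = CL_TYPES[type_name]
        # Round the running offset up to the field's alignment.
        offset = (offset + align - 1) // align * align
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    # The struct size is rounded up to its strictest member alignment.
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


def pack_my_struct(position, position1):
    """Pack one MyStruct: 3 floats, 4 pad bytes, 16 pad bytes, 8 floats."""
    # "<" selects little-endian with no implicit padding, so the padding
    # is spelled out explicitly with "x" pad bytes.
    return struct.pack("<3f4x16x8f", *position, *position1)


if __name__ == "__main__":
    offsets, total = struct_layout(
        [("position", "float3"), ("position1", "float8")]
    )
    print(offsets, total)  # {'position': 0, 'position1': 32} 64
```

Under these assumptions, position fills bytes 0 to 15 (12 bytes of data plus 4 bytes of padding), bytes 16 to 31 are padding, and position1 fills bytes 32 to 63, which matches the extra 16 bytes described in the question.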

What is the minimal necessary subset of files required for AMD OpenCL to work on Linux?

我的梦境 posted on 2019-12-06 06:17:21
I've built a Linux kernel by means of buildroot. I've incorporated the open-source amdgpu driver and the required firmware into it. The driver is fine: it detects the GPUs, mode setting works well, adjusting the resolution for "small text", and the command line shows up after boot. Now I need to run an OpenCL program. I manually unpacked files from the amdgpu-pro driver (rhel7 variant), assembled a skeleton fs, and then copied what I thought was required. OpenCL does not recognise any devices, and the clinfo utility throws cl::error from the cl::getPlatformIDs() call. What exactly are the files required for OpenCL to fully work on