opencl

OpenCL float sum reduction

Anonymous (unverified), submitted on 2019-12-03 08:48:34
Question: I would like to apply a reduction to this piece of my kernel code (1-dimensional data): __local float sum = 0; int i; for (i = 0; i < length; i++) sum += //some operation depending on i here; Instead of having just one thread perform this operation, I would like to have n threads (with n = length) and, at the end, one thread that computes the total sum. In pseudocode, I would like to be able to write something like this: int i = get_global_id(0); __local float sum = 0; sum += //some operation depending on i here; barrier (
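A common answer to this kind of question is a tree reduction in local memory: each work-item writes its partial value into a scratch array, and the work-group then repeatedly adds the upper half of the array onto the lower half, with a barrier between steps. The sketch below is not an OpenCL kernel; it is a host-side C simulation of that pattern, where the inner loop stands in for the work-items of one work-group. The function name `tree_reduce` and the power-of-two size requirement are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Host-side simulation of a work-group tree reduction (illustrative only).
 * scratch[] plays the role of a __local float array that each work-item
 * has already filled with its own partial value; n plays the role of the
 * work-group size and is assumed to be a power of two.
 */
float tree_reduce(float *scratch, size_t n)
{
    assert(scratch != NULL);
    assert(n > 0 && (n & (n - 1)) == 0); /* power-of-two size (assumption) */

    for (size_t stride = n / 2; stride > 0; stride /= 2) {
        /* In a kernel, every work-item with local id < stride does this add. */
        for (size_t lid = 0; lid < stride; lid++)
            scratch[lid] += scratch[lid + stride];
        /* In a kernel, barrier(CLK_LOCAL_MEM_FENCE) would go here, so that
           all adds of this step finish before the next step reads them. */
    }
    return scratch[0]; /* work-item 0 ends up holding the total */
}
```

With n values this takes log2(n) steps instead of n sequential additions. Note also that OpenCL C does not allow an initializer on a `__local` variable declared inside a kernel, so `__local float sum = 0;` from the question would need to be assigned separately.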

How to Step-by-Step Debug OpenCL GPU Applications under Windows with a NVidia GPU

Anonymous (unverified), submitted on 2019-12-03 08:44:33
Question: I would like to know whether you know of any way to debug OpenCL kernels step by step on Windows (my IDE is Visual Studio) while running them on an NVidia GPU. What I have found so far: with NVidia's NSight you can only profile OpenCL applications, not debug them; the current version of gDEBugger from AMD only supports ATI/AMD GPUs; the old version of gDEBugger supports NVidia GPUs, but work on it was discontinued in Dec '10; the GDB debugger seems to support it, but is only available under Linux; the Intel OpenCL SDK brings a

In OpenCL, what does mem_fence() do, as opposed to barrier()?

Anonymous (unverified), submitted on 2019-12-03 08:36:05
Question: Unlike barrier() (which I think I understand), mem_fence() does not affect all items in the work-group. The OpenCL spec says (section 6.11.10), for mem_fence(): "Orders loads and stores of a work-item executing a kernel." (so it applies to a single work-item). But, at the same time, section 3.3.1 says: "Within a work-item memory has load / store consistency." So within a work-item the memory is consistent. So what kind of thing is mem_fence() useful for? It doesn't work across items, yet isn't needed within an item... Note that I

How to draw OpenCL calculated pixels to the screen with OpenGL?

Anonymous (unverified), submitted on 2019-12-03 08:35:02
Question: I want to do some computed pixel art with OpenCL and show it directly on the display without a CPU round trip. I could use the interoperability of OpenCL with OpenGL, write to the texture banks of the GPU, and display the texture with OpenGL. I was wondering what the best way to do this would be, since I do not need any 3D stuff, just 2D pixel art. Answer 1: The best way would be to use OpenCL/OpenGL interop, if your OpenCL implementation supports it. This allows OpenCL to access certain OpenGL objects (buffer objects and textures

OpenCL LLVM IR generation from Clang

Anonymous (unverified), submitted on 2019-12-03 08:28:06
Question: I am using the following command line for clang: clang -Dcl_clang_storage_class_specifiers -isystem $LIBCLC/generic/include -include clc/clc.h -target nvptx--nvidiacl -x cl some_kernel.cl -emit-llvm -S -o some_kernel.ll The result is: ; ModuleID = 'kernel.cl' target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64" target triple = "nvptx--nvidiacl" ; Function Attrs: noinline nounwind define void @vector_add(float addrspace(1)* nocapture %vec1,

printf function doesn't work in OpenCL kernel

百般思念, submitted on 2019-12-03 08:23:53
Hi, I am trying to debug OpenCL kernel code on a PS3. Here is the code: #pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable int offset() { return 'A' - 'a'; } __kernel void tKernel(__global unsigned char *in, __global unsigned char *out) { size_t i; printf(“var”); for (i = 0; i < 10; i++) out[i] = in[i] + offset(); } The IBM OpenCL_guide.pdf, in section 4.3.3 on page 18, describes debugging a kernel with the printf method. So I added the printf call to my kernel and tried to test it. But the OpenCL compiler gave me this error: "IBM_OpenCL_kernel.cl", line 9.15: 1506-766 (S) The universal

OpenCL/OpenGL Interop with Multiple GPUs

白昼怎懂夜的黑, submitted on 2019-12-03 07:33:58
Question: I'm having trouble using multiple GPUs with OpenCL/OpenGL interop. I'm trying to write an application which renders the result of an intensive computation. In the end it will run an optimization problem, and then, based on the result, render something to the screen. As a test case, I'm starting with the particle simulation example code from this course: http://web.engr.oregonstate.edu/~mjb/sig13/ The example code creates an OpenGL context, then creates an OpenCL context that shares the state,

Using Python+Theano with OpenCL in an AMD GPU

限于喜欢, submitted on 2019-12-03 07:14:32
Question: I'm trying to use Python with Theano to accelerate some code with OpenCL. I installed libgpuarray and pygpu as instructed (I think), and got no errors. The installation detected the installed OpenCL runtime. I just cannot run the Theano example for OpenCL, mainly because I don't know how to specify my GPU. My GPU is a Radeon HD 5340/5450/5470, according to inxi. All code in the Theano documentation uses device=cuda0, and in the only place where OpenCL is mentioned, it says device=openclN where

Matrix inversion in OpenCL

邮差的信, submitted on 2019-12-03 06:49:26
I am trying to accelerate some computations using OpenCL, and part of the algorithm consists of inverting a matrix. Is there any open-source library or freely available code, written in OpenCL or CUDA, to compute the LU factorization of a matrix (LAPACK dgetrf and dgetri) or a general inversion? The matrix is real and square but doesn't have any other special properties besides that. So far, I've managed to find only implementations of basic BLAS matrix-vector operations on the GPU. The matrix is rather small, only about 60-100 rows and columns, so it could be computed faster on the CPU, but it's used kinda in the
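Since the question notes that a 60-100 row matrix might be inverted faster on the CPU, a plain C baseline can be useful for comparison. The sketch below is not an OpenCL or CUDA implementation and does not use the LAPACK LU route (dgetrf/dgetri) named in the question; it swaps in Gauss-Jordan elimination with partial pivoting, which is simpler to show in a few lines. The function name `invert_matrix`, the row-major layout, and the absolute singularity threshold of 1e-12 are all assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/*
 * Inverts the n x n row-major matrix a into inv using Gauss-Jordan
 * elimination with partial pivoting (illustrative CPU baseline).
 * Returns 0 on success, -1 if allocation fails or a pivot falls below
 * the assumed threshold 1e-12 (matrix treated as singular).
 */
int invert_matrix(const double *a, double *inv, size_t n)
{
    assert(a != NULL && inv != NULL && n > 0);

    double *w = malloc(n * n * sizeof *w); /* working copy of a */
    if (w == NULL)
        return -1;
    memcpy(w, a, n * n * sizeof *w);

    /* inv starts as the identity and receives the same row operations. */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            inv[i * n + j] = (i == j) ? 1.0 : 0.0;

    for (size_t col = 0; col < n; col++) {
        /* Partial pivoting: pick the row with the largest entry in col. */
        size_t pivot = col;
        for (size_t r = col + 1; r < n; r++)
            if (abs_d(w[r * n + col]) > abs_d(w[pivot * n + col]))
                pivot = r;
        if (abs_d(w[pivot * n + col]) < 1e-12) {
            free(w);
            return -1;
        }
        if (pivot != col) {
            for (size_t j = 0; j < n; j++) {
                double t = w[col * n + j];
                w[col * n + j] = w[pivot * n + j];
                w[pivot * n + j] = t;
                t = inv[col * n + j];
                inv[col * n + j] = inv[pivot * n + j];
                inv[pivot * n + j] = t;
            }
        }
        /* Scale the pivot row so the diagonal entry becomes 1. */
        double d = w[col * n + col];
        for (size_t j = 0; j < n; j++) {
            w[col * n + j] /= d;
            inv[col * n + j] /= d;
        }
        /* Eliminate this column from every other row. */
        for (size_t r = 0; r < n; r++) {
            if (r == col)
                continue;
            double f = w[r * n + col];
            for (size_t j = 0; j < n; j++) {
                w[r * n + j] -= f * w[col * n + j];
                inv[r * n + j] -= f * inv[col * n + j];
            }
        }
    }
    free(w);
    return 0;
}
```

The cost is O(n^3), which is small for n around 100; timing a baseline like this against a GPU version that includes host-device transfer time is one way to check whether offloading pays off at this size.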

how to compile opencl project with kernels

最后都变了-, submitted on 2019-12-03 06:41:03
Question: I am a complete beginner with OpenCL. I searched around the internet and found some "hello world" demos for OpenCL projects. Usually, in this sort of minimal project, there is a *.cl file that contains the OpenCL kernels and a *.c file that contains the main function. The question is how to compile this kind of project from the command line. I know I should use some sort of -lOpenCL flag on Linux and -framework OpenCL on Mac. But I have no idea how to link the *.cl kernel to my main source file.
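In the usual setup the *.cl file is not linked at all: only the host *.c file is compiled (with -lOpenCL on Linux, as the question says), and the host program loads the kernel source as text at run time and passes it to the OpenCL runtime for compilation. The sketch below shows only the file-loading half in plain C, so it runs without an OpenCL installation; the function name `read_kernel_source` is hypothetical, and the OpenCL calls appear only in comments.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Reads a whole kernel source file into a NUL-terminated heap buffer.
 * Returns NULL on failure; the caller frees the result. If length_out
 * is non-NULL it receives the source length in bytes (without the NUL).
 * (Hypothetical helper for illustration.)
 */
char *read_kernel_source(const char *path, size_t *length_out)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long size = ftell(fp);
    if (size < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    char *src = malloc((size_t)size + 1);
    if (src == NULL) {
        fclose(fp);
        return NULL;
    }
    size_t got = fread(src, 1, (size_t)size, fp);
    fclose(fp);
    if (got != (size_t)size) {
        free(src);
        return NULL;
    }
    src[size] = '\0';
    if (length_out != NULL)
        *length_out = (size_t)size;
    return src;

    /* In a real host program the buffer would then typically go to:
     *   program = clCreateProgramWithSource(context, 1,
     *                 (const char **)&src, &length, &err);
     *   err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
     * so the kernel is compiled by the OpenCL runtime when the
     * program runs, not by the C compiler or the linker. */
}
```

Under these assumptions, a Linux build might look like `gcc main.c -o main -lOpenCL`, with the .cl file simply shipped next to the executable.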