opencl

OpenCL - Pass image2d_t twice to get both read and write from kernel?

有些话、适合烂在心里 submitted on 2019-12-07 15:26:31
Question: In my OpenCL kernel I would like to both read and write an image2d_t object. According to the OpenCL standard I can only specify either __read_only or __write_only. However, I figured that if I pass the same cl_mem as two separate kernel arguments (one __read_only and one __write_only) I can do both. If I did a write followed by a read I might get the old value(?), but in my case that is what I want: read the old value first, update it, and write it back to the image. A simple example would…
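For illustration, a minimal sketch of the two-argument trick with hypothetical names. Note that OpenCL 1.x leaves aliasing one image for both reading and writing within a single kernel undefined, so whether this works is implementation-dependent; OpenCL 2.0 added the __read_write qualifier for exactly this use case.

```c
// Sketch: the same cl_mem object is bound to both kernel arguments via
// clSetKernelArg. Undefined per the OpenCL 1.x spec; may work in practice.
__kernel void update(__read_only image2d_t src,   // arg 0: the cl_mem
                     __write_only image2d_t dst)  // arg 1: the same cl_mem
{
    const sampler_t s = CLK_NORMALIZED_COORDS_FALSE |
                        CLK_ADDRESS_CLAMP_TO_EDGE |
                        CLK_FILTER_NEAREST;
    int2 pos = (int2)(get_global_id(0), get_global_id(1));
    float4 old = read_imagef(src, s, pos);  // read the old value first
    write_imagef(dst, pos, old * 0.5f);     // update it and write it back
}
```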

OpenCL creating wrong colours

不想你离开。 submitted on 2019-12-07 15:00:26
Question: I'm having an issue with an OpenCL image filter I've been trying to get working. I've written many of these before (Sobel Edge Detection, Auto Segmentation, and such), so I thought I knew how to do this, but the following code is giving me some really weird output:

//NoRedPixels.cl
__kernel void NoRedPixels(
    __read_only image2d_t srcImg,
    __write_only image2d_t dstImg,
    sampler_t sampler,
    int width,
    int height,
    int threshold,
    int colour,
    int fill)
{
    int2 imageCoordinate = (int2)(get_global_id(0…
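Since the snippet is cut off, here is a hedged sketch of what a complete kernel of this shape typically looks like. A common cause of wrong colours is a mismatch between the image's channel data type and the read/write call used (read_imagef for CL_UNORM_INT8 images versus read_imageui for CL_UNSIGNED_INT8 ones); the threshold logic below is an assumption, not the asker's code.

```c
// Sketch, assuming a CL_RGBA / CL_UNSIGNED_INT8 image: the read/write
// functions must match the channel data type, or the results are garbage.
__kernel void NoRedPixels(__read_only image2d_t srcImg,
                          __write_only image2d_t dstImg,
                          sampler_t sampler,
                          int threshold)
{
    int2 pos = (int2)(get_global_id(0), get_global_id(1));
    uint4 px = read_imageui(srcImg, sampler, pos);
    if (px.x > (uint)threshold)   // .x is the red channel in CL_RGBA order
        px.x = 0;                 // zero out red above the threshold
    write_imageui(dstImg, pos, px);
}
```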

OpenCL C/C++ dynamic binding library (win32 and more)

久未见 submitted on 2019-12-07 13:26:30
Question: I'm giving OpenCL a try, and in order to put this into production I'd like to bind dynamically to OpenCL.dll (when under Windows), so that the case where no OpenCL is installed on the host computer is handled gracefully. Is there any available library (or code snippet) in C or C++ that takes care of this dynamic binding, much like GLEW does for OpenGL? I'd like to avoid the hassle of doing it myself. Thanks.

Answer 1: Here you go: http://clcc.sourceforge.net/clew_8h.html

Answer 2: Since you…

Sum Vector Components in OpenCL (SSE-like)

可紊 submitted on 2019-12-07 12:54:53
Question: Is there a single instruction to calculate the sum of all components of a float4, e.g., in OpenCL?

float4 v;
float desiredResult = v.x + v.y + v.z + v.w;

Answer 1:

float4 v;
float desiredResult = dot(v, (float4)(1.0f, 1.0f, 1.0f, 1.0f));

It's a little more work, because you're multiplying each component by one before adding them, but some GPUs have a dot-product instruction built in. So it might be faster; it might be slower. It depends on your hardware.

Source: https://stackoverflow.com/questions/10811413

OpenCL matrix multiplication should be faster?

徘徊边缘 submitted on 2019-12-07 12:03:31
Question: I'm trying to learn how to write GPU-optimized OpenCL kernels. I took an example of matrix multiplication using square tiles in local memory. However, in the best case I got only a ~10x speedup (~50 Gflops) over numpy.dot() (5 Gflops; it uses BLAS). I found studies where they got a speedup of >200x (>1000 Gflops): ftp://ftp.u-aizu.ac.jp/u-aizu/doc/Tech-Report/2012/2012-002.pdf. I don't know what I'm doing wrong, or if it is just because of my GPU (an NVIDIA GTX 275). Or if it is…
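For comparison, a hedged sketch of the square-tile scheme the question describes; the TILE size and the assumption that N is a multiple of TILE are mine, not from the question.

```c
// Sketch of local-memory tiling: each TILE x TILE work-group stages tiles of
// A and B into local memory, so each global element is loaded once per tile
// pass instead of once per multiply-add.
#define TILE 16
__kernel void matmul(__global const float *A, __global const float *B,
                     __global float *C, int N)
{
    __local float Asub[TILE][TILE];
    __local float Bsub[TILE][TILE];
    int row = get_global_id(1), col = get_global_id(0);
    int lr = get_local_id(1), lc = get_local_id(0);
    float acc = 0.0f;
    for (int t = 0; t < N / TILE; ++t) {
        // Each work-item stages one element of each tile.
        Asub[lr][lc] = A[row * N + t * TILE + lc];
        Bsub[lr][lc] = B[(t * TILE + lr) * N + col];
        barrier(CLK_LOCAL_MEM_FENCE);   // tile fully loaded before use
        for (int k = 0; k < TILE; ++k)
            acc += Asub[lr][k] * Bsub[k][lc];
        barrier(CLK_LOCAL_MEM_FENCE);   // tile fully used before reload
    }
    C[row * N + col] = acc;
}
```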

DirectCompute versus OpenCL for GPU programming?

为君一笑 submitted on 2019-12-07 11:43:06
Question: I have some (financial) tasks which should map well to GPU computing, but I'm not really sure whether I should go with OpenCL or DirectCompute. I did some GPU computing, but that was a long time ago (3 years). I did it through OpenGL since there was not really any alternative back then. I've seen some OpenCL presentations and it looks really nice. I haven't seen anything about DirectCompute yet, but I expect it to also be good. I'm not interested at the moment in cross-platform compatibility, and…

OpenCL not finding platforms?

孤人 submitted on 2019-12-07 11:17:25
Question: I am trying to use the C++ API for OpenCL. I have installed my NVIDIA drivers and verified that I can run the simple vector-addition program provided here. I can compile this program with the following gcc call, and it runs without problems:

gcc main.c -o vectorAddition -lOpenCL -I/usr/local/cuda-6.5/include

However, I would much prefer the C++ API over the very verbose host files needed for C. I downloaded the C++ bindings from Khronos from here and placed the…

clock() in opencl

爱⌒轻易说出口 submitted on 2019-12-07 08:49:07
Question: I know there is a clock() function in CUDA that you can call in kernel code to query the GPU time. Does such a thing exist in OpenCL? Is there any way to query the GPU time in OpenCL? (I'm using NVIDIA's toolkit.)

Answer 1: The NVIDIA OpenCL SDK has an example, Using Inline PTX with OpenCL. The clock register is accessible through inline PTX as the special register %clock. %clock is described in the PTX: Parallel Thread Execution ISA manual. You should be able to replace the %%laneid…
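A hedged sketch in the spirit of that SDK sample, with the %%laneid read swapped for %%clock; this relies on NVIDIA's inline-PTX support in its OpenCL compiler and is not portable OpenCL.

```c
// Sketch (NVIDIA-only): reading the %clock cycle counter before and after a
// region of kernel code via inline PTX. The counter is per-SM and 32-bit, so
// it wraps; treat the difference as a rough cycle count, not wall time.
__kernel void timed(__global uint *ticks)
{
    uint start, end;
    asm volatile ("mov.u32 %0, %%clock;" : "=r"(start));
    // ... work to be timed goes here ...
    asm volatile ("mov.u32 %0, %%clock;" : "=r"(end));
    ticks[get_global_id(0)] = end - start;
}
```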

Conversion of YUV data into Image format Opencl

人走茶凉 submitted on 2019-12-07 08:21:28
I have been working on a project where I use YUV as input and have to pass this data to the kernel for processing. I looked into similar questions but never found an accurate answer to my concern. I tried a simple method to convert the YUV into an image format for OpenCL processing. However, when I try to print the data that has been converted into the image, I get the first value correct, then another three as zeroes, and then the 5th pixel value correct. I don't understand whether the writing or the reading part is the problem. I am confused as to how to…
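One plausible explanation for the "one correct value, then three zeroes" pattern is uploading a single-channel Y plane into a four-channel CL_RGBA image, so each Y byte lands in its own RGBA texel. A hedged host-side fragment using a single-channel format instead; the variable names are assumptions, not from the question.

```c
/* Sketch, assuming an 8-bit planar Y plane of img_width x img_height bytes:
 * CL_R keeps one byte per texel, so texel (x,y) is exactly Y[y][x]. */
cl_image_format fmt = { CL_R, CL_UNSIGNED_INT8 };
cl_image_desc desc = {0};
desc.image_type   = CL_MEM_OBJECT_IMAGE2D;
desc.image_width  = img_width;
desc.image_height = img_height;
cl_int err;
cl_mem yImage = clCreateImage(context,
                              CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                              &fmt, &desc, y_plane, &err);
/* In the kernel, read it with read_imageui and a nearest-neighbour sampler;
 * the Y value comes back in the .x component of the uint4. */
```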

Rationalizing what is going on in my simple OpenCL kernel in regards to global memory

血红的双手。 submitted on 2019-12-07 08:05:44
Question:

const char programSource[] =
    "__kernel void vecAdd(__global int *a, __global int *b, __global int *c)"
    "{"
    "    int gid = get_global_id(0);"
    "    for (int i = 0; i < 10; i++) {"
    "        a[gid] = b[gid] + c[gid];"
    "    }"
    "}";

The kernel above is a vector addition performed ten times in a loop. I have used the programming guide and Stack Overflow to figure out how global memory works, but I still can't tell from looking at my code whether I am accessing global memory in a good way. I am accessing it in a contiguous fashion…