opencl

Rotating hundreds of JPEGs in seconds rather than hours

眉间皱痕 submitted on 2019-12-05 03:15:48
Question: We have hundreds of images arriving at a time, and we need to rotate and resize them as fast as possible. Rotation is by 90, 180, or 270 degrees. Currently we are using the command-line tool GraphicsMagick to rotate the images; rotating one image (5760*3840, ~22 MP) takes around 4 to 7 seconds. The following Python code sadly gives us similar results:

    import cv
    img = cv.LoadImage("image.jpg")
    timg = cv.CreateImage((img.height, img.width), img.depth, img.channels)  # transposed
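For the 90-degree cases, the rotation itself is just a transpose-style copy, which maps well to a GPU. A minimal OpenCL C sketch, purely illustrative and not taken from the question (the kernel name, RGBA pixel layout, and row-major indexing are assumptions):

    // Rotate an RGBA image 90 degrees clockwise: source pixel (x, y) goes to
    // destination (height - 1 - y, x); the destination image is height x width.
    __kernel void rotate90_cw(__global const uchar4 *src,
                              __global uchar4 *dst,
                              const int width, const int height)
    {
        int x = get_global_id(0);
        int y = get_global_id(1);
        if (x >= width || y >= height) return;
        dst[x * height + (height - 1 - y)] = src[y * width + x];
    }

Note that for JPEG input the decode and re-encode steps are often the dominant cost, so a GPU rotation kernel alone may not remove the bottleneck.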

How to draw OpenCL calculated pixels to the screen with OpenGL?

别等时光非礼了梦想. submitted on 2019-12-05 01:25:26
I want to do some computed pixel art with OpenCL and display it directly on screen without a CPU round trip. I could use OpenCL/OpenGL interoperability, write to a texture on the GPU, and display that texture with OpenGL. I was wondering what the best way to do this would be, since I do not need any 3D, just 2D pixel art.

The best way is to use OpenCL/OpenGL interop, if your OpenCL implementation supports it. This allows OpenCL to access certain OpenGL objects (buffer objects and textures/renderbuffers). You won't be able to directly access the default
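A hedged host-side sketch of that interop path, assuming an OpenCL context that was created with GL-sharing properties and an existing GL_TEXTURE_2D named tex (variable names are invented; error checking omitted):

    cl_int err;
    // OpenCL 1.2; on OpenCL 1.1 the call is clCreateFromGLTexture2D
    cl_mem clTex = clCreateFromGLTexture(context, CL_MEM_WRITE_ONLY,
                                         GL_TEXTURE_2D, 0, tex, &err);

    glFinish();                                          // GL must be done with the texture
    clEnqueueAcquireGLObjects(queue, 1, &clTex, 0, NULL, NULL);

    clSetKernelArg(kernel, 0, sizeof(cl_mem), &clTex);   // kernel writes with write_imagef()
    size_t gws[2] = { width, height };
    clEnqueueNDRangeKernel(queue, kernel, 2, NULL, gws, NULL, 0, NULL, NULL);

    clEnqueueReleaseGLObjects(queue, 1, &clTex, 0, NULL, NULL);
    clFinish(queue);                                     // CL must be done before GL samples it
    // ...then draw a full-screen textured quad with OpenGL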

In OpenCL, what does mem_fence() do, as opposed to barrier()?

邮差的信 submitted on 2019-12-05 00:50:14
Unlike barrier() (which I think I understand), mem_fence() does not affect all items in the work-group. The OpenCL spec says (section 6.11.10), for mem_fence(): "Orders loads and stores of a work-item executing a kernel" (so it applies to a single work-item). But, at the same time, section 3.3.1 says that "within a work-item memory has load / store consistency", so within a work-item the memory is already consistent. So what kind of thing is mem_fence() useful for? It doesn't work across items, yet isn't needed within an item... Note that I haven't used atomic operations (section 9.5 etc). Is
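One situation where a fence inside a single work-item matters is publishing data to other work-items that poll a flag: the fence keeps the two stores from being reordered as observed from outside the work-item. A hedged OpenCL C sketch (kernel and argument names are invented):

    __kernel void publish(__global int *data, volatile __global int *flag)
    {
        if (get_global_id(0) == 0) {
            data[0] = 42;                      // payload store
            mem_fence(CLK_GLOBAL_MEM_FENCE);   // keep the payload store ahead of the flag store
            flag[0] = 1;                       // publish
        }
        // other work-items would poll flag[0] (with their own fence or atomics)
        // before reading data[0]
    }

Whether another work-item reliably observes the update also depends on how that work-item reads, which is part of what the question is probing.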

OpenCL autocorrelation kernel

元气小坏坏 submitted on 2019-12-04 23:14:24
I have written a simple program that does autocorrelation as follows. I've used PGI accelerator directives to move the computation to the GPU.

    // autocorrelation
    void autocorr(float *restrict A, float *restrict C, int N)
    {
        int i, j;
        float sum;
        #pragma acc region
        {
            for (i = 0; i < N; i++) {
                sum = 0.0;
                for (j = 0; j < N; j++) {
                    if ((i + j) < N)
                        sum += A[j] * A[i + j];
                    else
                        continue;
                }
                C[i] = sum;
            }
        }
    }

I wrote a similar program in OpenCL, but I am not getting correct results. The program is as follows... I am new to GPU programming, so apart from hints that could fix my error, any other advice is
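For reference, a straightforward OpenCL C port of the loop above, with one work-item per output element, could look like this (a sketch assuming a 1-D NDRange of at least N work-items; this is not the asker's kernel):

    __kernel void autocorr(__global const float *A,
                           __global float *C,
                           const int N)
    {
        int i = get_global_id(0);
        if (i >= N) return;

        float sum = 0.0f;
        for (int j = 0; j < N - i; j++)   // same bound as "if ((i+j) < N)" in the serial code
            sum += A[j] * A[i + j];
        C[i] = sum;
    }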

Transfer data from Mat/oclMat to cl_mem (OpenCV + OpenCL)

ⅰ亾dé卋堺 submitted on 2019-12-04 22:56:18
Question: I am working on a project that needs a lot of OpenCL code. I am using OpenCV's ocl module to develop the project faster, but some functions are not implemented and I will have to write my own OpenCL code. My question is this: what is the quickest and cheapest way to transfer data from a Mat and/or oclMat to a cl_mem array? Rewording this: is there a good way to transfer or enqueue (clEnqueueWriteBuffer) data from an oclMat or Mat? Currently, I am using a for-loop to read data from the Mat (or
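For a continuous cv::Mat, a single bulk copy avoids the per-element loop. A hedged sketch (context, queue, and error handling are assumed to exist; check mat.isContinuous() first):

    size_t bytes = mat.total() * mat.elemSize();   // whole image in one shot

    // Option 1: copy at buffer-creation time
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                bytes, mat.data, &err);

    // Option 2: copy into an existing buffer
    err = clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, bytes, mat.data,
                               0, NULL, NULL);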

Using clCreateSubBuffer

我只是一个虾纸丫 submitted on 2019-12-04 21:33:50
I am trying to create a sub-buffer to read a chunk of a buffer created from a 1-D vector. This is the code I am using:

    d_treeArray = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(cl_uint) * total, NULL, &err);

    cl_buffer_region region;
    region.origin = 0;               // This works
    //region.origin = 4;             // This doesn't work
    region.size = 10 * sizeof(cl_uint);

    d_subtreeArray = clCreateSubBuffer(d_treeArray, CL_MEM_READ_WRITE,
                                       CL_BUFFER_CREATE_TYPE_REGION, &region, &err);

    if (err != CL_SUCCESS) {
        std::cout << "Cannot set buffers" << std::endl;
        exit(1);
    }

Now when I give region.origin as anything other
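For context (background, not part of the question excerpt): sub-buffer origins generally have to be a multiple of the device's CL_DEVICE_MEM_BASE_ADDR_ALIGN, which is reported in bits. A hedged sketch of querying and honouring that alignment (the device handle is assumed; buffer names reuse the question's):

    cl_uint align_bits = 0;
    clGetDeviceInfo(device, CL_DEVICE_MEM_BASE_ADDR_ALIGN,
                    sizeof(align_bits), &align_bits, NULL);
    size_t align_bytes = align_bits / 8;        // the spec reports the alignment in bits

    cl_buffer_region region;
    region.origin = 1 * align_bytes;            // any multiple of align_bytes is valid
    region.size   = 10 * sizeof(cl_uint);

    d_subtreeArray = clCreateSubBuffer(d_treeArray, CL_MEM_READ_WRITE,
                                       CL_BUFFER_CREATE_TYPE_REGION, &region, &err);
    // a misaligned origin typically fails with CL_MISALIGNED_SUB_BUFFER_OFFSET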

How to use 2 OpenCL runtimes

时光毁灭记忆、已成空白 submitted on 2019-12-04 21:23:14
Question: I want to use 2 OpenCL runtimes together in one system (in my case AMD and Nvidia, but the question is pretty generic). I know that I can compile my program with either SDK. But when running the program, I need to provide libOpenCL.so. How can I provide the libraries of both runtimes so that I see 3 devices (AMD CPU, AMD GPU, Nvidia GPU) in my OpenCL program? I know that it must be possible somehow, but I haven't found a description of how to do it on Linux yet. Thanks a lot, Tomas 

Answer 1: The Smith and
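For context (this is the standard mechanism on Linux, not quoted from the answer above): linking against the Khronos ICD loader's libOpenCL.so lets one binary see every vendor runtime registered under /etc/OpenCL/vendors/*.icd, and the program then simply enumerates all platforms. A minimal C sketch:

    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id platforms[8];
        cl_uint num_platforms = 0;
        clGetPlatformIDs(8, platforms, &num_platforms);   // the ICD loader reports every installed runtime

        for (cl_uint p = 0; p < num_platforms; ++p) {
            char name[256];
            clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof(name), name, NULL);

            cl_uint num_devices = 0;
            clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 0, NULL, &num_devices);
            printf("platform %u: %s (%u devices)\n", p, name, num_devices);
        }
        return 0;
    }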

Passing Class to a Kernel in Intel OpenCL

亡梦爱人 submitted on 2019-12-04 21:22:54
I have been working on a C/C++ OpenCL solution for the past few weeks. For my solution, I need to pass a class from the CPU (host) to the GPU (device). When I try to pass the class as a kernel argument, I get the error "Unknown Type-Identifier Class". My doubt is whether OpenCL on the Intel platform allows passing a class to a kernel, or whether any workaround is available. In CUDA I have seen examples of this and it works perfectly fine; however, for OpenCL I am not able to find any references or examples related to this query. I would be really thankful for any help
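OpenCL C (1.x) has no classes, so the usual workaround is a plain C struct with an identical layout defined on both sides. A hedged sketch (struct fields, kernel name, and host variables are invented for illustration):

    /* shared layout: POD only, no methods, no host pointers */
    typedef struct { float x, y, z; int id; } Particle;

    /* host side */
    Particle p = { 1.0f, 2.0f, 3.0f, 7 };
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                sizeof(Particle), &p, &err);
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);

    /* kernel side (the same struct definition is repeated in the kernel source) */
    __kernel void use_particle(__global const Particle *part)
    {
        /* read part->x, part->id, ... */
    }

Host and device struct padding/alignment must match, which is why simple, explicitly sized fields are preferred.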

thrust: fill isolate space

大憨熊 submitted on 2019-12-04 20:51:02
I have an array like this:

    0 0 0 1 0 0 0 0 5 0 0 3 0 0 0 8 0 0

I want every non-zero element to expand itself one element at a time until it reaches another non-zero element; the result looks like this:

    1 1 1 1 1 1 5 5 5 5 3 3 3 3 8 8 8 8

Is there any way to do this using thrust?

Is there any way to do this using thrust? Yes, here is one possible approach. For each position in the sequence, compute 2 distances. The first is the distance to the nearest non-zero value in the left direction, and the second is the distance to the nearest non-zero value in the right direction. If the position
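The described approach can be prototyped on the CPU before mapping it onto Thrust. A serial C++ sketch of the two-distance idea (not actual Thrust code; the two loops correspond to a forward and a backward scan, followed by an element-wise selection):

    #include <vector>

    std::vector<int> fill_isolated(const std::vector<int> &a)
    {
        const int n = (int)a.size(), INF = n + 1;
        std::vector<int> ldist(n, INF), rdist(n, INF), lval(n, 0), rval(n, 0), out(n);

        for (int i = 0; i < n; ++i) {            // distance/value of nearest non-zero on the left
            if (a[i] != 0)                        { ldist[i] = 0; lval[i] = a[i]; }
            else if (i > 0 && ldist[i-1] < INF)   { ldist[i] = ldist[i-1] + 1; lval[i] = lval[i-1]; }
        }
        for (int i = n - 1; i >= 0; --i) {       // distance/value of nearest non-zero on the right
            if (a[i] != 0)                        { rdist[i] = 0; rval[i] = a[i]; }
            else if (i < n-1 && rdist[i+1] < INF) { rdist[i] = rdist[i+1] + 1; rval[i] = rval[i+1]; }
        }
        for (int i = 0; i < n; ++i)              // take the value from the nearer neighbour
            out[i] = (ldist[i] <= rdist[i]) ? lval[i] : rval[i];
        return out;
    }

Each pass is a "carry the last non-zero value and count the distance" operation, which is roughly what a thrust::inclusive_scan with a custom operator would express on the GPU.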

GPU programming on Clojure?

做~自己de王妃 submitted on 2019-12-04 20:37:20
Question: I'm wondering what, if any, GPU integration libraries exist for Clojure? I've seen examples that involve hand-rolling OpenCL code, but I'm specifically looking for something similar to Anaconda Accelerate, which translates NumPy Python expressions to CUDA code relatively seamlessly. I'm open to either OpenCL or CUDA approaches.

Answer 1: Here is a project that recently started on GitHub: https://github.com/JulesGosnell/clumatra. It seems more like an experiment, and it's quite impressive!