OpenCL

Is it possible to run an OpenCL kernel concurrently on CPU and GPU?

北慕城南 submitted on 2019-12-07 07:43:21
Question: Let's assume that I have a computer with a multicore processor and a GPU. I would like to write an OpenCL program which runs on all cores of the platform. Is this possible, or do I need to choose a single device on which to run the kernel?

Answer 1: In theory yes, you can; the CL API allows it. But the platform/implementation must support it, and I don't think most CL implementations do. To do it, get the cl_device_id of the CPU device and the GPU device, and create a context with those two…
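A minimal host-side sketch of that suggestion (error handling omitted; it also assumes the first platform exposes both device types, which is often not the case across vendors):

    #include <CL/cl.h>

    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, NULL);

    cl_device_id devices[2];
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &devices[0], NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &devices[1], NULL);

    /* One context spanning both devices: buffers created in it are
       visible to both, but each device still needs its own command queue. */
    cl_context ctx = clCreateContext(NULL, 2, devices, NULL, NULL, NULL);

Note that a single clEnqueueNDRangeKernel still targets one queue, so "spanning" a kernel across both devices means splitting the NDRange yourself and enqueueing a piece on each device's queue.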

OpenCL compile on Linux

和自甴很熟 submitted on 2019-12-07 06:53:42
Question: I'm a newbie in OpenCL. Since yesterday I've been trying to use OpenCL for parallel programming instead of CUDA, which is more familiar to me from previous experience. I have an NVIDIA GTX 580 GPU, Ubuntu Linux 12.04, and CUDA SDK 4.1 (already installed for CUDA programming). The CUDA SDK folder already includes some OpenCL header files and libraries, so I just downloaded the OpenCL examples from NVIDIA's Developer Zone (here is the link: https://developer.nvidia.com/opencl). And I'm…
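For reference, a typical compile line for that setup looks like the following (the source file name is a placeholder, and the include/library paths assume a default CUDA toolkit install; adjust them to where the SDK actually lives on your machine):

    gcc -o oclExample oclExample.c \
        -I/usr/local/cuda/include \
        -L/usr/local/cuda/lib64 \
        -lOpenCL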

cl.h not found - how to link in makefile

僤鯓⒐⒋嵵緔 submitted on 2019-12-07 06:21:18
Question: I have a project which requires OpenCL. I have installed CUDA and OpenCL on my machine, but when I 'make' my project the following error occurs:

    CL/cl.h: No such file or directory

I know that I can create a symbolic link (on my Ubuntu system) to fix the problem: ln -s /usr/include/nvidia-current/CL. But I consider this a quick fix and not the correct solution. I would like to handle this in my makefile (I guess) so that a simple "make" command would compile. How could I do this?

Answer 1: You…
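A minimal makefile sketch along those lines (target and source names are placeholders): pass the NVIDIA include directory to the compiler instead of symlinking headers into the system path, and link against the OpenCL library:

    CFLAGS += -I/usr/include/nvidia-current
    LDLIBS += -lOpenCL

    myproject: main.c
        $(CC) $(CFLAGS) -o $@ main.c $(LDLIBS)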

Which memory access pattern is more efficient for a cached GPU?

大城市里の小女人 submitted on 2019-12-07 06:05:34
Question: So let's say I have a global array of memory:

    |a|b|c| |e|f|g| |i|j|k| |

There are four 'threads' (local work items in OpenCL) accessing this memory, and two possible patterns for this access (columns are time slices, rows are threads):

         0    1    2    3
    t1   a -> b -> c -> .
    t2   e -> f -> g -> .
    t3   i -> j -> k -> .
    t4   .    .    .    .

The above pattern splits the array into blocks, with each thread iterating to and accessing the next element in its block per time slice. I believe this sort of access would…
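For contrast with the truncated question, here are the two usual patterns written as OpenCL kernels (a sketch; the kernel and parameter names are made up, and pattern B is the standard interleaved counterpart the question is presumably comparing against). On cached, coalescing GPU hardware pattern B is generally the efficient one, because in each time slice neighbouring work-items read neighbouring addresses:

    /* Pattern A: blocked -- each work-item walks its own contiguous chunk,
       so in any one time slice neighbouring work-items are far apart. */
    __kernel void blocked_sum(__global const float* in,
                              __global float* out,
                              int chunk)
    {
        int gid = get_global_id(0);
        float acc = 0.0f;
        for (int i = 0; i < chunk; ++i)
            acc += in[gid * chunk + i];
        out[gid] = acc;
    }

    /* Pattern B: interleaved -- consecutive work-items read consecutive
       elements in each iteration, which coalesces into few transactions. */
    __kernel void interleaved_sum(__global const float* in,
                                  __global float* out,
                                  int chunk)
    {
        int gid = get_global_id(0);
        int n = (int)get_global_size(0);
        float acc = 0.0f;
        for (int i = 0; i < chunk; ++i)
            acc += in[i * n + gid];
        out[gid] = acc;
    }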

Programming Intel IGP (e.g. Iris Pro 5200) hardware without OpenCL

ε祈祈猫儿з submitted on 2019-12-07 04:56:22
Question: The peak GFLOPS of the cores of the desktop i7-4770K @ 4 GHz is 4 GHz × 8 (AVX lanes) × 4 (two FMA units × 2 flops each) × 4 cores = 512 GFLOPS. But the latest Intel IGP (Iris Pro 5100/5200) has a peak of over 800 GFLOPS (for the Iris Pro 5200, roughly 40 EUs × 16 flops/cycle × 1.3 GHz ≈ 832 GFLOPS). Some algorithms will therefore run even faster on the IGP, and combining the cores with the IGP would be better still. Additionally, the IGP keeps eating up more silicon; the Iris Pro 5100 now takes up over 30% of the die. It seems clear which direction Intel desktop processors are headed. As…

OpenGL-OpenCL interop transfer times + texturing from bitmap

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-07 04:11:16
Question: Two-part question: I'm working on a school project using the Game of Life as a vehicle to experiment with GPGPU. I'm using OpenCL and OpenGL for real-time visualization, and the goal is to get this thing as big and fast as possible. Upon profiling I find that the frame time is dominated by CL acquiring and releasing the GL buffers, and that the time cost is directly proportional to the actual size of the buffer. 1) Is this normal? Why should this be? To the best of my understanding, the buffer…
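For context, the acquire/release pair in question typically looks like this (a sketch; queue, kernel, clBuffer, and globalSize are placeholder names, and error checking is omitted). With a properly shared CL/GL context these calls should only transfer ownership, not data; a cost proportional to buffer size usually means the driver is copying behind the scenes:

    /* Let GL finish touching the shared buffer before CL takes it. */
    glFinish();
    clEnqueueAcquireGLObjects(queue, 1, &clBuffer, 0, NULL, NULL);
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &globalSize, NULL,
                           0, NULL, NULL);
    clEnqueueReleaseGLObjects(queue, 1, &clBuffer, 0, NULL, NULL);
    /* Let CL finish before GL draws from the buffer again. */
    clFinish(queue);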

OpenCL: Running CPU/GPU multiple devices

你。 submitted on 2019-12-07 03:49:32
Question: I want to run parallel tasks on the GPU and CPU with multiple OpenCL devices. The standard examples from the AMD SDK are not very clear on this subject. Can you advise any additional tutorials or examples? Any advice will do. Thank you.

Answer 1: For a tutorial and details on using multiple devices, you may want to refer to section 4.12 of the AMD-APP-SDK Programming Guide.

Answer 2: Running parallel tasks on multiple devices requires dynamic scheduling for good efficiency, because you never know…
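As a starting point before any dynamic scheduling, a static split across two command queues looks like this (a sketch, assuming a shared context ctx, a devices[] array, and a built kernel as in the multi-device sketch further up; the 1/4 CPU share is an arbitrary tuning knob):

    cl_command_queue qcpu = clCreateCommandQueue(ctx, devices[0], 0, NULL);
    cl_command_queue qgpu = clCreateCommandQueue(ctx, devices[1], 0, NULL);

    size_t total = 1 << 20;
    size_t split = total / 4;                 /* fraction of work for the CPU */
    size_t cpuOffset = 0,     cpuSize = split;
    size_t gpuOffset = split, gpuSize = total - split;

    /* Same kernel, disjoint halves of the NDRange, one queue per device. */
    clEnqueueNDRangeKernel(qcpu, kernel, 1, &cpuOffset, &cpuSize, NULL,
                           0, NULL, NULL);
    clEnqueueNDRangeKernel(qgpu, kernel, 1, &gpuOffset, &gpuSize, NULL,
                           0, NULL, NULL);
    clFinish(qcpu);
    clFinish(qgpu);

A dynamic scheduler would instead enqueue small chunks and hand the next chunk to whichever queue drains first.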

What kind of work benefits from OpenCL

跟風遠走 submitted on 2019-12-07 03:38:57
Question: First of all: I am well aware that OpenCL does not magically make everything faster, and I am well aware that OpenCL has limitations. So now to my question: I'm used to doing various scientific calculations in my programming. Some of the things I work with are quite intense in terms of complexity and number of calculations, so I was wondering whether I could speed things up by using OpenCL. What I would love to hear from you all are answers to some of the following [bonus for links]: *What…

OpenCL image histogram

自闭症网瘾萝莉.ら submitted on 2019-12-07 02:44:35
Question: I'm trying to write a histogram kernel in OpenCL to compute 256-bin R, G, and B histograms of an RGBA32F input image. My kernel looks like this:

    const sampler_t mSampler = CLK_NORMALIZED_COORDS_FALSE |
                               CLK_ADDRESS_CLAMP |
                               CLK_FILTER_NEAREST;

    __kernel void computeHistogram(read_only image2d_t input,
                                   __global int* rOutput,
                                   __global int* gOutput,
                                   __global int* bOutput)
    {
        int2 coords = {get_global_id(0), get_global_id(1)};
        float4 sample = read_imagef(input, mSampler, coords);
        uchar rbin = floor…
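The snippet is cut off above. For reference, a complete kernel of that shape typically ends as sketched below; the quantization and the atomic increments are my assumptions about the usual continuation, not the asker's code. Atomics matter here because many work-items can hit the same bin concurrently, and plain increments would lose counts:

    __kernel void computeHistogram(read_only image2d_t input,
                                   __global int* rOutput,
                                   __global int* gOutput,
                                   __global int* bOutput)
    {
        const sampler_t s = CLK_NORMALIZED_COORDS_FALSE |
                            CLK_ADDRESS_CLAMP |
                            CLK_FILTER_NEAREST;
        int2 coords = {get_global_id(0), get_global_id(1)};
        float4 sample = read_imagef(input, s, coords);

        /* Quantize each float channel (0.0..1.0) to a 0..255 bin index. */
        int rbin = clamp((int)floor(sample.x * 255.0f), 0, 255);
        int gbin = clamp((int)floor(sample.y * 255.0f), 0, 255);
        int bbin = clamp((int)floor(sample.z * 255.0f), 0, 255);

        /* Atomic adds: concurrent work-items may target the same bin. */
        atomic_inc(&rOutput[rbin]);
        atomic_inc(&gOutput[gbin]);
        atomic_inc(&bOutput[bbin]);
    }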

How to Step-by-Step Debug OpenCL GPU Applications under Windows with an NVIDIA GPU

淺唱寂寞╮ submitted on 2019-12-07 01:45:17
Question: I would like to know whether you know of any way to step-by-step debug OpenCL kernels under Windows (my IDE is Visual Studio) while running them on an NVIDIA GPU. What I have found so far:

- with NVIDIA's Nsight you can only profile OpenCL applications, but not debug them
- the current version of gDEBugger from AMD only supports ATI/AMD GPUs
- the old version of gDEBugger supports NVIDIA GPUs, but work on it was discontinued in Dec '10
- the GDB debugger seems to support it, but is only available under…