OpenCL

Can you allocate a buffer that is larger than the device memory using opencl on a GPU

心不动则不痛 submitted on 2019-12-12 19:09:30
Question: Is it possible to allocate a buffer that is larger than the device memory (assuming a GPU)? I'm pretty sure this does not work:

    clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(float) * DATA_SIZE, inputdata, NULL);

But shouldn't this work?

    clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(float) * DATA_SIZE, inputdata, NULL);

I seem to be having trouble getting it to work with my NVIDIA Quadro FX 3800, but I have heard of others who have had success
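
For reference, here is a minimal sketch of how the relevant limits can be queried before attempting such an allocation (the function name is illustrative, and a device handle is assumed to already exist). The standard only guarantees CL_DEVICE_MAX_MEM_ALLOC_SIZE per buffer, which is typically a fraction of total device memory, and clCreateBuffer is specified to fail with CL_INVALID_BUFFER_SIZE above that limit regardless of the host-pointer flags.

    // Illustrative sketch: check the per-buffer and total memory limits
    // before allocating; an oversized request must be split across buffers.
    #include <CL/cl.h>
    #include <cstdio>

    void check_alloc_limit(cl_device_id device, size_t requested_bytes) {
        cl_ulong max_alloc = 0, global_mem = 0;
        clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                        sizeof(max_alloc), &max_alloc, NULL);
        clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                        sizeof(global_mem), &global_mem, NULL);
        if (requested_bytes > max_alloc)
            std::printf("Request of %zu bytes exceeds the per-buffer limit of %llu; "
                        "the data would need to be split across buffers.\n",
                        requested_bytes, (unsigned long long)max_alloc);
    }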

How do I take advantage of OpenCL in an Emgu CV project

不问归期 submitted on 2019-12-12 16:11:02
Question: I'm a newbie at using Emgu CV and have started to create small sample projects, for example face detection, eye detection, etc. It would be good if I could take advantage of OpenCL to accelerate the processing using the GPU. Otherwise, it causes massive CPU utilization when I decrease the scaleFactor. How can I do that? Thanks. Answer 1: As far as I know (from the official page of Emgu; look at the bottom of the page), the UMat image format automatically uses the OpenCL engine. First you have to set the
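
The mechanism Emgu CV wraps here is OpenCV's T-API, where operations on cv::UMat are dispatched to OpenCL kernels when a backend is available. A minimal sketch in the underlying C++ API (the file names are placeholders; the Emgu equivalent uses its own UMat class from .NET):

    // Sketch of OpenCV's T-API, which Emgu CV wraps: UMat-backed operations
    // run through OpenCL where the backend supports them.
    #include <opencv2/opencv.hpp>
    #include <opencv2/core/ocl.hpp>

    int main() {
        cv::ocl::setUseOpenCL(true);                      // enable the OpenCL backend
        cv::CascadeClassifier face("haarcascade_frontalface_default.xml");
        cv::UMat frame, gray;
        cv::imread("input.jpg").copyTo(frame);            // data stays device-resident as UMat
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        std::vector<cv::Rect> faces;
        face.detectMultiScale(gray, faces, 1.1);          // accelerated path when OpenCL is active
        return 0;
    }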

Restrict number of GPUs for AMD OpenCL

大兔子大兔子 submitted on 2019-12-12 14:02:56
Question: Is there a solution to restrict the number of GPUs used on AMD OpenCL platforms? For NVIDIA platforms one can simply set the environment variable CUDA_VISIBLE_DEVICES to limit the set of GPUs available to OpenCL. EDIT: I know that I can create a context with a reduced set of devices. However, I am looking for ways to control the number of devices for the OpenCL platform from "outside". Answer 1: AMD has the GPU_DEVICE_ORDINAL environment variable for both Windows and Linux. This allows you to
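
The variable is normally set in the shell before launching the program; a sketch of setting it from inside the process is shown below, assuming it is read when the OpenCL runtime initializes and therefore must be set before the first OpenCL call (the ordinal list "0,2" is illustrative):

    // Sketch: restrict the AMD runtime to specific GPUs from inside the
    // process. GPU_DEVICE_ORDINAL must be set before the runtime initializes.
    #include <CL/cl.h>
    #include <cstdlib>

    int main() {
        setenv("GPU_DEVICE_ORDINAL", "0,2", 1);    // POSIX; on Windows use _putenv_s
        cl_uint num_platforms = 0;
        clGetPlatformIDs(0, NULL, &num_platforms); // runtime now sees only GPUs 0 and 2
        return 0;
    }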

Detect OpenCL device vendor in kernel code

两盒软妹~` submitted on 2019-12-12 13:32:16
Question: I'm writing some platform-specific optimizations, and while I'm aware that I could parse the vendor string in the host code and pass it to the kernel using the -D option, it is perhaps more convenient to detect the vendor in the kernel directly, without host involvement (that way it is possible to optimize kernels even without access to the host source code, ...). So far, I have come up with the following:

    #ifdef __NV_CL_C_VERSION
    /**
     * @def NVIDIA
     * @brief defined when compiling on
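
As a sketch of the idea: the NVIDIA compiler defines __NV_CL_C_VERSION (as in the question), so a vendor flag can be derived entirely in the preprocessor. Macro names for other vendors vary by driver and would need to be verified per platform; the kernel below is illustrative.

    // Illustrative vendor detection inside kernel code. Only the NVIDIA macro
    // is taken from the question; anything else is an assumption that must be
    // checked against the target driver.
    #ifdef __NV_CL_C_VERSION
    #define VENDOR_NVIDIA 1
    #endif

    __kernel void scale(__global float* data, float factor) {
        size_t i = get_global_id(0);
    #ifdef VENDOR_NVIDIA
        data[i] *= factor;   // NVIDIA-specific tuning would go here
    #else
        data[i] *= factor;   // generic fallback path
    #endif
    }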

How can I create a shared context between OpenGL and OpenCL with glfw3 on OSX?

扶醉桌前 submitted on 2019-12-12 10:15:44
Question: I am building a particle system using OpenGL and OpenCL. I need to share VBOs between OpenGL and OpenCL and therefore create an OpenCL context with the appropriate properties. I am aware that glfw3 exposes some native API functions; however, I can't figure out how to access the CGL ones. https://github.com/glfw/glfw/blob/master/include/GLFW/glfw3native.h I basically need to find out how to run this with glfw3:

    CGLContextObj kCGLContext = CGLGetCurrentContext();
    CGLShareGroupObj kCGLShareGroup =
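
A sketch of one usual approach, assuming the GLFW window's GL context has been made current: CGLGetCurrentContext then returns the underlying CGL context without needing glfw3native.h at all, and the share group can be handed to clCreateContext via Apple's sharegroup property (the function name below is illustrative).

    // Sketch: derive the CGL share group from the current GLFW context and
    // build an OpenCL context that shares it (macOS-specific APIs).
    #include <GLFW/glfw3.h>
    #include <OpenGL/OpenGL.h>
    #include <OpenCL/opencl.h>

    cl_context create_shared_context(GLFWwindow* window) {
        glfwMakeContextCurrent(window);                 // CGL "current context" is per-thread
        CGLContextObj cgl_context = CGLGetCurrentContext();
        CGLShareGroupObj share_group = CGLGetShareGroup(cgl_context);
        cl_context_properties props[] = {
            CL_CONTEXT_PROPERTY_USE_CGL_SHAREGROUP_APPLE,
            (cl_context_properties)share_group,
            0
        };
        cl_int err = CL_SUCCESS;
        // With the share-group property set, Apple's runtime picks the devices.
        return clCreateContext(props, 0, NULL, NULL, NULL, &err);
    }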

SIMD-8, SIMD-16 or SIMD-32 in OpenCL on GPGPU

喜欢而已 submitted on 2019-12-12 10:13:42
Question: I have read a couple of questions on SO on this topic (SIMD mode), but some clarification/confirmation of how things work is still required:

- Why use SIMD if we have GPGPU?
- SIMD intrinsics - are they usable on gpus?
- CPU SIMD vs GPU SIMD?

Are the following points correct if I compile the code in SIMD-8 mode?

1) It means 8 instructions of different work items are executed in parallel.
2) Does it mean all work items are executing the same instruction only?
3) If each work item's code contains
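
On point 2, a sketch using Intel's subgroup-size attribute may make the model concrete (this assumes the cl_intel_required_subgroup_size extension is available; it is Intel-specific):

    // Sketch: pinning the compiled SIMD width on Intel GPUs. Each subgroup of
    // 8 work items then executes the same instruction in lockstep, one lane
    // per work item.
    __attribute__((intel_reqd_sub_group_size(8)))
    __kernel void vec_add(__global const float* a,
                          __global const float* b,
                          __global float* out) {
        size_t i = get_global_id(0);
        out[i] = a[i] + b[i];   // one SIMD-8 instruction covers 8 work items
    }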

Optimal Local/Global worksizes in OpenCL

泪湿孤枕 submitted on 2019-12-12 09:55:26
Question: I am wondering how to choose optimal local and global work sizes for different devices in OpenCL. Is there any universal rule for AMD, NVIDIA and Intel GPUs? Should I analyze the physical build of the devices (number of multiprocessors, number of streaming processors per multiprocessor, etc.)? Does it depend on the algorithm/implementation? I ask because I saw that some libraries (like ViennaCL), to find correct values, just test many combinations of local/global work sizes and choose the best combination. Answer 1:
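
There is no universal constant, but the runtime exposes per-kernel hints that usually anchor such a search. A sketch of querying them (kernel and device handles are assumed to already exist; the heuristic at the end is illustrative, not a rule):

    // Sketch: query the per-kernel work-group limits that commonly guide the
    // choice of local size; autotuners then search around these values.
    #include <CL/cl.h>

    size_t pick_local_size(cl_kernel kernel, cl_device_id device) {
        size_t max_wg = 0, preferred_multiple = 0;
        clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                                 sizeof(max_wg), &max_wg, NULL);
        clGetKernelWorkGroupInfo(kernel, device,
                                 CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
                                 sizeof(preferred_multiple), &preferred_multiple, NULL);
        // Illustrative heuristic: the largest multiple of the preferred size
        // (warp/wavefront width) that still fits the kernel's limit.
        return (max_wg / preferred_multiple) * preferred_multiple;
    }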

Passing a Class to a Kernel in Intel OpenCL

荒凉一梦 submitted on 2019-12-12 09:29:11
Question: I have been working on a C/C++ OpenCL solution for the past few weeks. For my solution, I need to pass a class from my CPU (host) to the GPU (device). When I try to pass the class as an argument, it gives the error "Unknown Type-Identifier Class". My doubt is whether OpenCL on the Intel platform allows us to pass a class to a kernel, or whether any workaround is available for it. In CUDA I have seen some examples and it works perfectly fine on that platform. However, with respect to OpenCL I am not able
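
OpenCL C itself has no classes, so the common workaround is to mirror the class's data members in a plain struct defined identically in host and kernel code. A sketch (the Particle type is purely illustrative; watch for alignment/padding differences between the host and device compilers):

    // Sketch: pass a POD struct instead of a class. The same definition must
    // appear in the kernel source; names here are illustrative.
    #include <CL/cl.h>

    typedef struct {
        cl_float position[3];
        cl_float mass;
    } Particle;

    // The kernel source would contain the mirrored definition:
    //   typedef struct { float position[3]; float mass; } Particle;
    //   __kernel void step(__global Particle* p) { /* ... */ }

    void upload(cl_context context, cl_kernel kernel,
                const Particle* particles, size_t count) {
        cl_int err = CL_SUCCESS;
        cl_mem buf = clCreateBuffer(context,
                                    CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                    sizeof(Particle) * count,
                                    (void*)particles, &err);
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    }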

Calling OpenCL kernel from another OpenCL kernel

谁说胖子不能爱 submitted on 2019-12-12 08:31:58
Question: I have seen in one post here that we can call a function from an OpenCL kernel. But in my situation, I need that complex function to be parallelized (run by all available threads) as well, so do I have to make that function a kernel too and call it directly, like a function, from the main kernel? Or what is a possible solution for this situation? Thanks in advance. Answer 1: You can call helper functions from your kernel, and they will be parallelized in the same manner as the kernel; imagine them as
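
A sketch of what the answer describes: a plain (non-kernel) helper function called from kernel code runs in every work item, so it is parallelized automatically (the function names are illustrative):

    // Sketch: helper functions need no __kernel qualifier; each work item
    // executes the call, so the helper is as parallel as the kernel itself.
    float smooth3(float left, float centre, float right) {
        return 0.25f * left + 0.5f * centre + 0.25f * right;
    }

    __kernel void smooth_all(__global const float* in,
                             __global float* out, int n) {
        int i = get_global_id(0);
        if (i > 0 && i < n - 1)
            out[i] = smooth3(in[i - 1], in[i], in[i + 1]);
    }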

Creating a copy of the buffer pointed to by the host ptr on the GPU from a GPU kernel in OpenCL

时光怂恿深爱的人放手 submitted on 2019-12-12 07:03:18
Question: I was trying to understand how exactly CL_MEM_USE_HOST_PTR and CL_MEM_COPY_HOST_PTR work. Basically, when using CL_MEM_USE_HOST_PTR, say in creating a 2D image, nothing is copied to the device; instead the GPU refers to the mapped memory on the host (clEnqueueMapBuffer maps it), does the processing, and we can write the results to some other location. On the other hand, if I use CL_MEM_COPY_HOST_PTR, it will create a copy of the data pointed to by the host ptr on the device (I guess it will
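
A sketch contrasting the two flags (context, queue, and host data are assumed to already exist; note that whether CL_MEM_USE_HOST_PTR actually avoids a device-side copy is up to the implementation, which is allowed to cache the buffer in device memory):

    // Sketch: CL_MEM_COPY_HOST_PTR snapshots the host data at creation time;
    // CL_MEM_USE_HOST_PTR may use the host allocation as backing storage, and
    // mapping guarantees a pointer that reflects the current contents.
    #include <CL/cl.h>

    void demo(cl_context context, cl_command_queue queue,
              float* host_data, size_t bytes) {
        cl_int err = CL_SUCCESS;

        cl_mem copied = clCreateBuffer(context,
                                       CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                       bytes, host_data, &err);
        // Later writes to host_data are NOT visible through `copied`.

        cl_mem shared = clCreateBuffer(context,
                                       CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                                       bytes, host_data, &err);
        void* mapped = clEnqueueMapBuffer(queue, shared, CL_TRUE, CL_MAP_READ,
                                          0, bytes, 0, NULL, NULL, &err);
        // ... read results through `mapped` ...
        clEnqueueUnmapMemObject(queue, shared, mapped, 0, NULL, NULL);

        clReleaseMemObject(copied);
        clReleaseMemObject(shared);
    }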