OpenCL

Using multiple GPUs OpenCL

自古美人都是妖i submitted 2019-12-21 03:03:48

Question: I have a loop within which I am launching multiple kernels onto a GPU. Below is the snippet:

    for (int idx = start; idx <= end; idx++) {
        ret = clEnqueueNDRangeKernel(command_queue, memset_kernel, 1, NULL,
                                     &global_item_size_memset, &local_item_size,
                                     0, NULL, NULL);
        ASSERT_CL(ret, "Error after launching 1st memset_kernel !");
        ret = clEnqueueNDRangeKernel(command_queue, cholesky_kernel, 1, NULL,
                                     &global_item_size_cholesky, &local_item_size,
                                     0, NULL, NULL);
        ASSERT_CL(ret, "Error after launching
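Since the question title asks about multiple GPUs, one common approach is to give each device its own command queue and distribute loop iterations across them. The sketch below assumes the queues, kernels, and work sizes have already been created elsewhere; `launch_on_all_gpus` and its parameters are illustrative names, not from the question.

```c
/* Hedged sketch: spreading the loop's kernel launches across several GPUs
 * by round-robining over one command queue per device. In a real program
 * the buffers and kernel arguments must also be set up per device. */
#include <CL/cl.h>

void launch_on_all_gpus(cl_command_queue *queues, cl_uint num_devices,
                        cl_kernel memset_kernel, cl_kernel cholesky_kernel,
                        size_t global_memset, size_t global_cholesky,
                        size_t local, int start, int end)
{
    for (int idx = start; idx <= end; idx++) {
        /* Iteration idx runs on device idx % num_devices. */
        cl_command_queue q = queues[idx % num_devices];
        clEnqueueNDRangeKernel(q, memset_kernel, 1, NULL,
                               &global_memset, &local, 0, NULL, NULL);
        clEnqueueNDRangeKernel(q, cholesky_kernel, 1, NULL,
                               &global_cholesky, &local, 0, NULL, NULL);
    }
    for (cl_uint d = 0; d < num_devices; d++)
        clFinish(queues[d]);   /* wait for every device to drain its queue */
}
```

Because enqueues are asynchronous, the host loop itself is cheap; the devices execute their queues concurrently.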

How to match OpenCL devices with a specific GPU given PCI vendor, device and bus IDs in a multi-GPU system?

一笑奈何 submitted 2019-12-20 10:43:21

Question: I would like to be able to match OpenCL devices with GPUs in the system on multi-GPU systems identified by PCI IDs. For example, if I have a system with multiple GPUs, possibly from different vendors, I can list the devices by enumerating the PCI bus. This gives me the PCI vendor, device and bus IDs. If I choose one of these (GPU) PCI devices to use for OpenCL computation based on some selection criteria, how do I match it to the OpenCL device? I can enumerate GPU devices in OpenCL using
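In practice this is usually done through vendor extensions: `cl_nv_device_attribute_query` exposes PCI bus/slot IDs on NVIDIA, and `cl_amd_device_topology` exposes them on AMD. The sketch below hard-codes the extension constants (they normally come from `cl_ext.h`) and simply tries each query; treat it as an illustration under those assumptions, not portable code.

```c
/* Sketch: query per-device PCI location via vendor extensions, then match
 * against the bus ID obtained from PCI enumeration. */
#include <CL/cl.h>
#include <stdio.h>

#define CL_DEVICE_PCI_BUS_ID_NV  0x4008   /* cl_nv_device_attribute_query */
#define CL_DEVICE_TOPOLOGY_AMD   0x4037   /* cl_amd_device_topology */

typedef union {
    struct { cl_uint type; cl_uint data[5]; } raw;
    struct { cl_uint type; cl_char unused[17];
             cl_char bus, device, function; } pcie;
} cl_device_topology_amd;

void print_pci_bus(cl_device_id dev)
{
    cl_uint bus_nv;
    if (clGetDeviceInfo(dev, CL_DEVICE_PCI_BUS_ID_NV,
                        sizeof(bus_nv), &bus_nv, NULL) == CL_SUCCESS) {
        printf("NVIDIA device on PCI bus %u\n", bus_nv);
        return;
    }
    cl_device_topology_amd topo;
    if (clGetDeviceInfo(dev, CL_DEVICE_TOPOLOGY_AMD,
                        sizeof(topo), &topo, NULL) == CL_SUCCESS)
        printf("AMD device on PCI bus %d\n", (int)topo.pcie.bus);
}
```

A device that answers neither query (e.g. an Intel iGPU) has to be matched by other means, such as `CL_DEVICE_VENDOR_ID`.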

OpenGL vs. OpenCL, which to choose and why?

六月ゝ 毕业季﹏ submitted 2019-12-20 07:58:49

Question: What features make OpenCL unique to choose over OpenGL with GLSL for calculations? Despite the graphics-related terminology and impractical datatypes, is there any real caveat to OpenGL? For example, parallel function evaluation can be done by rendering to a texture using other textures as input. Reduction operations can be done by iteratively rendering to smaller and smaller textures. On the other hand, random write access is not possible in any efficient manner (the only way to do it is rendering
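The random-write point is one of the clearest differences: an OpenCL kernel can scatter to arbitrary output locations, which a classic GLSL fragment shader cannot (each invocation writes only its own fragment). A minimal illustration, with made-up buffer names:

```c
/* Illustrative OpenCL C kernel: a scatter write. Each work-item writes to
 * an arbitrary, data-dependent index -- trivial here, awkward in the
 * render-to-texture model described above. */
__kernel void scatter(__global const float *values,
                      __global const int   *target_index,
                      __global float       *out)
{
    int i = get_global_id(0);
    out[target_index[i]] = values[i];   /* arbitrary write location */
}
```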

Calculate run time of kernel code in OpenCL C

梦想与她 submitted 2019-12-20 06:17:15

Question: I want to measure the performance (read: runtime) of my kernel code on various devices, viz. CPU and GPUs. The kernel code that I wrote is:

    __kernel void dataParallel(__global int* A)
    {
        sleep(10);
        A[0] = 2;
        A[1] = 3;
        A[2] = 5;
        int pnp;    // pnp = probable next prime
        int pprime; // previous prime
        int i, j;
        for (i = 3; i < 500; i++) {
            j = 0;
            pprime = A[i-1];
            pnp = pprime + 2;
            while ((j < i) && A[j] <= sqrt((float)pnp)) {
                if (pnp % A[j] == 0) {
                    pnp += 2;
                    j = 0;
                }
                j++;
            }
            A[i] = pnp;
        }
    }

However I have been told that it is not possible to use
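The standard way to time a kernel in OpenCL is event profiling rather than host-side clocks (and note that `sleep()` is not part of OpenCL C, which is likely what the asker "has been told"). A hedged host-side sketch; the helper name and parameters are illustrative:

```c
/* Sketch: time one kernel launch with OpenCL event profiling. The queue
 * must have been created with the CL_QUEUE_PROFILING_ENABLE property. */
#include <CL/cl.h>

double time_kernel_ms(cl_command_queue queue, cl_kernel kernel,
                      size_t global, size_t local)
{
    cl_event evt;
    cl_ulong t_start, t_end;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global,
                           local ? &local : NULL, 0, NULL, &evt);
    clWaitForEvents(1, &evt);     /* block until the kernel has finished */
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START,
                            sizeof(t_start), &t_start, NULL);
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,
                            sizeof(t_end), &t_end, NULL);
    clReleaseEvent(evt);
    return (t_end - t_start) * 1e-6;   /* timestamps are in nanoseconds */
}
```

Because the timestamps come from the device, this measures kernel execution alone, excluding enqueue latency and host/device transfers.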

Enable/disable Optimus/Enduro in cross platform manner

北城余情 submitted 2019-12-20 05:15:45

Question: In order to save power, it is common in recent graphics architectures to dynamically switch between a discrete high-performance and an integrated lower-performance GPU, where the high-performance GPU is only enabled when the need for extra performance is present. This technology is branded as NVIDIA Optimus and AMD Enduro for the two main GPU vendors. However, due to the non-standardized way in which these technologies work, managing them from a developer's perspective can be a nightmare. For
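There is no cross-platform API for this, but on Windows both vendors' drivers look for a well-known exported global in the executable to decide whether to route it to the discrete GPU. A minimal sketch (Windows-only; this is the conventional mechanism, not a portable solution):

```c
/* Windows-only: exporting these globals from the .exe requests the
 * discrete GPU from NVIDIA Optimus and AMD Enduro/PowerXpress drivers. */
#ifdef _WIN32
__declspec(dllexport) unsigned long NvOptimusEnablement = 0x00000001;
__declspec(dllexport) int AmdPowerXpressRequestHighPerformance = 1;
#endif
```

In C++ the declarations additionally need `extern "C"` so the symbol names are not mangled. On Linux and macOS no equivalent export exists, which is exactly the cross-platform pain the question describes.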

Can I allocate device memory using OpenCL and use pointers to the memory in CUDA?

雨燕双飞 submitted 2019-12-20 04:19:27

Question: Say I use OpenCL to manage memory (so that memory management between GPU/CPU uses the same code), but my calculation uses optimized CUDA and CPU code (not OpenCL). Can I still use the OpenCL device memory pointers and pass them to CUDA functions/kernels?

Answer 1: AFAIK this is not possible, but there is no technical reason why you shouldn't be able to. NVIDIA could build an extension to the OpenCL API to interoperate with CUDA, much like the interoperability provisions for Direct3D and OpenGL.

OpenCL kernel error on Mac OS X

穿精又带淫゛_ submitted 2019-12-20 03:28:18

Question: I wrote some OpenCL code which works fine on Linux, but it is failing with errors on Mac OS X. Can someone please help me identify why these occur? The kernel code is shown after the error. My kernel uses double, so I have the corresponding pragma at the top, but I don't know why the error shows the float data type:

    inline float8 __OVERLOAD__ _name(float8 x) { return _default_name(x); } \
                                 ^
    /System/Library/Frameworks/OpenCL.framework/Versions/A/lib/clang/3.2/include/cl_kernel.h:4606:30:
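For reference, double-precision support has to be both reported by the device (`cl_khr_fp64` in `CL_DEVICE_EXTENSIONS`) and enabled in the kernel source; if either is missing, the compiler only sees the float built-in overloads, which is consistent with an error message mentioning `float8`. A minimal double kernel under that assumption:

```c
/* Minimal double-precision OpenCL C kernel. Without the pragma (or on a
 * device lacking cl_khr_fp64) compilation fails against float overloads. */
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void scale(__global double *a, double factor)
{
    int i = get_global_id(0);
    a[i] *= factor;
}
```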

OpenCL select/delete points from large array

淺唱寂寞╮ submitted 2019-12-20 02:47:25

Question: I have an array of 2M+ points (planned to be increased to 20M in due course) that I am running calculations on via OpenCL. I'd like to delete any points that fall within a random triangle geometry. How can I do this within an OpenCL kernel process? I can already:

- identify those points that fall outside the triangle (a simple point-in-polygon algorithm in the kernel)
- pass their coordinates to a global output array.

But: an OpenCL global output array cannot be variable and so I initialise it to
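A common way around the fixed-size output problem is stream compaction with a global atomic counter: each surviving point reserves a unique slot, so the output is dense and the counter's final value is the surviving count. A sketch; `point_in_triangle` is a hypothetical helper standing in for the asker's point-in-polygon test.

```c
/* Sketch: compact surviving points with atomic_inc. The `count` buffer
 * must be zeroed before launch; read it back afterwards to learn how many
 * points were kept. `out` is sized for the worst case (all points kept). */
__kernel void keep_outside(__global const float2 *points,
                           __global float2 *out,
                           __global int *count,
                           float2 a, float2 b, float2 c)  /* triangle */
{
    int i = get_global_id(0);
    if (!point_in_triangle(points[i], a, b, c)) {  /* hypothetical helper */
        int slot = atomic_inc(count);   /* unique output index */
        out[slot] = points[i];
    }
}
```

The output order is nondeterministic; if order matters, a prefix-sum (scan) based compaction is the usual alternative.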

OpenCL - Are work-group axes exchangeable?

白昼怎懂夜的黑 submitted 2019-12-20 02:46:06

Question: I was trying to find the best work-group size for a problem, and I noticed something that I couldn't justify to myself. These are my results:

    GlobalWorkSize {6400 6400 1}, WorkGroupSize {64 4 1}, Time (ms) = 44.18
    GlobalWorkSize {6400 6400 1}, WorkGroupSize {4 64 1}, Time (ms) = 24.39

Swapping the axes made execution twice as fast. Why!? By the way, I was using an AMD GPU. Thanks :-)

EDIT: This is the kernel (a simple matrix transposition):

    __kernel void transpose(_
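The likely explanation is memory coalescing: work-items consecutive in dimension 0 should touch consecutive addresses, so the work-group shape interacts with which array dimension `get_global_id(0)` indexes. The standard fix for a transpose is to stage a tile in local memory so both the global read and the global write are coalesced. A sketch (not the asker's kernel; `TILE` is an assumed work-group size):

```c
/* Tiled transpose of a width x height row-major matrix. The local tile
 * lets both the read and the write walk contiguous global memory. */
#define TILE 16

__kernel void transpose_tiled(__global const float *in,
                              __global float *out,
                              int width, int height)
{
    __local float tile[TILE][TILE + 1];   /* +1 column avoids bank conflicts */
    int gx = get_global_id(0), gy = get_global_id(1);
    int lx = get_local_id(0),  ly = get_local_id(1);

    tile[ly][lx] = in[gy * width + gx];          /* coalesced read  */
    barrier(CLK_LOCAL_MEM_FENCE);

    int ox = get_group_id(1) * TILE + lx;        /* swapped tile origin */
    int oy = get_group_id(0) * TILE + ly;
    out[oy * height + ox] = tile[lx][ly];        /* coalesced write */
}
```

With a naive transpose, one of the two access patterns is strided no matter what; the {4 64 1} shape presumably happened to make the strided accesses cheaper on this hardware.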