OpenCL

Using multiple GPUs OpenCL

自古美人都是妖i submitted 2019-12-21 03:03:48

Question: I have a loop within which I am launching multiple kernels onto a GPU. Below is the snippet:

    for (int idx = start; idx <= end; idx++) {
        ret = clEnqueueNDRangeKernel(command_queue, memset_kernel, 1, NULL,
                                     &global_item_size_memset, &local_item_size,
                                     0, NULL, NULL);
        ASSERT_CL(ret, "Error after launching 1st memset_kernel !");
        ret = clEnqueueNDRangeKernel(command_queue, cholesky_kernel, 1, NULL,
                                     &global_item_size_cholesky, &local_item_size,
                                     0, NULL, NULL);
        ASSERT_CL(ret, "Error after launching
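Since the question title asks about multiple GPUs, one common approach is to give each device its own command queue and distribute loop iterations across them. The sketch below assumes the queues, kernels, and work sizes have already been created elsewhere; `launch_on_all_gpus` and its parameters are illustrative names, not from the question.

```c
/* Hedged sketch: spreading the loop's kernel launches across several GPUs
 * by round-robining over one command queue per device. In a real program
 * the buffers and kernel arguments must also be set up per device. */
#include <CL/cl.h>

void launch_on_all_gpus(cl_command_queue *queues, cl_uint num_devices,
                        cl_kernel memset_kernel, cl_kernel cholesky_kernel,
                        size_t global_memset, size_t global_cholesky,
                        size_t local, int start, int end)
{
    for (int idx = start; idx <= end; idx++) {
        /* Iteration idx runs on device idx % num_devices. */
        cl_command_queue q = queues[idx % num_devices];
        clEnqueueNDRangeKernel(q, memset_kernel, 1, NULL,
                               &global_memset, &local, 0, NULL, NULL);
        clEnqueueNDRangeKernel(q, cholesky_kernel, 1, NULL,
                               &global_cholesky, &local, 0, NULL, NULL);
    }
    for (cl_uint d = 0; d < num_devices; d++)
        clFinish(queues[d]);   /* wait for every device to drain its queue */
}
```

Because enqueues are asynchronous, the host loop itself is cheap; the devices execute their queues concurrently.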

How to match OpenCL devices with a specific GPU given PCI vendor, device and bus IDs in a multi-GPU system?

一笑奈何 submitted 2019-12-20 10:43:21

Question: I would like to be able to match OpenCL devices with GPUs in the system on multi-GPU systems identified by PCI IDs. For example, if I have a system with multiple GPUs, possibly from different vendors, I can list the devices by enumerating the PCI bus. This gives me the PCI vendor, device and bus IDs. If I choose one of these (GPU) PCI devices to use for OpenCL computation based on some selection criteria, how do I match it to the OpenCL device? I can enumerate GPU devices in OpenCL using
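In practice this is usually done through vendor extensions: `cl_nv_device_attribute_query` exposes PCI bus/slot IDs on NVIDIA, and `cl_amd_device_topology` exposes them on AMD. The sketch below hard-codes the extension constants (they normally come from `cl_ext.h`) and simply tries each query; treat it as an illustration under those assumptions, not portable code.

```c
/* Sketch: query per-device PCI location via vendor extensions, then match
 * against the bus ID obtained from PCI enumeration. */
#include <CL/cl.h>
#include <stdio.h>

#define CL_DEVICE_PCI_BUS_ID_NV  0x4008   /* cl_nv_device_attribute_query */
#define CL_DEVICE_TOPOLOGY_AMD   0x4037   /* cl_amd_device_topology */

typedef union {
    struct { cl_uint type; cl_uint data[5]; } raw;
    struct { cl_uint type; cl_char unused[17];
             cl_char bus, device, function; } pcie;
} cl_device_topology_amd;

void print_pci_bus(cl_device_id dev)
{
    cl_uint bus_nv;
    if (clGetDeviceInfo(dev, CL_DEVICE_PCI_BUS_ID_NV,
                        sizeof(bus_nv), &bus_nv, NULL) == CL_SUCCESS) {
        printf("NVIDIA device on PCI bus %u\n", bus_nv);
        return;
    }
    cl_device_topology_amd topo;
    if (clGetDeviceInfo(dev, CL_DEVICE_TOPOLOGY_AMD,
                        sizeof(topo), &topo, NULL) == CL_SUCCESS)
        printf("AMD device on PCI bus %d\n", (int)topo.pcie.bus);
}
```

A device that answers neither query (e.g. an Intel iGPU) has to be matched by other means, such as `CL_DEVICE_VENDOR_ID`.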

OpenGL vs. OpenCL, which to choose and why?

六月ゝ 毕业季﹏ submitted 2019-12-20 07:58:49

Question: What features make OpenCL unique to choose over OpenGL with GLSL for calculations? Despite the graphics-related terminology and impractical datatypes, is there any real caveat to OpenGL? For example, parallel function evaluation can be done by rendering to a texture using other textures as input. Reduction operations can be done by iteratively rendering to smaller and smaller textures. On the other hand, random write access is not possible in any efficient manner (the only way to do it is rendering
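The random-write point is one of the clearest differences: an OpenCL kernel can scatter to arbitrary output locations, which a classic GLSL fragment shader cannot (each invocation writes only its own fragment). A minimal illustration, with made-up buffer names:

```c
/* Illustrative OpenCL C kernel: a scatter write. Each work-item writes to
 * an arbitrary, data-dependent index -- trivial here, awkward in the
 * render-to-texture model described above. */
__kernel void scatter(__global const float *values,
                      __global const int   *target_index,
                      __global float       *out)
{
    int i = get_global_id(0);
    out[target_index[i]] = values[i];   /* arbitrary write location */
}
```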

Calculate run time of kernel code in OpenCL C

梦想与她 submitted 2019-12-20 06:17:15

Question: I want to measure the performance (read: runtime) of my kernel code on various devices, viz. CPU and GPUs. The kernel code that I wrote is:

    __kernel void dataParallel(__global int* A)
    {
        sleep(10);
        A[0] = 2;
        A[1] = 3;
        A[2] = 5;
        int pnp;    // pnp = probable next prime
        int pprime; // previous prime
        int i, j;
        for (i = 3; i < 500; i++) {
            j = 0;
            pprime = A[i-1];
            pnp = pprime + 2;
            while ((j < i) && A[j] <= sqrt((float)pnp)) {
                if (pnp % A[j] == 0) {
                    pnp += 2;
                    j = 0;
                }
                j++;
            }
            A[i] = pnp;
        }
    }

However I have been told that it is not possible to use
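The standard way to time a kernel in OpenCL is event profiling rather than host-side clocks (and note that `sleep()` is not part of OpenCL C, which is likely what the asker "has been told"). A hedged host-side sketch; the helper name and parameters are illustrative:

```c
/* Sketch: time one kernel launch with OpenCL event profiling. The queue
 * must have been created with the CL_QUEUE_PROFILING_ENABLE property. */
#include <CL/cl.h>

double time_kernel_ms(cl_command_queue queue, cl_kernel kernel,
                      size_t global, size_t local)
{
    cl_event evt;
    cl_ulong t_start, t_end;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global,
                           local ? &local : NULL, 0, NULL, &evt);
    clWaitForEvents(1, &evt);     /* block until the kernel has finished */
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START,
                            sizeof(t_start), &t_start, NULL);
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,
                            sizeof(t_end), &t_end, NULL);
    clReleaseEvent(evt);
    return (t_end - t_start) * 1e-6;   /* timestamps are in nanoseconds */
}
```

Because the timestamps come from the device, this measures kernel execution alone, excluding enqueue latency and host/device transfers.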

Enable/disable Optimus/Enduro in cross platform manner

北城余情 submitted 2019-12-20 05:15:45

Question: In order to save power, it is common in recent graphics architectures to dynamically switch between a discrete high-performance and an integrated lower-performance GPU, where the high-performance GPU is only enabled when the need for extra performance is present. This technology is branded as NVIDIA Optimus and AMD Enduro for the two main GPU vendors. However, due to the non-standardized way in which these technologies work, managing them from a developer's perspective can be a nightmare. For
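There is no cross-platform API for this, but on Windows both vendors' drivers look for a well-known exported global in the executable to decide whether to route it to the discrete GPU. A minimal sketch (Windows-only; this is the conventional mechanism, not a portable solution):

```c
/* Windows-only: exporting these globals from the .exe requests the
 * discrete GPU from NVIDIA Optimus and AMD Enduro/PowerXpress drivers. */
#ifdef _WIN32
__declspec(dllexport) unsigned long NvOptimusEnablement = 0x00000001;
__declspec(dllexport) int AmdPowerXpressRequestHighPerformance = 1;
#endif
```

In C++ the declarations additionally need `extern "C"` so the symbol names are not mangled. On Linux and macOS no equivalent export exists, which is exactly the cross-platform pain the question describes.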

Can I allocate device memory using OpenCL and use pointers to the memory in CUDA?

雨燕双飞 submitted 2019-12-20 04:19:27

Question: Say I use OpenCL to manage memory (so that memory management between GPU/CPU uses the same code), but my calculation uses optimized CUDA and CPU code (not OpenCL). Can I still use the OpenCL device memory pointers and pass them to CUDA functions/kernels?

Answer 1: AFAIK this is not possible, but there is no technical reason why you shouldn't be able to. NVIDIA could build an extension to the OpenCL API to interoperate with CUDA, much like the interoperability provisions for Direct3D and OpenGL.

OpenCL kernel error on Mac OS X

穿精又带淫゛_ submitted 2019-12-20 03:28:18

Question: I wrote some OpenCL code which works fine on Linux, but it is failing with errors on Mac OS X. Can someone please help me identify why these occur? The kernel code is shown after the error. My kernel uses double, so I have the corresponding pragma at the top, but I don't know why the error shows the float data type:

    inline float8 __OVERLOAD__ _name(float8 x) { return _default_name(x); } \
                                 ^
    /System/Library/Frameworks/OpenCL.framework/Versions/A/lib/clang/3.2/include/cl_kernel.h:4606:30:
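For reference, double-precision support has to be both reported by the device (`cl_khr_fp64` in `CL_DEVICE_EXTENSIONS`) and enabled in the kernel source; if either is missing, the compiler only sees the float built-in overloads, which is consistent with an error message mentioning `float8`. A minimal double kernel under that assumption:

```c
/* Minimal double-precision OpenCL C kernel. Without the pragma (or on a
 * device lacking cl_khr_fp64) compilation fails against float overloads. */
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void scale(__global double *a, double factor)
{
    int i = get_global_id(0);
    a[i] *= factor;
}
```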

OpenCL select/delete points from large array

淺唱寂寞╮ submitted 2019-12-20 02:47:25

Question: I have an array of 2M+ points (planned to be increased to 20M in due course) that I am running calculations on via OpenCL. I'd like to delete any points that fall within a random triangle geometry. How can I do this within an OpenCL kernel process? I can already:

- identify those points that fall outside the triangle (a simple point-in-polygon algorithm in the kernel)
- pass their coordinates to a global output array.

But: an OpenCL global output array cannot be variable and so I initialise it to
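A common way around the fixed-size output problem is stream compaction with a global atomic counter: each surviving point reserves a unique slot, so the output is dense and the counter's final value is the surviving count. A sketch; `point_in_triangle` is a hypothetical helper standing in for the asker's point-in-polygon test.

```c
/* Sketch: compact surviving points with atomic_inc. The `count` buffer
 * must be zeroed before launch; read it back afterwards to learn how many
 * points were kept. `out` is sized for the worst case (all points kept). */
__kernel void keep_outside(__global const float2 *points,
                           __global float2 *out,
                           __global int *count,
                           float2 a, float2 b, float2 c)  /* triangle */
{
    int i = get_global_id(0);
    if (!point_in_triangle(points[i], a, b, c)) {  /* hypothetical helper */
        int slot = atomic_inc(count);   /* unique output index */
        out[slot] = points[i];
    }
}
```

The output order is nondeterministic; if order matters, a prefix-sum (scan) based compaction is the usual alternative.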

OpenCL - Are work-group axes exchangeable?

白昼怎懂夜的黑 submitted 2019-12-20 02:46:06

Question: I was trying to find the best work-group size for a problem, and I noticed something that I couldn't justify to myself. These are my results:

    GlobalWorkSize {6400 6400 1}, WorkGroupSize {64 4 1}, Time (ms) = 44.18
    GlobalWorkSize {6400 6400 1}, WorkGroupSize {4 64 1}, Time (ms) = 24.39

Swapping the axes made execution twice as fast. Why!? By the way, I was using an AMD GPU. Thanks :-)

EDIT: This is the kernel (a simple matrix transposition):

    __kernel void transpose(_
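The likely explanation is memory coalescing: work-items consecutive in dimension 0 should touch consecutive addresses, so the work-group shape interacts with which array dimension `get_global_id(0)` indexes. The standard fix for a transpose is to stage a tile in local memory so both the global read and the global write are coalesced. A sketch (not the asker's kernel; `TILE` is an assumed work-group size):

```c
/* Tiled transpose of a width x height row-major matrix. The local tile
 * lets both the read and the write walk contiguous global memory. */
#define TILE 16

__kernel void transpose_tiled(__global const float *in,
                              __global float *out,
                              int width, int height)
{
    __local float tile[TILE][TILE + 1];   /* +1 column avoids bank conflicts */
    int gx = get_global_id(0), gy = get_global_id(1);
    int lx = get_local_id(0),  ly = get_local_id(1);

    tile[ly][lx] = in[gy * width + gx];          /* coalesced read  */
    barrier(CLK_LOCAL_MEM_FENCE);

    int ox = get_group_id(1) * TILE + lx;        /* swapped tile origin */
    int oy = get_group_id(0) * TILE + ly;
    out[oy * height + ox] = tile[lx][ly];        /* coalesced write */
}
```

With a naive transpose, one of the two access patterns is strided no matter what; the {4 64 1} shape presumably happened to make the strided accesses cheaper on this hardware.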