How to use 2 OpenCL runtimes

Submitted by 巧了我就是萌 on 2019-12-03 15:20:51

The Smith and Thomas answers are correct; this just expands on that information. When you enumerate the OpenCL platforms, you'll get one for each installed driver. Within each platform you enumerate the devices. The AMD and Intel drivers also expose CPU devices. So on a fully populated machine, you might see an AMD platform (with CPU and GPU devices), an NVIDIA platform (with a GPU device), and an Intel platform (with CPU and GPU devices). Your code creates a context on whichever devices you want to use, and one or more command queues to feed them work. You can keep them all busy at once, but you can only share data buffers between devices in the same context, and a context cannot span platforms. To share data across platforms, it must pass through CPU (host) memory in between.

You're not thinking of it right. SDKs are not provided by the application, and are not needed for running a compiled program. OpenCL runtimes are provided by the client system, and that's what gives your program platforms and devices to use through clGetPlatformIDs and clGetDeviceIDs.

If the user does not have an Nvidia graphics card, you are simply not going to be able to use an Nvidia platform and device on his system, because he doesn't have the Nvidia OpenCL runtime or hardware.

All that the different OpenCL SDKs provide you are vendor-specific extensions, which are then understood by that vendor's runtime.

The Khronos OpenCL working group defined an ICD (installable client driver) layer that allows multiple vendor drivers to be installed on the same system. The application accesses the vendor drivers through the ICD layer. For more details see cl_khr_icd.txt.

With regard to running on multiple OpenCL devices at the same time: create a separate context for each device/vendor and run each one in its own thread. For example, I have a GTX 590, which shows up as two GTX 590 devices. I also have an Intel i7 processor. I create three contexts, two for the 590 devices and one for the CPU, and run each context/device in its own thread (three threads in total) using SDL_CreateThread (pthreads works just as well). You have to weight the number of jobs given to each device in proportion to its "speed" if you want good results, for example 45% for each GTX 590 and 10% for the CPU. The best weights to use depend on the application.
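
The weighting step can be sketched roughly as follows. This is only a minimal illustration in plain C, not code from the answer above: the helper name split_jobs and the choice of integer percentage weights are assumptions made for the example, and nothing in it calls OpenCL. It splits a batch of jobs in proportion to the weights and hands any jobs lost to integer rounding to the highest-weighted device.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of OpenCL): split total_jobs across n
 * devices in proportion to weights[], writing the per-device job counts
 * into counts[]. Weights are relative integers, e.g. {45, 45, 10}.
 * Note: total_jobs * weights[i] must fit in size_t.
 */
void split_jobs(size_t total_jobs, const unsigned *weights, size_t n,
                size_t *counts)
{
    assert(n > 0);

    size_t weight_sum = 0;
    size_t heaviest = 0; /* index of the first device with the largest weight */
    for (size_t i = 0; i < n; ++i) {
        weight_sum += weights[i];
        if (weights[i] > weights[heaviest]) {
            heaviest = i;
        }
    }
    assert(weight_sum > 0);

    /* Integer division rounds each share down, so a few jobs may be left. */
    size_t assigned = 0;
    for (size_t i = 0; i < n; ++i) {
        counts[i] = total_jobs * weights[i] / weight_sum;
        assigned += counts[i];
    }

    /* Give the leftover jobs to the highest-weighted device. */
    counts[heaviest] += total_jobs - assigned;
}
```

With weights {45, 45, 10} and 1000 jobs, each GPU thread would get 450 jobs and the CPU thread 100. Each thread would then feed its share to its own context and command queue. In practice you would tune the weights by timing a small batch on each device first.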
