Question
I have already found this question: OpenCL: Running CPU/GPU multiple devices.
But I still have three questions about how to run a program on multiple devices. Is the recipe as follows? (Q1)
1. Create the devices you want to use.
2. For every device, create a context.
3. For every context, call clBuildProgram to build a program.
4. For every program, call clCreateCommandQueue to create one command queue per context.
5. For every context and for every kernel parameter, call clCreateBuffer.
Or must I concatenate the command queues? (Q2)
Does anyone have some example code or a link to a tutorial? (Q3)
Answer 1:
You create a single context containing all the devices; clCreateContext takes a list of devices. You create the program once for that context, then call clBuildProgram (or clCompileProgram followed by clLinkProgram) once, either listing all the devices or passing no device list so that it builds for every device in the context. Create one command queue for each device in the context. Create a buffer for each array you want the kernels to access. If you want to process different parts of an array on different devices, you can either create a separate buffer for each part or use sub-buffers (clCreateSubBuffer) to divide one buffer into sections.
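One detail worth knowing when using sub-buffers: clCreateSubBuffer can fail with CL_MISALIGNED_SUB_BUFFER_OFFSET when the region's origin is not aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN, which clGetDeviceInfo reports in bits. Below is a minimal sketch in plain C of how the per-device byte ranges might be computed. `region_t` and `split_regions` are hypothetical names; `region_t` only mirrors the origin/size fields of `cl_buffer_region` so the sketch compiles without the OpenCL headers, and `align_bytes` is assumed to be the largest CL_DEVICE_MEM_BASE_ADDR_ALIGN / 8 over the devices, so that every region is safe on every device.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the origin/size fields of OpenCL's cl_buffer_region
   (hypothetical stand-in so this compiles without CL headers). */
typedef struct {
    size_t origin; /* byte offset into the parent buffer */
    size_t size;   /* length in bytes */
} region_t;

static size_t gcd_size(size_t a, size_t b) {
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Splits n_elems elements of elem_size bytes into at most n_dev contiguous
   regions. Every origin is a multiple of both elem_size and align_bytes
   (assumed: max CL_DEVICE_MEM_BASE_ADDR_ALIGN / 8 over the devices).
   The last region absorbs the remainder. Empty regions are skipped because
   a zero-size sub-buffer is not allowed. Returns the number written. */
static size_t split_regions(size_t n_elems, size_t elem_size,
                            size_t align_bytes, size_t n_dev,
                            region_t *out) {
    assert(elem_size > 0 && align_bytes > 0 && n_dev > 0);
    /* Smallest step that keeps boundaries both element- and address-aligned. */
    size_t unit = elem_size / gcd_size(elem_size, align_bytes) * align_bytes;
    size_t total = n_elems * elem_size;
    size_t per_dev = (total / unit) / n_dev * unit; /* aligned bytes per device */
    size_t count = 0;
    for (size_t i = 0; i < n_dev; ++i) {
        size_t origin = i * per_dev;
        size_t size = (i + 1 == n_dev) ? total - origin : per_dev;
        if (size == 0)
            continue;
        out[count].origin = origin;
        out[count].size = size;
        ++count;
    }
    return count;
}
```

For example, 1000 floats (4 bytes each) split across two devices with a 128-byte alignment give regions of 1920 and 2080 bytes. Each region would then be passed as a `cl_buffer_region` to `clCreateSubBuffer(parent, flags, CL_BUFFER_CREATE_TYPE_REGION, &region, &err)`, and each sub-buffer processed through the command queue of its own device.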
If you do not want the same program to target all devices and want to optimise further, you can create a separate program for each device, or create the program once and call clCompileProgram separately for each device, passing in different macro definitions.
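Macros reach the compiler through the options string of clBuildProgram or clCompileProgram, which accepts `-D name=definition`. A minimal sketch of building that string per device follows. The `device_traits_t` struct, its fields, and the macro names `VEC_WIDTH` and `USE_LOCAL_MEM` are hypothetical; the fields are assumed to have been filled earlier from clGetDeviceInfo (for example CL_DEVICE_TYPE and CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-device traits, assumed to come from clGetDeviceInfo. */
typedef struct {
    int is_gpu;            /* e.g. derived from CL_DEVICE_TYPE */
    unsigned vector_width; /* e.g. CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT */
} device_traits_t;

/* Writes "-D" build options for one device into buf. VEC_WIDTH and
   USE_LOCAL_MEM are hypothetical macros the kernel source would test
   with #if. Returns the string length, or -1 if buf is too small. */
static int make_build_options(const device_traits_t *d, char *buf, size_t len) {
    assert(d != NULL && buf != NULL);
    int n = snprintf(buf, len, "-D VEC_WIDTH=%u -D USE_LOCAL_MEM=%d",
                     d->vector_width, d->is_gpu ? 1 : 0);
    return (n >= 0 && (size_t)n < len) ? n : -1;
}
```

The resulting string would be passed as the `options` argument once per device, for example `clBuildProgram(program, 1, &device, options, NULL, NULL)`, so each device gets its own build of the same source.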
Answer 2:
If all of the devices you are targeting come from the same platform, then @Lee's response is fine (e.g. AMD GPUs + CPU, or Intel GPUs + CPU). If you expect to target a mix of platforms (e.g. combining Nvidia GPUs with AMD GPUs and a CPU), keep in mind that a context cannot span more than one platform; at the very least, you will need one context per platform.
As I see it, the options are:
1. One device per context. Synchronization between devices requires copying to host memory.
2. Multiple devices in one context, only using one platform. This can make it easier to share data between devices in the same context.
3. Multiple devices from the same platform in one context, one context per platform. This allows you to use multiple platforms concurrently while giving you the benefits of having multiple devices in one context.
Option 3 makes work distribution a bit tricky, because work is divided at two levels: between contexts/platforms and between the devices within each context. Option 1 is, IMHO, the easiest way to get access to every OpenCL device in a computer, irrespective of platform. Option 2 is only really worthwhile if you are guaranteed to always be working with devices from one vendor (i.e. all devices in one platform). That assumption breaks down quickly when targeting a GPU and a CPU simultaneously.
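The two-level division of Option 3 can be sketched in plain C. This is a minimal sketch under stated assumptions: each device has a hypothetical integer weight (one crude choice is CL_DEVICE_MAX_COMPUTE_UNITS, which is not really comparable across vendors; measured throughput would be better), devices are listed platform by platform, every platform has at least one device with a non-zero weight, and leftovers from integer division go to the last slot. `split_weighted`, `split_two_level` and `MAX_PLATFORMS` are hypothetical helpers, not OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PLATFORMS 16 /* hypothetical fixed upper bound for the sketch */

/* Splits n items over k slots in proportion to the weights w[0..k-1].
   Integer-division leftovers go to the last slot, so the parts sum to n. */
static void split_weighted(size_t n, const unsigned *w, size_t k, size_t *out) {
    unsigned long long total_w = 0;
    assert(k > 0);
    for (size_t i = 0; i < k; ++i)
        total_w += w[i];
    assert(total_w > 0);
    size_t given = 0;
    for (size_t i = 0; i + 1 < k; ++i) {
        out[i] = (size_t)((unsigned long long)n * w[i] / total_w);
        given += out[i];
    }
    out[k - 1] = n - given;
}

/* Level 1 divides n between platforms (one context each) by summed device
   weight; level 2 divides each platform's share between its devices.
   dev_w lists device weights platform by platform; dev_per_plat[p] is the
   device count of platform p; out gets one item count per device. */
static void split_two_level(size_t n, const unsigned *dev_w,
                            const size_t *dev_per_plat, size_t n_plat,
                            size_t *out) {
    unsigned plat_w[MAX_PLATFORMS];
    size_t plat_n[MAX_PLATFORMS];
    assert(n_plat > 0 && n_plat <= MAX_PLATFORMS);

    size_t first = 0;
    for (size_t p = 0; p < n_plat; ++p) {
        plat_w[p] = 0;
        for (size_t d = 0; d < dev_per_plat[p]; ++d)
            plat_w[p] += dev_w[first + d];
        first += dev_per_plat[p];
    }
    split_weighted(n, plat_w, n_plat, plat_n); /* between contexts */

    first = 0;
    for (size_t p = 0; p < n_plat; ++p) {      /* within each context */
        split_weighted(plat_n[p], dev_w + first, dev_per_plat[p], out + first);
        first += dev_per_plat[p];
    }
}
```

With weights {20, 20} on one platform and {8} on another, 1000 work items are split 833/167 between the two contexts, and the first context's 833 are then split 416/417 between its two devices.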
Whichever of the three options you choose, you will need at least one command queue per device. You will also need to compile your OpenCL kernels for every group of identical devices, because every generation of GPUs from every vendor is different. At the very least, you could end up with macros that have different definitions from one device to another. At worst, you could need different algorithms on different devices (which is easier to handle with Option 1 above).
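Compiling once per group of identical devices needs some working definition of "identical". A minimal sketch, assuming the caller builds one identity string per device, for instance by concatenating the CL_DEVICE_NAME, CL_DEVICE_VENDOR and CL_DRIVER_VERSION strings from clGetDeviceInfo; treating equal strings as identical devices is itself an assumption. `group_identical` is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Assigns the same group id to devices whose identity strings match.
   ident[i] is assumed to be built by the caller (e.g. name + vendor +
   driver version). group[i] receives device i's group id.
   Returns the number of distinct groups. */
static size_t group_identical(const char *const *ident, size_t n, size_t *group) {
    size_t n_groups = 0;
    for (size_t i = 0; i < n; ++i) {
        assert(ident[i] != NULL);
        group[i] = n_groups;         /* new group unless a match is found */
        for (size_t j = 0; j < i; ++j) {
            if (strcmp(ident[i], ident[j]) == 0) {
                group[i] = group[j]; /* reuse the earlier device's group */
                break;
            }
        }
        if (group[i] == n_groups)
            ++n_groups;
    }
    return n_groups;
}
```

Each group's device list could then be passed to a single clBuildProgram call with that group's own macro definitions. The quadratic scan is fine for the handful of devices in one machine.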
Source: https://stackoverflow.com/questions/30218434/opencl-one-program-running-one-multiple-devices