clEnqueueNDRange blocking on Nvidia hardware? (Also Multi-GPU)

好久不见. Submitted on 2019-11-29 16:18:38
Chaosed0

I emailed the Nvidia guys and actually got a pretty fair response. There's a sample in the Nvidia SDK showing that, for each device, you need to create separate:

  • queues - One command queue per device, so you can address each device and enqueue commands to it.
  • buffers - One buffer per device for each array you need to pass in; otherwise the devices pass a single buffer around, each waiting for it to become available, which effectively serializes everything.
  • kernels - I think this one's optional, but it makes specifying arguments a lot easier.

Furthermore, you have to call clEnqueueNDRangeKernel for each queue in a separate thread. That's not in the SDK sample, but the Nvidia guy confirmed that the calls are blocking.
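To make the threading pattern concrete, here is a minimal sketch in C with POSIX threads. It is not taken from the SDK sample: `DeviceWork`, `blocking_enqueue` and `run_on_all_devices` are hypothetical names, and `blocking_enqueue` is only a stand-in for the place where real code would call clEnqueueNDRangeKernel on that device's own command queue. The point is just the shape: each device gets its own state (including its own buffer) and its own host thread, so one blocking call cannot hold up the others.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_DEVICES 2   /* hypothetical device count */
#define N 4             /* elements in each per-device buffer */

/* Hypothetical per-device state. In real OpenCL code this would hold the
   device's own cl_command_queue, its own cl_mem buffers and its cl_kernel. */
typedef struct {
    int   device_index;
    float buffer[N];    /* owned by this device only, never shared */
} DeviceWork;

/* Stand-in for a driver call that returns only once the work is finished.
   Real code would call clEnqueueNDRangeKernel here. */
static void blocking_enqueue(DeviceWork *w) {
    for (size_t i = 0; i < N; ++i)
        w->buffer[i] *= 2.0f;   /* pretend kernel: double every element */
}

/* Thread entry point: one thread drives exactly one device. */
static void *device_thread(void *arg) {
    blocking_enqueue((DeviceWork *)arg);
    return NULL;
}

/* Starts one host thread per device, then waits for all of them.
   Returns 0 on success, -1 if count is too large or a thread call fails. */
int run_on_all_devices(DeviceWork work[], int count) {
    pthread_t threads[NUM_DEVICES];
    int started = 0;
    int status = 0;

    if (count > NUM_DEVICES)
        return -1;

    for (; started < count; ++started) {
        if (pthread_create(&threads[started], NULL,
                           device_thread, &work[started]) != 0) {
            status = -1;
            break;
        }
    }
    /* Join only the threads that were actually started. */
    for (int i = 0; i < started; ++i) {
        if (pthread_join(threads[i], NULL) != 0)
            status = -1;
    }
    return status;
}
```

A caller would fill one `DeviceWork` per GPU, pass the array to `run_on_all_devices`, and read the results back from each entry's own buffer after it returns. On most toolchains this needs to be built with `-pthread`.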

After doing all this, I achieved concurrency on multiple GPUs. However, there's still a bit of a problem. On to the next question...

Yes, you're right. AFAIK, the Nvidia implementation has a synchronous clEnqueueNDRangeKernel. I have noticed this when using my library (Brahma) as well. I don't know of a workaround or a way of preventing this, other than using a different implementation (and hence device).
