Using atomic arithmetic operations in CUDA Unified Memory multi-GPU or multi-processor

Submitted by ⅰ亾dé卋堺 on 2020-11-29 10:43:40

Question


I am trying to implement a CUDA program that uses Unified Memory. I have two unified arrays and sometimes they need to be updated atomically.

The question below has an answer for a single-GPU environment, but I am not sure how to extend that answer to multi-GPU platforms.

Question: cuda atomicAdd example fails to yield correct output

For reference, I have four Tesla K20 cards, and each of them updates a part of those arrays; those updates must be done atomically.

I would appreciate any help/recommendations.


Answer 1:


To summarize comments into an answer:

  • You can perform this sort of address-space-wide atomic operation using atomicAdd_system.
  • However, you can only do this on compute capability 6.x or newer devices (7.2 or newer on Tegra).
  • Specifically, this means you must compile for the correct compute capability, e.g. with -arch=sm_60 or similar.
  • You state in the question that you are using Tesla K20 cards -- these are compute capability 3.5 and do not support any of the system-wide atomic functions.
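To illustrate the points above, here is a minimal sketch of what the managed-memory version could look like on supported hardware. This assumes a compute capability 6.0+ device (so it would not run on the asker's K20s), and the kernel and array names are hypothetical, not taken from the question:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread atomically accumulates into a small
// shared counter array that lives in Unified (managed) memory.
// atomicAdd_system makes the update atomic with respect to all GPUs in
// the system (and the CPU), not just the launching GPU.
__global__ void accumulate(int *counters, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // System-wide atomic; requires compute capability >= 6.0.
        atomicAdd_system(&counters[i % 4], 1);
    }
}

int main()
{
    int *counters = nullptr;
    cudaMallocManaged(&counters, 4 * sizeof(int));
    for (int i = 0; i < 4; ++i) counters[i] = 0;

    // Launch the same kernel on every visible GPU; all of them update
    // the same managed array concurrently.
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);
        accumulate<<<4, 256>>>(counters, 1024);
    }
    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);
        cudaDeviceSynchronize();
    }

    for (int i = 0; i < 4; ++i)
        printf("counters[%d] = %d\n", i, counters[i]);
    cudaFree(counters);
    return 0;
}
```

Note the compilation requirement from the bullets: something like `nvcc -arch=sm_60 example.cu` is needed, because compiling for an older architecture makes atomicAdd_system unavailable.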

As always, this information is neatly summarized in the relevant section of the Programming Guide.



Source: https://stackoverflow.com/questions/62267353/using-atomic-arithmetic-operations-in-cuda-unified-memory-multi-gpu-or-multi-pro
