Error: BFS on CUDA Synchronization

只谈情不闲聊 submitted on 2019-12-04 16:33:07

There are several major flaws in that device code. Firstly, you have memory races on both Xa and Ca. Secondly, you have a conditionally executed __syncthreads() call, which is illegal: if threads of the same block can diverge around the call, some of them never reach the barrier and the kernel can hang.
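To illustrate the second point, here is a minimal sketch of the usual fix: keep the divergent work inside the condition, but move the barrier out so that every thread in the block reaches it. The kernel `rotate_tiles_kernel`, the host wrapper `rotate_tiles` and the tile size are hypothetical names chosen for this example, not taken from the code in the question, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TILE 256  // threads per block (assumption for this sketch)

// Each thread writes the value held by its right-hand neighbour in the block.
__global__ void rotate_tiles_kernel(const int *in, int *out, int n)
{
    __shared__ int tile[TILE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    bool active = i < n;

    // Divergent work is fine: out-of-range threads just store a zero.
    tile[threadIdx.x] = active ? in[i] : 0;

    // The barrier is outside any condition, so all threads of the block hit it.
    // Writing `if (active) { ...; __syncthreads(); }` instead would be the
    // illegal pattern described above.
    __syncthreads();

    if (active)
        out[i] = tile[(threadIdx.x + 1) % blockDim.x];
}

// Host wrapper: copies the input to the device, runs the kernel, copies back.
std::vector<int> rotate_tiles(const std::vector<int> &in)
{
    int n = static_cast<int>(in.size());
    std::vector<int> out(n);
    if (n == 0) return out;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    int blocks = (n + TILE - 1) / TILE;
    rotate_tiles_kernel<<<blocks, TILE>>>(d_in, d_out, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

The early `return` that people often write for out-of-range threads has the same problem as a conditional barrier, which is why the sketch uses an `active` flag instead.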

The structure of the algorithm you are using probably isn't going to be correct on CUDA, even if you were to use atomic memory access functions to eliminate the worst of the read-after-write races in the code as posted. Using atomic memory access will effectively serialise the code and cost a great deal of performance.
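For comparison, one common way to restructure the problem is a level-synchronous search, where the host launches one kernel per BFS level and the kernel boundary acts as the global barrier. The sketch below is only an illustration under stated assumptions: the graph is stored in CSR form (`row_ptr`, `col_idx`), it has at least one vertex and one edge, and the names `bfs_level` and `bfs_levels` are hypothetical. It is not the method from the paper recommended below, and it is far from the fastest approach. It uses no atomics and relies on the observation that, within one level, every thread that writes to `dist[u]` stores the same value (`level + 1`), so the outcome does not depend on which write wins.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per vertex. Vertices on the current level label their
// unvisited neighbours with level + 1.
__global__ void bfs_level(const int *row_ptr, const int *col_idx, int *dist,
                          int num_vertices, int level, int *changed)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    // Early exit is safe here because this kernel contains no barrier.
    if (v >= num_vertices || dist[v] != level) return;

    for (int e = row_ptr[v]; e < row_ptr[v + 1]; ++e) {
        int u = col_idx[e];
        if (dist[u] == -1) {
            dist[u] = level + 1;  // all writers in this launch store this value
            *changed = 1;         // tells the host another level is needed
        }
    }
}

// Returns the BFS depth of every vertex from `source`, or -1 if unreachable.
std::vector<int> bfs_levels(const std::vector<int> &row_ptr,
                            const std::vector<int> &col_idx, int source)
{
    int n = static_cast<int>(row_ptr.size()) - 1;
    std::vector<int> dist(n, -1);
    dist[source] = 0;

    int *d_row, *d_col, *d_dist, *d_changed;
    cudaMalloc(&d_row, row_ptr.size() * sizeof(int));
    cudaMalloc(&d_col, col_idx.size() * sizeof(int));
    cudaMalloc(&d_dist, n * sizeof(int));
    cudaMalloc(&d_changed, sizeof(int));
    cudaMemcpy(d_row, row_ptr.data(), row_ptr.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col, col_idx.data(), col_idx.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_dist, dist.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    int changed = 1;
    for (int level = 0; changed; ++level) {
        changed = 0;
        cudaMemcpy(d_changed, &changed, sizeof(int), cudaMemcpyHostToDevice);
        bfs_level<<<blocks, threads>>>(d_row, d_col, d_dist, n, level, d_changed);
        // Blocking copy: waits for the kernel, so levels never overlap.
        cudaMemcpy(&changed, d_changed, sizeof(int), cudaMemcpyDeviceToHost);
    }

    cudaMemcpy(dist.data(), d_dist, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_row); cudaFree(d_col); cudaFree(d_dist); cudaFree(d_changed);
    return dist;
}
```

Calling `bfs_levels({0, 2, 4, 5, 6, 6}, {1, 2, 0, 3, 0, 1}, 0)` runs the search on a five-vertex graph with edges 0-1, 0-2 and 1-3 and an isolated vertex 4. Every thread scans its vertex on every level, which wastes work on large graphs; that inefficiency is one reason the published implementations use explicit frontier queues instead.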

Breadth-first search on CUDA isn't an unsolved problem. There are a number of good papers on implementations, if you care to search for them. I would recommend "High Performance and Scalable GPU Graph Traversal", if you have not already seen it. The code for those authors' implementation is also available for download from here.
