Question
As the title says, I'm doing a little personal research into parallel computer vision techniques. Using CUDA, I am trying to implement a GPGPU version of the Hough transform. The only problem I've encountered is in the voting process. I'm calling atomicAdd() to prevent multiple simultaneous write operations, and I don't seem to be gaining much performance. I've searched the web but haven't found any way to noticeably speed up the voting process.
Any help you could provide regarding the voting process would be greatly appreciated.
Answer 1:
I'm not familiar with the Hough transform, so posting some pseudocode could help here. But if you are interested in voting, you might consider using the CUDA vote intrinsic instructions to accelerate this.
Note this requires 2.0 or later compute capability (Fermi or later).
If you are looking to count the number of threads in a block for which a specific condition is true, you can just use __syncthreads_count().
bool condition = ...; // compute the condition
int blockCount = __syncthreads_count(condition); // must be in non-divergent code
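If you are looking to count the number of threads in a grid for which the condition is true, you can then add each block's count to a global total with atomicAdd(). Only one thread per block should do the add, otherwise the block's count is added once per thread:
bool condition = ...; // compute the condition
int blockCount = __syncthreads_count(condition); // must be in non-divergent code
if (threadIdx.x == 0) atomicAdd(totalCount, blockCount); // one atomic per block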
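To make the grid-wide count concrete, here is a minimal sketch of a full kernel plus a host wrapper. The names countAbove and countAboveOnGpu, the threshold predicate, and the block size of 256 are all assumptions chosen for illustration; they are not from the question.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical predicate: count values strictly above a threshold.
__global__ void countAbove(const float* data, int n, float threshold,
                           unsigned int* totalCount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Out-of-range threads still reach the barrier; they simply vote false.
    int condition = (i < n) && (data[i] > threshold);
    // Barrier plus reduction: every thread in the block receives the same count.
    int blockCount = __syncthreads_count(condition);
    // One atomic per block instead of one per thread.
    if (threadIdx.x == 0 && blockCount > 0)
        atomicAdd(totalCount, (unsigned int)blockCount);
}

// Host wrapper: returns the count, or -1 if any CUDA call fails.
int countAboveOnGpu(const float* hostData, int n, float threshold)
{
    assert(n > 0);
    float* dData = nullptr;
    unsigned int* dCount = nullptr;
    unsigned int hCount = 0;
    bool ok = cudaMalloc(&dData, n * sizeof(float)) == cudaSuccess
           && cudaMalloc(&dCount, sizeof(unsigned int)) == cudaSuccess;
    if (ok) {
        cudaMemcpy(dData, hostData, n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemset(dCount, 0, sizeof(unsigned int));
        const int threads = 256;
        countAbove<<<(n + threads - 1) / threads, threads>>>(dData, n, threshold, dCount);
        ok = cudaMemcpy(&hCount, dCount, sizeof(unsigned int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dData);
    cudaFree(dCount);
    return ok ? (int)hCount : -1;
}
```

Calling countAboveOnGpu(values, n, 1.0f) from host code returns how many of the n values exceed 1.0. The number of global atomics drops from one per matching thread to at most one per block.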
If you need to count the number of threads in a group smaller than a block for which the condition is true, you can use __ballot() and __popc() (population count).
// get the count of threads within each warp for which the condition is true
bool condition = ...; // compute the condition in each thread
int warpCount = __popc(__ballot(condition)); // __ballot takes the predicate; on CUDA 9 and later use __ballot_sync(0xffffffff, condition)
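Here is a minimal sketch of the warp-level count inside a complete kernel, written with __ballot_sync() as required on CUDA 9 and later. The kernel name countNonZero, the nonzero-pixel predicate, and the host wrapper are assumptions for illustration. The sketch also assumes a 1D block whose size is a multiple of 32, so that the full mask 0xffffffff is valid.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel: counts nonzero pixels using one atomic per warp.
__global__ void countNonZero(const unsigned char* pixels, int n,
                             unsigned int* totalCount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int condition = (i < n) && (pixels[i] != 0);
    // Bit k of the result is set if lane k's condition is true.
    unsigned int ballot = __ballot_sync(0xffffffffu, condition);
    int warpCount = __popc(ballot); // number of set bits = matching lanes
    // Lane 0 publishes the warp's total.
    if ((threadIdx.x & 31) == 0 && warpCount > 0)
        atomicAdd(totalCount, (unsigned int)warpCount);
}

// Host wrapper: returns the count, or -1 if any CUDA call fails.
int countNonZeroOnGpu(const unsigned char* hostPixels, int n)
{
    assert(n > 0);
    unsigned char* dPixels = nullptr;
    unsigned int* dCount = nullptr;
    unsigned int hCount = 0;
    bool ok = cudaMalloc(&dPixels, n) == cudaSuccess
           && cudaMalloc(&dCount, sizeof(unsigned int)) == cudaSuccess;
    if (ok) {
        cudaMemcpy(dPixels, hostPixels, n, cudaMemcpyHostToDevice);
        cudaMemset(dCount, 0, sizeof(unsigned int));
        const int threads = 256; // multiple of 32, so every warp is full
        countNonZero<<<(n + threads - 1) / threads, threads>>>(dPixels, n, dCount);
        ok = cudaMemcpy(&hCount, dCount, sizeof(unsigned int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dPixels);
    cudaFree(dCount);
    return ok ? (int)hCount : -1;
}
```

Unlike __syncthreads_count(), the ballot does not involve a block-wide barrier, at the cost of one atomic per warp instead of one per block.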
Hope this helps.
Answer 2:
Not long ago I used the voting approach myself...
In the end, atomicAdd turned out to be even faster, in both scenarios.
This link is very useful: warp-filtering
And this one was my solved problem: Write data only from selected lanes in a Warp using Shuffle + ballot + popc
Aren't you looking for a critical section?
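One common way to combine ballot, popc, and shuffle so that only selected lanes write is a warp-aggregated atomic. Whether this matches the linked posts exactly is an assumption. The following is a minimal sketch with hypothetical names (keepNonZero, filterNonZeroOnGpu), assuming CUDA 9 or later and a 1D block size that is a multiple of 32. Lane 0 reserves output slots for the whole warp with a single atomicAdd, and each selected lane derives its own slot from the ballot bits below it.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical filter: copies nonzero inputs to out, one atomic per warp.
__global__ void keepNonZero(const int* in, int n, int* out,
                            unsigned int* outCount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int keep = (i < n) && (in[i] != 0);
    unsigned int lane = threadIdx.x & 31;
    unsigned int ballot = __ballot_sync(0xffffffffu, keep);
    unsigned int base = 0;
    // Lane 0 reserves a contiguous range for all selected lanes of the warp.
    if (lane == 0 && ballot != 0)
        base = atomicAdd(outCount, (unsigned int)__popc(ballot));
    base = __shfl_sync(0xffffffffu, base, 0); // broadcast the range start
    // Rank = number of selected lanes with a lower lane index.
    unsigned int rank = __popc(ballot & ((1u << lane) - 1u));
    if (keep) out[base + rank] = in[i];
}

// Host wrapper: fills hostOut, returns the kept count or -1 on CUDA failure.
int filterNonZeroOnGpu(const int* hostIn, int n, int* hostOut)
{
    assert(n > 0);
    int *dIn = nullptr, *dOut = nullptr;
    unsigned int* dCount = nullptr;
    unsigned int hCount = 0;
    bool ok = cudaMalloc(&dIn, n * sizeof(int)) == cudaSuccess
           && cudaMalloc(&dOut, n * sizeof(int)) == cudaSuccess
           && cudaMalloc(&dCount, sizeof(unsigned int)) == cudaSuccess;
    if (ok) {
        cudaMemcpy(dIn, hostIn, n * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemset(dCount, 0, sizeof(unsigned int));
        const int threads = 256;
        keepNonZero<<<(n + threads - 1) / threads, threads>>>(dIn, n, dOut, dCount);
        ok = cudaMemcpy(&hCount, dCount, sizeof(unsigned int),
                        cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(hostOut, dOut, hCount * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dCount);
    return ok ? (int)hCount : -1;
}
```

The order of kept elements across different warps is not deterministic, since it depends on which warp's atomicAdd lands first. For a Hough-style pipeline this kind of filter could be used to keep only the pixels that actually vote, but that application is a suggestion rather than something the thread confirms.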
Source: https://stackoverflow.com/questions/11101642/generalized-hough-transform-in-cuda-how-can-i-speed-up-the-binning-process