CUDA Stream compaction: understanding the concept

Question

I am using CUDA/Thrust/CUDPP. As I understand it, in stream compaction, certain items in an array are marked as invalid and then "removed".

Now, what does "removal" really mean here? Suppose the original array A has length 6. If 2 elements are invalid (by whatever condition we provide), then:

Does the system create a new array of size 4 in GPU memory to store the valid elements and hold the final result?
Or does it physically remove the invalid elements from memory and shrink the original array A down to size 4, keeping only the valid elements?

In either case, doesn't that mean that dynamic memory allocation is happening under the hood? But I have heard that dynamic memory allocation is not possible in the CUDA world.

Answer 1:

First, dynamic memory allocation is possible in CUDA on Compute Capability 2.0 and higher devices. The CUDA runtime library supports malloc/free and new/delete in __device__ functions. But that is not germane to the answer, really.
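As a side illustration of the point above, here is a minimal sketch of device-side `malloc`/`free` inside a kernel. The kernel name `scratch_kernel`, the host helper `run_scratch_kernel`, and the 4-element scratch buffer are all hypothetical, chosen only for this example. Error checking on the CUDA runtime calls is omitted for brevity.

```cuda
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread allocates a small scratch buffer from
// the device heap, fills it, sums it, and frees it again.
__global__ void scratch_kernel(int* results, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // Device-side malloc; returns nullptr if the device heap is exhausted.
    int* scratch = static_cast<int*>(malloc(4 * sizeof(int)));
    if (scratch == nullptr) {
        results[tid] = -1;
        return;
    }

    for (int i = 0; i < 4; ++i) scratch[i] = tid + i;
    results[tid] = scratch[0] + scratch[1] + scratch[2] + scratch[3];

    free(scratch);  // memory from device malloc must be freed in device code
}

// Hypothetical host helper: launches the kernel and copies the results back.
std::vector<int> run_scratch_kernel(int n) {
    int* d_results = nullptr;
    cudaMalloc(&d_results, n * sizeof(int));

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    scratch_kernel<<<blocks, threads>>>(d_results, n);

    std::vector<int> h_results(n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_results.data(), d_results, n * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_results);
    return h_results;
}
```

Per-thread allocation like this tends to be slow, which is one reason compaction libraries generally avoid it.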

Typically, a large-enough output array is provided (pre-allocated, often the same size as the input array) and the output is written to it. No dynamic allocation is required, but some storage may be wasted. This is what CUDPP and Thrust do. An alternative is to count the valid elements first, then allocate exactly that much GPU memory for the output with cudaMalloc, called from the host CPU.
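The two strategies described above can be sketched with Thrust roughly as follows. This is a minimal sketch under stated assumptions, not a description of what CUDPP or Thrust do internally. The predicate `is_valid` (keep strictly positive values) and the helper names `compact_preallocated` and `compact_count_first` are hypothetical; `thrust::copy_if` and `thrust::count_if` are existing Thrust algorithms. The second helper lets `thrust::device_vector` do the exact-size allocation rather than calling cudaMalloc directly.

```cuda
#include <cstddef>
#include <vector>
#include <thrust/copy.h>
#include <thrust/count.h>
#include <thrust/device_vector.h>

// Hypothetical validity condition: an element is valid if it is positive.
struct is_valid {
    __host__ __device__ bool operator()(int x) const { return x > 0; }
};

// Strategy 1: pre-allocate an output buffer as large as the input and
// compact into it. Only the first n_valid slots hold meaningful data.
std::vector<int> compact_preallocated(const std::vector<int>& host_in) {
    thrust::device_vector<int> in(host_in.begin(), host_in.end());
    thrust::device_vector<int> out(in.size());  // worst-case size

    // copy_if returns an iterator one past the last element written.
    auto out_end = thrust::copy_if(in.begin(), in.end(), out.begin(),
                                   is_valid());
    std::size_t n_valid = out_end - out.begin();

    std::vector<int> result(n_valid);
    thrust::copy(out.begin(), out.begin() + n_valid, result.begin());
    return result;
}

// Strategy 2: count the valid elements first, then allocate an output
// buffer of exactly that size. This costs one extra pass over the input.
std::vector<int> compact_count_first(const std::vector<int>& host_in) {
    thrust::device_vector<int> in(host_in.begin(), host_in.end());
    std::size_t n_valid = thrust::count_if(in.begin(), in.end(), is_valid());

    thrust::device_vector<int> out(n_valid);  // exact-size allocation
    thrust::copy_if(in.begin(), in.end(), out.begin(), is_valid());

    std::vector<int> result(n_valid);
    thrust::copy(out.begin(), out.end(), result.begin());
    return result;
}
```

For the input {3, -1, 4, 0, 5, -9}, both helpers return {3, 4, 5}. In the first one the device output buffer still occupies six slots, which is the storage waste mentioned above; in neither case is the input array itself shrunk.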

Source: https://stackoverflow.com/questions/8388125/cuda-stream-compaction-understanding-the-concept

Tags: algorithm, cuda, gpu, thrust, cudpp
