Question
There are 510 keys to sort. With BLOCK_DIM_X = 128 and ITEMS_PER_THREAD = 4, each tile covers 512 keys. We launch the kernel with 1 block.
My kernel looks like this:
typedef cub::BlockRadixSort<int, 128, 4> BlockRadixSort;
int thread_data[4];
BlockLoad(temp_storage.load).Load(in_data, thread_data);
CTA_SYNC();
BlockRadixSort(temp_storage.sort).Sort(thread_data);
CTA_SYNC();
BlockStore(temp_storage.store).Store(out_data, thread_data);
CTA_SYNC();
The problem is that BlockRadixSort sorts 512 keys, not 510. How can I exclude the last 2 items from the block sort?
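One possible approach, sketched here under assumptions rather than taken from the original post: BlockRadixSort always sorts a full tile, so instead of excluding the last 2 slots you can pad them. The guarded overload of BlockLoad::Load takes a valid item count and an out-of-bounds default, and the guarded BlockStore::Store writes only the first valid_items results. Padding with INT_MAX pushes the filler to the end of an ascending sort. The kernel name SortPartialTile, the host helper SortOnDevice, the union layout of temp_storage, and the choice of the TRANSPOSE load/store algorithms are all assumptions made for this illustration, and CUDA error checking is left out for brevity.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <climits>
#include <vector>

constexpr int BLOCK_DIM_X = 128;
constexpr int ITEMS_PER_THREAD = 4;

// Hypothetical kernel: sorts up to BLOCK_DIM_X * ITEMS_PER_THREAD keys
// with one thread block, where only the first valid_items keys are real.
__global__ void SortPartialTile(const int* in_data, int* out_data, int valid_items)
{
    typedef cub::BlockLoad<int, BLOCK_DIM_X, ITEMS_PER_THREAD,
                           cub::BLOCK_LOAD_TRANSPOSE> BlockLoad;
    typedef cub::BlockRadixSort<int, BLOCK_DIM_X, ITEMS_PER_THREAD> BlockRadixSort;
    typedef cub::BlockStore<int, BLOCK_DIM_X, ITEMS_PER_THREAD,
                            cub::BLOCK_STORE_TRANSPOSE> BlockStore;

    // The three phases never overlap, so their shared memory can be aliased.
    __shared__ union {
        BlockLoad::TempStorage load;
        BlockRadixSort::TempStorage sort;
        BlockStore::TempStorage store;
    } temp_storage;

    int thread_data[ITEMS_PER_THREAD];

    // Guarded load: slots at or beyond valid_items are filled with INT_MAX.
    BlockLoad(temp_storage.load).Load(in_data, thread_data, valid_items, INT_MAX);
    __syncthreads();

    // Sorts the full 512-slot tile ascending; the INT_MAX padding ends up last.
    BlockRadixSort(temp_storage.sort).Sort(thread_data);
    __syncthreads();

    // Guarded store: only the first valid_items sorted keys are written.
    BlockStore(temp_storage.store).Store(out_data, thread_data, valid_items);
}

// Hypothetical host helper: copies keys to the device, sorts them with a
// single block, and returns the result. Assumes keys.size() <= 512.
std::vector<int> SortOnDevice(const std::vector<int>& keys)
{
    const int n = static_cast<int>(keys.size());
    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, keys.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    SortPartialTile<<<1, BLOCK_DIM_X>>>(d_in, d_out, n);

    std::vector<int> result(n);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(result.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Calling SortOnDevice with a vector of 510 keys should return those 510 keys in ascending order, with the two padded slots never reaching out_data. If real keys can themselves be INT_MAX, the output is still correct, because the padding only ever occupies positions past valid_items. For a descending sort, the padding value would be INT_MIN with SortDescending instead.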
Source: https://stackoverflow.com/questions/62170084/cubblockradixsort-how-to-deal-with-the-last-tile-which-is-not-full