Should CUDA Constant Memory be accessed warp-uniformly?

Submitted by 这一生的挚爱 on 2020-01-06 09:05:09

Question


My CUDA application has constant memory of less than 8KB. Since it will all be cached, do I need to worry about every thread accessing the same address for optimization?

If yes, how do I assure all threads are accessing the same address at the same time?


Answer 1:


Since it will all be cached, do I need to worry about every thread accessing the same address for optimization?

Yes. The constant cache can serve only one 32-bit word per cycle, so if threads in a warp read different constant-memory addresses, those reads are serialized.

If yes, how do I assure all threads are accessing the same address at the same time?

Ensure that whatever indexing or addressing you use to reference an element in the constant memory area does not depend on any of the built-in thread variables, e.g. threadIdx.x, threadIdx.y, or threadIdx.z. Note that the actual requirement is less stringent than this: you achieve the necessary goal as long as the index evaluates to the same value for every thread in a given warp. Here are a few examples:

__constant__ int data[1024];
...
// assume 1D threadblock
int idx = threadIdx.x;
int bidx = blockIdx.x;
int a = data[idx];      // bad - every thread accesses a different element
int b = data[12];       // ok  - every thread accesses the same element
int c = data[b];        // ok  - b is a constant w.r.t. threads
int d = data[b + idx];  // bad
int e = data[b + bidx]; // ok
int f = data[idx/32];   // ok - the same element is being accessed per warp
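
To see these rules in a full program, here is a minimal sketch of a kernel whose constant-memory reads are warp-uniform. The table name `coeffs` and the kernel `scale` are illustrative, not from the original answer: each warp computes its warp index (`threadIdx.x / warpSize`), which is identical for all 32 threads in the warp, so the constant cache can broadcast the single word it fetches.

```cuda
// Sketch (names are hypothetical): warp-uniform constant-memory access.
#include <cstdio>
#include <cuda_runtime.h>

__constant__ float coeffs[32];   // small lookup table in constant memory

__global__ void scale(const float *in, float *out, int n)
{
    int tid    = blockIdx.x * blockDim.x + threadIdx.x;
    int warpId = threadIdx.x / warpSize;  // same value for every thread in a warp

    if (tid < n)
        out[tid] = in[tid] * coeffs[warpId];  // warp-uniform constant read
}

int main()
{
    const int n = 256;
    float h_coeffs[32];
    for (int i = 0; i < 32; ++i) h_coeffs[i] = 1.0f + i;

    // __constant__ variables are written from the host with cudaMemcpyToSymbol
    cudaMemcpyToSymbol(coeffs, h_coeffs, sizeof(h_coeffs));

    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    // ... fill d_in with input data here ...

    scale<<<1, 256>>>(d_in, d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Indexing by `warpId` follows the same logic as the `data[idx/32]` line above: the quotient is constant within a warp, so the access is warp-uniform even though it depends on threadIdx.x.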


Source: https://stackoverflow.com/questions/27070542/should-cuda-constant-memory-be-accessed-warp-uniformly
