Efficiently Initializing Shared Memory Array in CUDA

借酒劲吻你 2020-12-22 04:10

Note that this shared memory array is never written to, only read from.

As I have it, my shared memory gets initialized like:

__shared__ float TMshar         


        
1 Answer
  • 2020-12-22 04:57

    Use all threads in the block to write independent locations; it will probably be quicker.

    This example assumes a 1D thread block and grid:

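    #define SSIZE 2592

    __shared__ float TMshared[SSIZE];

    // Each thread copies elements lidx, lidx + blockDim.x, ... from TM
    int lidx = threadIdx.x;
    while (lidx < SSIZE) {
      TMshared[lidx] = TM[lidx];
      lidx += blockDim.x;
    }

    // Wait until the whole array is loaded before any thread reads it
    __syncthreads();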
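
For anyone who wants to try the block-stride load in isolation, here is a minimal sketch of a complete test harness. The kernel name `reverseViaShared`, the host helper `checkSharedInit`, and the choice to write the table back reversed are all assumptions made for illustration; they are not from the original question, whose code is cut off above. The reversal only serves to make each thread read an element that a different thread loaded, which is why the `__syncthreads()` is needed.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define SSIZE 2592

// Hypothetical demo kernel: every block stages TM into shared memory, then
// writes the table back reversed, so each thread reads an element that was
// loaded by a different thread.
__global__ void reverseViaShared(const float *TM, float *out)
{
    __shared__ float TMshared[SSIZE];

    // Block-stride copy: thread t loads elements t, t + blockDim.x, ...
    for (int lidx = threadIdx.x; lidx < SSIZE; lidx += blockDim.x)
        TMshared[lidx] = TM[lidx];

    // All loads must be complete before any thread reads the table.
    __syncthreads();

    for (int i = threadIdx.x; i < SSIZE; i += blockDim.x)
        out[blockIdx.x * SSIZE + i] = TMshared[SSIZE - 1 - i];
}

// Hypothetical host helper: returns true if every block produced the
// fully reversed table, i.e. saw a completely initialized shared array.
bool checkSharedInit(int blocks, int threads)
{
    assert(blocks > 0 && threads > 0);

    std::vector<float> h_TM(SSIZE);
    for (int i = 0; i < SSIZE; ++i) h_TM[i] = static_cast<float>(i);

    const size_t inBytes  = SSIZE * sizeof(float);
    const size_t outBytes = static_cast<size_t>(blocks) * inBytes;

    float *d_TM = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_TM, inBytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, outBytes) != cudaSuccess) { cudaFree(d_TM); return false; }
    cudaMemcpy(d_TM, h_TM.data(), inBytes, cudaMemcpyHostToDevice);

    reverseViaShared<<<blocks, threads>>>(d_TM, d_out);

    // The blocking copy waits for the kernel and reports any launch error.
    std::vector<float> h_out(static_cast<size_t>(blocks) * SSIZE);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, outBytes, cudaMemcpyDeviceToHost);
    cudaFree(d_TM);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int b = 0; b < blocks; ++b)
        for (int i = 0; i < SSIZE; ++i)
            if (h_out[static_cast<size_t>(b) * SSIZE + i] != h_TM[SSIZE - 1 - i])
                return false;
    return true;
}
```

Calling `checkSharedInit(4, 256)` from a `main` should return `true` on a working GPU. The loop form also handles block sizes that do not divide `SSIZE` evenly (for example 100 threads), since each thread simply stops once its index passes the end of the array.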
    