OpenCL and GPU global synchronization

自古美人都是妖i 提交于 2019-12-23 17:45:06

问题


Has anyone tried the gpu_sync functions described in the article "Inter-Block GPU Communication via Fast Barrier Synchronization"? All the codes described seems pretty simple and easy to implement but it keeps freezing up my GPU. I'm sure I'm doing something stupid but I can't see what. Can anyone help me?

The strategy I'm using is the one described in the section “GPU Lock-Free Synchronization” and here is the OpenCL source code I've implemented:

static void globalSync(uint iGoalValue,
                   volatile __global int *globalSyncFlagsIN,
                   volatile __global int *globalSyncFlagsOUT)
{
 const size_t iLocalThreadID  = get_local_id(0);
 const size_t iWorkGroupID    = get_group_id(0);
 const size_t iWorkGroupCount = get_num_groups(0);

 //Only the first thread on each SM is used for synchronization
 if (iLocalThreadID == 0)
 { globalSyncFlagsIN[iWorkGroupID] = iGoalValue; }

 if (iWorkGroupID == 0)
 {
  if (iLocalThreadID < iWorkGroupCount)
  {
   while (globalSyncFlagsIN[iLocalThreadID] != iGoalValue) {
    // Nothing to do here
   }
  }

  barrier(CLK_GLOBAL_MEM_FENCE);

  if (iLocalThreadID < iWorkGroupCount)
  { globalSyncFlagsOUT[iLocalThreadID] = iGoalValue; }
 }

 if (iLocalThreadID == 0)
 {
  while (globalSyncFlagsOUT[iWorkGroupID] != iGoalValue) {
   // Nothing to do here 
  }
 }

 barrier(CLK_GLOBAL_MEM_FENCE);
} 

Thanks in advance.


回答1:


I haven't tried running the code, but the direct translation from CUDA to OpenCL of the code from the article mentioned above would be:

{  
    int tid_in_blk = get_local_id(0) * get_local_size(1)
        + get_local_id(1);
    int nBlockNum = get_num_groups(0) * get_num_groups(1);
    int bid = get_group_id(0) * get_num_groups(1) + get_group_id(1);


    if (tid_in_blk == 0) {
        Arrayin[bid] = goalVal;
    }

    if (bid == 1) {
        if (tid_in_blk < nBlockNum) {
            while (Arrayin[tid_in_blk] != goalVal){

            }
        }
        barrier(CLK_LOCAL_MEM_FENCE);

        if (tid_in_blk < nBlockNum) {
            Arrayout[tid_in_blk] = goalVal;
        }
    }

    if (tid_in_blk == 0) {
        while (Arrayout[bid] != goalVal) {

        }
    }
}

Please note the difference in thread and group IDs and in using local memory barrier instead of global one.



来源:https://stackoverflow.com/questions/34476631/opencl-and-gpu-global-synchronization

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!