OpenCL select/delete points from large array

Submitted by 此生再无相见时 on 2019-12-01 21:47:21

Just an alternative for the case where most of the points satisfy the condition, so most work-items would hit the atomic:

You can keep a local counter and use a local atomic on it. To merge that counter into the global value, use atomic_add(), which returns the "previous" global value. So you just copy the selected elements to that offset and upwards.

This should give a noticeable speed-up, since the work-items synchronize locally and each work-group touches the global counter only once. The copy to global memory can run in parallel, since the reserved address ranges never overlap.

For example:

__kernel void mykernel(__global MyType * global_out, __global int * global_count, __global MyType * global_in){
   int lid = get_local_id(0);
   int lws = get_local_size(0);
   int idx = get_global_id(0);

   __local int local_count;
   __local int global_val;    
   //I am using a local container, but a local array of pointers to global is possible as well
   __local MyType local_out[WG_SIZE]; //Ensure WG_SIZE is at least your work-group size
   if(lid==0){
      local_count = 0; global_val = -1;
   }
   barrier(CLK_LOCAL_MEM_FENCE);

   //Classify them
   if(global_in[idx] == ....)
       local_out[atomic_inc(&local_count)] = global_in[idx];

   barrier(CLK_LOCAL_MEM_FENCE);

   //If nothing was selected in this work-group, we are done
   if(local_count > 0){
      //Only the first local ID does the atomic to global
      if(lid == 0)
         global_val = atomic_add(global_count,local_count);

      //Resync all the local workers here
      barrier(CLK_LOCAL_MEM_FENCE);

      //Copy all the data, each work-item starting at its own local ID
      for(int i=lid; i<local_count; i+=lws)
          global_out[global_val+i] = local_out[i];
   }
}

NOTE: I have not compiled this, but it should more or less work.
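
To check the kernel's output, a sequential host-side emulation of the same two-level scheme can help. The following is only a sketch in plain C, not the answer's code: `Point`, `keep_point`, `compact_by_groups` and the small `WG_SIZE` are hypothetical stand-ins (the real `MyType` and the selection condition are not given above). Each emulated work-group counts its matches locally, reserves one contiguous range in the global counter, and copies into that range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical point type; stands in for MyType. */
typedef struct { float x, y; } Point;

/* Hypothetical predicate; stands in for the elided condition.
   Assumption for illustration: keep everything except (0,0). */
static int keep_point(Point p) {
    return p.x != 0.0f || p.y != 0.0f;
}

#define WG_SIZE 4  /* emulated work-group size, chosen small for testing */

/* Sequential emulation of the local-counter / global-reservation scheme.
   Returns the number of points written to out (out must hold n points). */
size_t compact_by_groups(const Point *in, size_t n, Point *out) {
    size_t global_count = 0;                  /* plays the role of *global_count */
    for (size_t base = 0; base < n; base += WG_SIZE) {
        Point local_out[WG_SIZE];             /* per-group staging buffer */
        size_t local_count = 0;               /* per-group counter */
        size_t end = base + WG_SIZE < n ? base + WG_SIZE : n;

        /* Classify: collect this group's matches locally. */
        for (size_t i = base; i < end; ++i)
            if (keep_point(in[i]))
                local_out[local_count++] = in[i];

        /* One reservation per group: the old value is the group's offset. */
        size_t offset = global_count;
        global_count += local_count;

        /* Copy into the reserved, non-overlapping range. */
        for (size_t i = 0; i < local_count; ++i)
            out[offset + i] = local_out[i];
    }
    return global_count;
}
```

On a real device the work-groups reserve their ranges in an unspecified order, so compare the kernel's result with this reference as a set (or after sorting), not element by element.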

If I understood your problem correctly, you can do the following:

In your kernel, identify the points inside the triangle, then:

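//Assumes element is a __global float2 array and number_of_elems points to a __global int counter
if(any(element[idx] != (float2)(0.0f, 0.0f)))
      output_array[atomic_inc(number_of_elems)] = element[idx];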

Finally, on the host, the first number_of_elems entries of output_array will hold your inner points.

I hope this helps. Best

There are alternatives, each working better or worse depending on what the data looks like. I describe one below.

Deleting the identified points can also be done by registering them in a separate array per work-group. You need the same atomic_inc as in Moises's answer (see my remark there about doing this at work-group level!). The end result is a list of start points and end points of the parts that don't need to be deleted. You can then copy those parts of the array with different threads. This is less effective if you have clusters of points that need to be deleted.
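
The start/end bookkeeping described above can be sketched on the host in plain C. This is an illustration under assumptions, not the answer's implementation: the `Run` type, `find_kept_runs`, `copy_runs` and the per-element delete flags are hypothetical names, and the code runs sequentially where a GPU version would build the list with atomics per work-group.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: half-open range [start, end) of elements to keep. */
typedef struct { size_t start, end; } Run;

/* Collect runs of consecutive elements whose delete flag is 0.
   runs must have room for (n + 1) / 2 entries. Returns the run count. */
size_t find_kept_runs(const unsigned char *del, size_t n, Run *runs) {
    size_t count = 0, i = 0;
    while (i < n) {
        while (i < n && del[i]) ++i;      /* skip deleted elements */
        if (i == n) break;
        size_t start = i;
        while (i < n && !del[i]) ++i;     /* extend the kept run */
        runs[count].start = start;
        runs[count].end = i;
        ++count;
    }
    return count;
}

/* Copy each kept run to its destination offset, which is the running sum
   of the earlier run lengths. Returns the number of kept elements. */
size_t copy_runs(const int *in, const Run *runs, size_t run_count, int *out) {
    size_t dst = 0;
    for (size_t r = 0; r < run_count; ++r) {
        size_t len = runs[r].end - runs[r].start;
        memcpy(out + dst, in + runs[r].start, len * sizeof *in);
        dst += len;
    }
    return dst;
}
```

Because each run's destination offset depends only on the lengths of the runs before it, the copies are independent once those offsets are known, which is what lets different threads handle different runs.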
