Is the mask adaptive in a __shfl_up_sync call?
Basically, this is a concrete version of this post. Suppose a warp needs to process 4 objects (say, pixels in an image), and each group of 8 lanes processes one object.

Now I need to do shuffle operations internal to one object (i.e. among the 8 lanes of that object). It works for each object if I just set the mask to 0xff:

uint32_t mask = 0xff;
__shfl_up_sync(mask,val,1);

However, to my understanding, setting the mask to 0xff will force lane0:lane7 of object0 (or object3? also stuck on
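For reference, here is a minimal sketch of the 4-objects-per-warp layout described above. It assumes a single warp of 32 threads where lanes 8g..8g+7 handle object g. The kernel name `shift_within_groups` and the helper `group_mask` are hypothetical, introduced only for illustration. Per the CUDA Programming Guide, each bit of `mask` corresponds to an absolute lane ID in the warp, and the `width` argument splits the warp into independent segments. The sketch therefore contrasts two variants: every lane passing the full mask with `width = 8`, and each group passing only its own 8 lanes (0xff for lanes 0-7, 0xff00 for lanes 8-15, and so on).

```cuda
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: bitmask of the 8 lanes that share this lane's object.
// Lanes 0-7 -> 0x000000ff, lanes 8-15 -> 0x0000ff00, ..., lanes 24-31 -> 0xff000000.
__host__ __device__ inline uint32_t group_mask(unsigned lane) {
    return 0xffu << (lane & ~7u);
}

// Each group of 8 lanes shifts its values up by one lane within the group.
__global__ void shift_within_groups(const int* in, int* out_full, int* out_group) {
    unsigned lane = threadIdx.x & 31u;
    int val = in[threadIdx.x];

    // Variant A: all 32 lanes participate with the full mask; width = 8 makes
    // each 8-lane segment behave as a separate entity, so nothing crosses objects.
    out_full[threadIdx.x] = __shfl_up_sync(0xffffffffu, val, 1, 8);

    // Variant B: each lane names only the lanes of its own object. width = 8 is
    // still needed so the first lane of a group never reads from the previous group.
    out_group[threadIdx.x] = __shfl_up_sync(group_mask(lane), val, 1, 8);
}

int main() {
    const int N = 32;  // one warp
    int h_in[N], h_full[N], h_group[N];
    for (int i = 0; i < N; ++i) h_in[i] = i;

    // Host-side sanity check of the hypothetical mask helper.
    assert(group_mask(9) == 0x0000ff00u);

    int *d_in, *d_full, *d_group;
    cudaMalloc(&d_in, N * sizeof(int));
    cudaMalloc(&d_full, N * sizeof(int));
    cudaMalloc(&d_group, N * sizeof(int));
    cudaMemcpy(d_in, h_in, N * sizeof(int), cudaMemcpyHostToDevice);

    shift_within_groups<<<1, N>>>(d_in, d_full, d_group);

    // cudaMemcpy device-to-host waits for the kernel to finish.
    cudaMemcpy(h_full, d_full, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_group, d_group, N * sizeof(int), cudaMemcpyDeviceToHost);

    // Lane i receives i - 1, except the first lane of each group, which keeps
    // its own value because the source lane does not wrap around width.
    int errors = 0;
    for (int i = 0; i < N; ++i) {
        int expect = (i % 8 == 0) ? i : i - 1;
        if (h_full[i] != expect || h_group[i] != expect) ++errors;
    }
    printf("%s\n", errors == 0 ? "ok" : "mismatch");

    cudaFree(d_in);
    cudaFree(d_full);
    cudaFree(d_group);
    return errors != 0;
}
```

Variant A is usually the simpler choice when all 32 lanes are known to be active; variant B only matters if some groups may skip the shuffle, since every lane named in a mask must execute the same shuffle with that same mask.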