OpenCL float sum reduction
Question:

I would like to apply a reduction to this piece of my kernel code (1-dimensional data):

    __local float sum = 0;
    int i;
    for (i = 0; i < length; i++)
        sum += // some operation depending on i here;

Instead of having just one thread perform this operation, I would like to have n threads (with n = length) and, at the end, have one thread compute the total sum.

In pseudocode, I would like to be able to write something like this:

    int i = get_global_id(0);
    __local float sum = 0;
    sum += // some operation depending on i here;
    barrier (