How do the Conflict Detection instructions make it easier to vectorize loops?

Submitted by 喜欢而已 on 2019-11-30 11:47:53

One example where the Conflict Detection (CD) instructions might be useful is histogramming. In scalar code, histogramming is just a simple loop like this:

load bin index
load bin count at index
increment bin count
store updated bin count at index
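For reference, here is the scalar loop above written out in C. It is a minimal sketch: the function name `histogram_scalar` and its parameters are hypothetical, and bin indices are assumed to be valid offsets into `bin_count`.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar histogram (hypothetical helper): one element at a time,
   so a repeated bin index can never cause a lost update. */
void histogram_scalar(const int32_t *bin_index, size_t n, uint32_t *bin_count)
{
    for (size_t i = 0; i < n; i++) {
        int32_t idx = bin_index[i];   /* load bin index */
        uint32_t c = bin_count[idx];  /* load bin count at index */
        c += 1;                       /* increment bin count */
        bin_count[idx] = c;           /* store updated bin count at index */
    }
}
```

For example, indices `{0, 2, 2, 1, 2}` leave the counts as `{1, 1, 3}`.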

Normally you can't vectorize histogramming, because the same bin index may appear more than once in a vector. You might naïvely try something like this:

load vector of N bin indices
perform gathered load using N bin indices to get N bin counts
increment N bin counts
store N updated bin counts using scattered store

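but if any of the indices within a vector are the same, you get a conflict: every lane with that index gathers the same old count, increments it independently, and scatters back the same value, so the bin is incremented once instead of once per occurrence.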
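To make the failure concrete, the naive step can be emulated in plain C by treating a "vector" as a small array of lanes and performing every gather before any scatter, as a real gather/scatter pair would. This is a sketch under stated assumptions: `histogram_naive_step` and `VLEN` are hypothetical names, and 8 lanes are used only for readability (a 512-bit vector of 32-bit indices would hold 16).

```c
#include <stdint.h>

#define VLEN 8  /* hypothetical lane count, chosen for illustration */

/* Naive "vectorized" histogram step, emulated in scalar C.
   All lanes gather before any lane scatters, so lanes that share
   an index all read the same old count and one update survives. */
void histogram_naive_step(const int32_t idx[VLEN], uint32_t *bin_count)
{
    uint32_t v[VLEN];
    for (int lane = 0; lane < VLEN; lane++)  /* gathered load */
        v[lane] = bin_count[idx[lane]];
    for (int lane = 0; lane < VLEN; lane++)  /* increment */
        v[lane] += 1;
    for (int lane = 0; lane < VLEN; lane++)  /* scattered store */
        bin_count[idx[lane]] = v[lane];
}
```

With indices `{0, 1, 2, 3, 3, 3, 4, 5}` and all counts starting at zero, bin 3 ends up at 1 rather than 3.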

So, CD instructions to the rescue:

load vector of N bin indices
use CD instruction to test for duplicate indices
set mask for the first occurrence of each distinct index
while mask not empty
    perform masked gathered load using the selected bin indices to get their bin counts
    increment the selected bin counts
    store the updated bin counts using masked scattered store
    remove the processed (masked) indices and update mask for the remaining ones
end
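The loop above can be sketched in portable C by emulating the conflict-detection step. On AVX-512CD hardware this corresponds to `vpconflictd` (intrinsic `_mm512_conflict_epi32`), which returns, for each lane, a bitmask of earlier lanes holding the same value. Here it is emulated with scalar loops so the sketch runs anywhere. The names `conflict_detect`, `histogram_cd_step` and `VLEN` are hypothetical, and lanes are modeled as arrays with bitmasks standing in for mask registers.

```c
#include <stdint.h>

#define VLEN 8  /* hypothetical lane count, chosen for illustration */

/* Scalar emulation of a conflict-detection instruction:
   bit j of conf[i] is set when lane j < i holds the same index as lane i. */
static void conflict_detect(const int32_t idx[VLEN], uint32_t conf[VLEN])
{
    for (int i = 0; i < VLEN; i++) {
        conf[i] = 0;
        for (int j = 0; j < i; j++)
            if (idx[j] == idx[i])
                conf[i] |= 1u << j;
    }
}

/* Histogram update for one vector of indices. Each iteration handles a
   subset of lanes with distinct indices, so the masked gather/scatter is
   safe. Returns the number of iterations taken. */
int histogram_cd_step(const int32_t idx[VLEN], uint32_t *bin_count)
{
    uint32_t conf[VLEN];
    conflict_detect(idx, conf);

    uint32_t remaining = (1u << VLEN) - 1;  /* lanes not yet processed */
    int iterations = 0;
    while (remaining != 0) {
        /* Select remaining lanes with no unprocessed earlier duplicate. */
        uint32_t mask = 0;
        for (int i = 0; i < VLEN; i++)
            if (((remaining >> i) & 1u) && (conf[i] & remaining) == 0)
                mask |= 1u << i;

        uint32_t v[VLEN] = {0};
        for (int i = 0; i < VLEN; i++)  /* masked gathered load */
            if ((mask >> i) & 1u)
                v[i] = bin_count[idx[i]];
        for (int i = 0; i < VLEN; i++)  /* increment selected counts */
            if ((mask >> i) & 1u)
                v[i] += 1;
        for (int i = 0; i < VLEN; i++)  /* masked scattered store */
            if ((mask >> i) & 1u)
                bin_count[idx[i]] = v[i];

        remaining &= ~mask;             /* drop the processed lanes */
        iterations++;
    }
    return iterations;
}
```

The lowest remaining lane never has an unprocessed earlier duplicate, so every iteration makes progress. The number of iterations equals the largest number of times any single index occurs in the vector: indices `{0, 1, 2, 3, 3, 3, 4, 5}` need three passes and produce a count of 3 for bin 3, while eight distinct indices need only one.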

In practice this example is quite inefficient and no better than scalar code, but there are other, more compute-intensive examples where using the CD instructions seems to be worthwhile. Typically these are simulations in which data elements are updated in an irregular, data-dependent order, so conflicts within a vector cannot be ruled out at compile time. One example (from the LAMMPS Molecular Dynamics Simulator) is described in the Knights Landing (KNL) book by Jeffers et al.
