Unexpectedly poor and weirdly bimodal performance for store loop on Intel Skylake

星月不相逢 2020-11-29 06:15

I'm seeing unexpectedly poor performance for a simple store loop which has two stores: one with a forward stride of 16 bytes and one that's always to the same location.
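For readers who want something concrete to look at, here is a minimal sketch of the kind of loop described above. It is not necessarily the exact code that was benchmarked: the function name `store_loop`, the 4-byte store width, and the stored value are assumptions made for illustration. The pointers are `volatile` so the compiler keeps both stores in every iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-byte stride expressed in 4-byte words (the store width is an assumption). */
enum { STRIDE_BYTES = 16, STRIDE_WORDS = STRIDE_BYTES / sizeof(uint32_t) };

/* Hypothetical helper: performs two stores per iteration.
 * The first store moves forward through buf in 16-byte steps,
 * the second always writes to the same location, *fixed.
 * Returns the number of iterations executed. */
size_t store_loop(volatile uint32_t *buf, size_t n_words, volatile uint32_t *fixed)
{
    assert(buf != NULL && fixed != NULL);
    size_t iters = 0;
    for (size_t i = 0; i < n_words; i += STRIDE_WORDS) {
        buf[i] = 1;   /* strided store: a new 64-byte line every 4 iterations */
        *fixed = 1;   /* fixed store: always the same address */
        iters++;
    }
    return iters;
}
```

To experiment, call it on a buffer of the size you care about (smaller than L1, between L1 and L2, and so on) and time many repetitions; no timings are claimed here.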

2 answers

臣服心动 (original poster)

2020-11-29 06:41

Sandy Bridge has "L1 data hardware pre-fetchers". What this means is that initially when you do your store the CPU has to fetch data from L2 into L1; but after this has happened several times the hardware pre-fetcher notices the nice sequential pattern and starts pre-fetching data from L2 into L1 for you, so that the data is either in L1 or "half way to L1" before your code does its store.
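To make the "after this has happened several times" part easier to picture, here is a toy model of that training behaviour. It is only a sketch and not the real hardware algorithm: the function name `count_demand_misses` and the threshold `MODEL_TRAIN_THRESHOLD` are made up for illustration, and the only hardware facts it relies on are the 64-byte cache line and the 16-byte stride from the question.

```c
#include <assert.h>
#include <stddef.h>

enum {
    MODEL_LINE_BYTES = 64,      /* cache line size on current Intel x86 cores */
    MODEL_STRIDE_BYTES = 16,    /* store stride from the question */
    MODEL_TRAIN_THRESHOLD = 2   /* ASSUMPTION: new lines seen before the toy prefetcher kicks in */
};

/* Toy model only. Walks n_stores sequential stores at a 16-byte stride and
 * counts how many newly touched lines the modelled prefetcher did not cover,
 * i.e. lines the store itself would have to wait for from L2. */
size_t count_demand_misses(size_t n_stores)
{
    assert(MODEL_LINE_BYTES % MODEL_STRIDE_BYTES == 0);
    size_t misses = 0;
    size_t lines_seen = 0;   /* sequential lines touched so far */
    size_t last_line = 0;
    int have_last = 0;

    for (size_t i = 0; i < n_stores; i++) {
        size_t line = (i * MODEL_STRIDE_BYTES) / MODEL_LINE_BYTES;
        if (have_last && line == last_line)
            continue;        /* same line as the previous store: already present */
        if (lines_seen < MODEL_TRAIN_THRESHOLD)
            misses++;        /* prefetcher not trained yet: demand fetch */
        lines_seen++;        /* otherwise the model assumes the line was prefetched */
        last_line = line;
        have_last = 1;
    }
    return misses;
}
```

In this model a long sequential run pays for only the first couple of lines, which is the effect the paragraph above describes; real prefetchers are more complicated and can fall behind a fast store stream.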
