When will a program benefit from prefetch and non-temporal load/store?

被撕碎了的回忆 2021-02-08 03:24

I ran a test with this loop:

    for (i32 i = 0; i < 0x800000; ++i)
    {
        // Hopefully this can disable hardware prefetch
        i32 k = (i * 997 & 0x7         


        
2 answers
  •  刺人心 (original poster)
    2021-02-08 04:27

    If your computation chain is very short and you read memory sequentially, the CPU's hardware prefetcher will do well on its own, and the code may actually run faster without explicit prefetch instructions because the decoder has fewer instructions to process.
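To make the comparison concrete, here is a minimal sketch in C, assuming an x86 target with SSE. The function names and `PREFETCH_DISTANCE` are hypothetical and not taken from the question. Both functions compute the same sum. The second only adds prefetch hints, which on a sequential scan like this mostly gives the decoder extra work.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 (x86 SSE) */

/* Hypothetical tuning knob: how many elements ahead to request.
   A good value, if any, depends on the CPU model. */
#define PREFETCH_DISTANCE 64

/* Plain sequential sum: this access pattern is what the hardware
   prefetcher is designed to detect on its own. */
int64_t sum_plain(const int32_t *data, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}

/* Same sum with an explicit prefetch hint per element. The hint does
   not change the result; it only adds instructions to the loop. */
int64_t sum_prefetch(const int32_t *data, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n)
            _mm_prefetch((const char *)&data[i + PREFETCH_DISTANCE],
                         _MM_HINT_T0);
        total += data[i];
    }
    return total;
}
```

Timing both versions on the target machine is the only reliable way to tell whether the hint helps; the sketch makes no claim about which one wins.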

    Streaming (non-temporal) loads and stores are useful only if you don't plan to access the memory again in the near future, because their purpose is to avoid filling the cache with data that won't be reused. Streaming loads in particular are aimed mainly at uncached write-combining (WC) memory, which is what you usually get when dealing with graphics surfaces. Explicit prefetching may work well on one architecture (CPU model) and hurt performance on another, so use it only as a last resort when optimizing.
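As an illustration of a streaming store, here is a minimal sketch in C, assuming an x86 target with SSE2. The function name `fill_stream` and its preconditions are hypothetical choices for this example. It fills a buffer that the caller does not intend to read back soon, so the written data need not occupy cache lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_set1_epi32, _mm_stream_si128, _mm_sfence */

/* Fill dst[0..n) with value using non-temporal stores.
   Assumptions for this sketch: dst is 16-byte aligned and n is a
   multiple of 4, so every store covers one full 128-bit vector. */
void fill_stream(int32_t *dst, size_t n, int32_t value)
{
    assert(((uintptr_t)dst & 15) == 0);
    assert(n % 4 == 0);

    const __m128i v = _mm_set1_epi32(value);
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_si128((__m128i *)(dst + i), v);

    /* Non-temporal stores are weakly ordered; the fence makes them
       globally visible before any later stores. */
    _mm_sfence();
}
```

If the buffer is read again right after the fill, an ordinary store loop is likely the better choice, since the data would then have to be fetched back from memory.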
