Difference between prefetch for read or write

雨燕双飞 提交于 2019-12-10 15:22:55

问题


The gcc docs talk about a difference between prefetch for read and prefetch for write. What is the technical difference?


回答1:


On the CPU level, a software prefetch (as opposed to ones trigger by the hardware itself) are a convenient way to hint to the CPU that a line is about to be accessed, and you want it prefetched in advance to save the latency.

If the access will be a simple read, you would want a regular prefetch, which would behave similarly to a normal load from memory (aside from not blocking the CPU in case it misses, not faulting in case the address is bad, and all sorts of other benefits, depending on the micro architecture).

However, if you intend to write to that line, and it also exists in another core, a simple read operation would not suffice. This is due to MESI-based cache handling protocols. A core must have ownership of a line before modifying it, so that it preserves coherency (if the same line gets modified in multiple cores, you will not be able to ensure correct ordering for these changes, and may even lose some of them, which is not allowed on normal WB memory types). Instead, a write operation will start by acquiring ownership of the line, and snooping it out of any other core / socket that may hold a copy. Only then can the write occur. A read operation (demand or prefetch) would have left the line in other cores in a shared state, which is good if the line is read multiple times by many cores, but doesn't help you if your core later writes to it.

To allow useful prefetching for lines that will later be written to, most CPU companies support special prefetches for writing. In x86, both Intel and AMD support the prefetchW instruction, which should have the effect of a write (i.e. - acquiring sole ownership of a line, and invalidating any other copy if it). Note that not all CPUs support that (even within the same family, not all generations have it), and not all compiler versions enable it.

Here's an example (with gcc 4.8.2) - note that you need to enable it explicitly here -

#include <emmintrin.h>

int main() {
    long long int a[100];
    __builtin_prefetch (&a[0], 0, 0);
    __builtin_prefetch (&a[16], 0, 1);
    __builtin_prefetch (&a[32], 0, 2);
    __builtin_prefetch (&a[48], 0, 3);
    __builtin_prefetch (&a[64], 1, 0);
    return 0;
}

compiled with gcc -O3 -mprfchw prefetchw.c -c , :

0000000000000000 <main>:
   0:   48 81 ec b0 02 00 00    sub    $0x2b0,%rsp
   7:   48 8d 44 24 88          lea    -0x78(%rsp),%rax
   c:   0f 18 00                prefetchnta (%rax)
   f:   0f 18 98 80 00 00 00    prefetcht2 0x80(%rax)
  16:   0f 18 90 00 01 00 00    prefetcht1 0x100(%rax)
  1d:   0f 18 88 80 01 00 00    prefetcht0 0x180(%rax)
  24:   0f 0d 88 00 02 00 00    prefetchw 0x200(%rax)
  2b:   31 c0                   xor    %eax,%eax
  2d:   48 81 c4 b0 02 00 00    add    $0x2b0,%rsp
  34:   c3                      retq

If you play with the 2nd argument you'd notice that the hint levels are ignores for prefetchW, since it doesn't support temporal level hints. By the way, if you remove the -mprfchw flag, gcc will convert this into a normal read prefetch (I haven't tried different -march/mattr settings, maybe some of them include it as well).




回答2:


The difference relates to whether you expect that memory to only be read soon, or also to be written. In the later case, the CPU may be able to optimize differently. Remember, prefetch is only a hint, so GCC may ignore it.

To quote the GCC prefetch project page:

Some data prefetch instructions make a distinction between memory which is expected to be read and memory which is expected to be written. When data is to be written, a prefetch instruction can move a block into the cache so that the expected store will be to the cache. Prefetch for write generally brings the data into the cache in an exclusive or modified state.



来源:https://stackoverflow.com/questions/30003361/difference-between-prefetch-for-read-or-write

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!