Can volatile but unfenced reads yield indefinitely stale values? (on real hardware)


C++11 atomics deal with three issues:

  1. ensuring that a value is read or written in its entirety, even if another thread runs concurrently; this prevents tearing.

  2. ensuring that the compiler does not re-order instructions within a thread across an atomic read or write; this ensures ordering within the thread.

  3. ensuring (for appropriate choices of memory order parameters) that data written within a thread prior to an atomic write will be seen by a thread that reads the atomic variable and sees the value that was written. This is visibility (see the sketch after this list).
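As a minimal sketch of that third guarantee (names and values here are illustrative, not from the question): a release store paired with an acquire load makes the earlier plain write visible to the reader.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int data = 0;                      // plain, non-atomic data
std::atomic<bool> ready{false};

void writer() {
    data = 42;                                     // (a) plain write
    ready.store(true, std::memory_order_release);  // (b) publish
}

void reader() {
    while (!ready.load(std::memory_order_acquire)) {}  // observe (b)
    assert(data == 42);  // guaranteed: (a) happens-before this read
}

int main() {
    std::thread t1(writer), t2(reader);
    t1.join();
    t2.join();
}
```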

When you use memory_order_relaxed you don't get a guarantee of visibility from the relaxed store or load. You do get the first two guarantees.
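By contrast, here is a sketch (again with made-up names) of what relaxed does *not* promise: the same pattern with memory_order_relaxed no longer orders the plain write, so the reader races on the data.

```cpp
#include <atomic>
#include <thread>

int data = 0;
std::atomic<bool> ready{false};

void writer() {
    data = 42;                                     // plain write
    ready.store(true, std::memory_order_relaxed);  // no release semantics
}

void reader() {
    if (ready.load(std::memory_order_relaxed)) {   // no acquire semantics
        int x = data;  // NOT guaranteed to be 42: nothing orders the plain
                       // write before the flag, so this is a data race (UB)
        (void)x;
    }
}

int main() {  // deliberately racy; shown only to illustrate the missing guarantee
    std::thread t1(writer), t2(reader);
    t1.join();
    t2.join();
}
```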

Implementations "should" (i.e. are encouraged to) make memory writes visible within a reasonable amount of time, even with relaxed ordering. That's about the best that can be said; sooner or later these things should show up.

So, yes, formally, an implementation that never made relaxed writes visible to relaxed reads conforms to the language definition. In practice, this won't happen.

As to what volatile does, ask your compiler vendor. It's up to the implementation.

It is technically legal for std::memory_order_relaxed loads to never, ever return a new value for the load. As for whether any implementation will do this, I have no clue.

Reference: http://www.developerfusion.com/article/138018/memory-ordering-for-atomic-operations-in-c0x/ "The only requirement is that accesses to a single atomic variable from the same thread can’t be reordered: once a given thread has seen a particular value of an atomic variable, a subsequent read by that thread can’t retrieve an earlier value of the variable."
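A small illustration of that coherence rule (hypothetical code, not from the article):

```cpp
#include <atomic>

std::atomic<int> x{0};

void observer() {
    int a = x.load(std::memory_order_relaxed);
    int b = x.load(std::memory_order_relaxed);
    // Coherence: b cannot be earlier than a in x's modification order.
    // If a observed a store of 2 that replaced 1, b can never come back
    // as the older 1. Nothing, however, forces b to ever advance past a.
    (void)a;
    (void)b;
}
```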

If the processor has no cache-coherence protocol, or only a very simple one, it can 'optimize' loads by fetching stale data from its cache. Most modern multi-core CPUs implement a cache-coherency protocol, but ARM before the Cortex-A9 did not. Non-CPU architectures may also lack cache coherency (although such hardware arguably doesn't adhere to the C++ memory model anyway).

Another issue is that many architectures (including ARM and x86) allow memory accesses to be reordered. I don't know whether processors are smart enough to notice repeated accesses to the same address, but I doubt it (it costs space and time for a rare case that the compiler should be able to catch anyway, and the benefit is small, since later accesses will likely be L1 hits). Technically, though, a processor can speculate that a branch will be taken and reorder the second access before the first one (unlikely, but if I read the Intel and ARM manuals correctly, this is allowed).

Finally, there are external devices that do not adhere to cache coherency. If the CPU communicates through memory-mapped I/O or DMA, the page must be marked as non-cacheable (otherwise the L1/L2/L3/... caches would hold stale data). On such pages the processor will usually not reorder reads and writes (for details, consult your processor manual; it may offer more fine-grained control), but the compiler can, so you need to use volatile. Since atomics are usually cache-based, however, you neither need nor can use them here.
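A rough sketch of the memory-mapped I/O case (the address, register layout, and bit name are entirely made up; real values would come from a device's datasheet):

```cpp
#include <cstdint>

// Hypothetical device status register; address and bit are illustrative.
constexpr std::uintptr_t STATUS_REG_ADDR = 0x40000000;
constexpr std::uint32_t  DATA_READY      = 1u << 0;

void wait_for_device() {
    // volatile forces a fresh load on every iteration instead of letting
    // the compiler hoist the read out of the loop. The page backing this
    // address must be mapped non-cacheable by the OS/firmware.
    volatile std::uint32_t* status =
        reinterpret_cast<volatile std::uint32_t*>(STATUS_REG_ADDR);
    while ((*status & DATA_READY) == 0) {
        // spin until the device sets the bit (e.g. after a DMA transfer)
    }
}
```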

I'm afraid I cannot say whether such strong cache coherency will be available in future processors, so I would suggest strictly following the specification ("What's wrong with storing a pointer in an int? Surely no one will ever use more than 4 GiB, so a 32-bit address is big enough."). The correctness question was answered by others, so I won't repeat it.

didierc

Here's my take on it, though I don't have much knowledge on the topic, so take it with a grain of salt.

The volatile keyword's effect may well be compiler dependent, but I will assume that it actually does what we intuitively expect from it, namely prevent caching the value in a register or any other optimization that would stop a user from inspecting the variable's value in a debugger at any point of execution during the life of that variable. That's pretty close to (and probably the same as) that answer on the meaning of volatile.

The direct implication is that any code block accessing the volatile variable v will have to commit it to memory as soon as it modifies it. Fences make that happen in order with respect to other updates, but either way, if v is modified at the source level, there will be a store to v in the assembly output.
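For instance (a hypothetical snippet): with a plain int the compiler may keep the variable in a register and emit only the final store, but declaring it volatile obliges it to emit one store per source-level assignment.

```cpp
volatile int v = 0;

void tick() {
    for (int i = 1; i <= 10; ++i) {
        v = i;  // ten distinct stores must appear in the generated code;
                // without volatile this could legally collapse to "v = 10"
    }
}
```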

Indeed, the question you ask is: if v, already loaded into a register, has not been modified by some computation, what forces the CPU to read v from memory again, as opposed to simply reusing the value it already has?

I think the answer is that the CPU cannot assume that a memory cell hasn't changed since its last read. Memory access, even on a single-core system, isn't strictly reserved to the CPU: many other subsystems can access it read-write (that's the principle behind DMA).

The safest optimization a CPU can probably make is to check whether the value changed in cache or not, and use that as a hint about the state of v in memory. Caches should be kept in sync with memory thanks to the cache-invalidation mechanisms attached to DMA. Under that condition, the problem reduces to cache coherency on multicore systems, and to "write after write" hazards in multithreading situations. That last problem cannot be handled effectively with plain volatile variables, since their modification is not atomic, as you already know.
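To make that last point concrete (a sketch with invented names): a volatile increment compiles to separate load/add/store steps and can lose updates, whereas a std::atomic read-modify-write cannot.

```cpp
#include <atomic>

volatile int     counter_v = 0;
std::atomic<int> counter_a{0};

void unsafe_increment() {
    // Separate load, add, and store: two threads can interleave these
    // steps so that one of the increments is lost ("write after write").
    counter_v = counter_v + 1;
}

void safe_increment() {
    // A single atomic read-modify-write; increments are never lost.
    counter_a.fetch_add(1, std::memory_order_relaxed);
}
```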
