Volatile fence demo?

有刺的猬 2020-12-09 22:15

I'm trying to see how the fence is applied.

I have this code (which blocks indefinitely):

static void Main()
{
    bool complete = false;
    var t = new Thread(() =>
    {
        bool toggle = false;
        while (!complete) toggle = !toggle;
    });
    t.Start();
    Thread.Sleep(1000);
    complete = true;
    t.Join();        // blocks indefinitely
}
2 Answers
  •  感动是毒
    2020-12-09 22:56

    Like most of my answers pertaining to memory barriers I will use an arrow notation where ↓ represents an acquire-fence (volatile read) and ↑ represents a release-fence (volatile write). Remember, no other read or write can move past an arrow head (though they can move past the tail).

    Let us first analyze the writing thread. I will assume that complete is declared as volatile1. Thread.Start, Thread.Sleep, and Thread.Join will generate full fences, and that is why I have up and down arrows on either side of each of those calls.

    ↑                   // full fence from Thread.Start
    t.Start();
    ↓                   // full fence from Thread.Start
    ↑                   // full fence from Thread.Sleep
    Thread.Sleep(1000);
    ↓                   // full fence from Thread.Sleep
    ↑                   // release fence from volatile write to complete
    complete = true;
    ↑                   // full fence from Thread.Join
    t.Join();
    ↓                   // full fence from Thread.Join
    

    One important thing to notice here is that it is the Thread.Join call that is preventing the write to complete from floating any further down. The effect here is that the write gets committed to main memory immediately. It is not the volatility of complete itself that is causing it to get flushed to main memory. It is the Thread.Join call and the memory barrier it generates that is causing that behavior.
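
    As a rough sketch of the writer side only (the type and method names below are just illustrative), the same fence structure could be spelled out in code with Volatile.Write and Thread.MemoryBarrier:

    using System.Threading;

    class WriterFences                              // illustrative name
    {
        static bool complete;

        static void SignalAndWait(Thread t)         // illustrative name
        {
            t.Start();                              // full fence
            Thread.Sleep(1000);                     // full fence
            Volatile.Write(ref complete, true);     // release fence (↑) only
            Thread.MemoryBarrier();                 // explicit full fence, like the one Thread.Join generates
            t.Join();                               // full fence
        }
    }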

    Now we will analyze the reading thread. This is a bit trickier to visualize because of the while loop, but let us start with this.

    bool toggle = false;
    register1 = complete;
    ↓                           // half fence from volatile read
    while (!register1)
    {
      bool register2 = toggle;
      register2 = !register2;
      toggle = register2;
      register1 = complete;
      ↓                         // half fence from volatile read
    }
    

    Maybe we can visualize it better if we unwind the loop. For brevity I will only show the first 4 iterations.

    if (register1) return;
    register2 = toggle;
    register2 = !register2;
    toggle = register2;
    register1 = complete;
    ↓
    if (register1) return;
    register2 = toggle;
    register2 = !register2;
    toggle = register2;
    register1 = complete;
    ↓
    if (register1) return;
    register2 = toggle;
    register2 = !register2;
    toggle = register2;
    register1 = complete;
    ↓
    if (register1) return;
    register2 = toggle;
    register2 = !register2;
    toggle = register2;
    register1 = complete;
    ↓
    

    Now that we have the loop unwound I think you can see how any potential movement of the read of complete is going to be severely limited.2 Yes, it can get shuffled around a little bit by the compiler or hardware, but it is pretty much locked into being read on every iteration. Remember, the read of complete is still free to move, but the fence that it created does not move with it. That fence is locked into place. This is what causes the behavior often called a "fresh read".

    If volatile were omitted on complete then the compiler would be free to use an optimization technique called "lifting". That is where a read of a memory address can get extracted, or lifted, outside the loop. In the absence of volatile that optimization would be legal because all of the reads of complete would be allowed to float up (be lifted) until they are all ultimately outside of the loop. At that point the compiler would then coalesce them all into a one-time read just before starting the loop.3
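
    To make the lifting transformation concrete, here is roughly what the compiler would be allowed to turn the reading thread into if complete were not volatile (register1 is just a stand-in for a CPU register, as above):

    bool register1 = complete;    // single read, hoisted out of the loop
    bool toggle = false;
    while (!register1)            // the condition never re-reads complete
    {
      toggle = !toggle;           // spins forever if register1 was false
    }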

    Let me summarize a few important points right now.

    • It is the call to Thread.Join that is causing the write to complete to get committed to main memory so that the worker thread will eventually pick it up. The volatility of complete is irrelevant on the writing thread (which is probably surprising to most).
    • It is the acquire-fence generated by the volatile read of complete that is preventing that read from getting lifted outside of the loop which in turn creates the "fresh read" behavior. The volatility of complete on the reading thread makes a huge difference (which is probably obvious to most).
    • "Committed writes" and "fresh reads" are not directly caused volatile reads and writes. But, they are indirect consequences which just happen to almost always occur especially in the case of loops.

    1Marking complete as volatile on the writing thread is not necessary because x86 writes already have volatile semantics, but more importantly because the fence that is created by it does not cause the "committed write" behavior anyway.

    2Keep in mind that reads and writes can move through the tail of an arrow, but the arrow itself is locked in place. That is why you cannot bubble all of the reads up outside of the loop.

    3The lifting optimization must also ensure that the actual behavior of the thread is consistent with what the programmer originally intended. That requirement is easy to satisfy in this case because the compiler can easily see that complete is never written to on that thread.
