I've been reading about the new C++11 memory model and I've come upon the std::kill_dependency function (§29.3/14-15). I'm struggling to understand why
The purpose of memory_order_consume is to ensure the compiler does not do certain unfortunate optimizations that may break lockless algorithms. For example, consider this code:
int t;
volatile int a, b;
t = *x;
a = t;
b = t;
A conforming compiler may transform this into:
a = *x;
b = *x;
Thus, a may not equal b if another thread modifies *x between the two loads. It may also do:
t2 = *x;
// use t2 somewhere
// later
t = *x;
a = t2;
b = t;
By using load(memory_order_consume), we require that every use of the loaded value be ordered after the load itself, so the compiler may neither hoist such a use above the load nor substitute a second read of x. In other words,
t = x.load(memory_order_consume);
a = t;
b = t;
assert(a == b); // always true
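To make that fragment concrete, here is a minimal compilable sketch. The global x, the helper names read_twice and copies_always_match, and the writer loop are my own assumptions for illustration; they are not from the standard. The point it shows is that a single atomic load feeds both copies, so they agree even while another thread keeps changing x.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical shared variable standing in for x in the text.
std::atomic<int> x{0};

// Perform exactly one consume load and derive both copies from it.
// The uses of t may not be replaced by fresh loads of x.
std::pair<int, int> read_twice() {
    int t = x.load(std::memory_order_consume);
    volatile int a = t;
    volatile int b = t;
    int ra = a;  // read the volatile copies back into plain ints
    int rb = b;
    return {ra, rb};
}

// Hypothetical stress helper: a writer keeps changing x while the
// reader checks that both copies match on every iteration.
bool copies_always_match(int iterations) {
    std::atomic<bool> stop{false};
    std::thread writer([&stop] {
        int v = 0;
        while (!stop.load(std::memory_order_relaxed)) {
            v = (v + 1) % 1000;  // stay in a small range, no overflow
            x.store(v, std::memory_order_release);
        }
    });
    bool ok = true;
    for (int i = 0; i < iterations; ++i) {
        std::pair<int, int> p = read_twice();
        ok = ok && (p.first == p.second);
    }
    stop.store(true, std::memory_order_relaxed);
    writer.join();
    return ok;
}
```

Calling copies_always_match(100000) from main should return true. Contrast this with the plain int* version above, where the compiler is free to read *x twice and the two copies can differ.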
The standard considers a case where you may only care about ordering the accesses to certain fields of a structure. The example is:
r1 = x.load(memory_order_consume);
r2 = r1->index;
do_something_with(a[std::kill_dependency(r2)]);
This tells the compiler that it is, effectively, allowed to do this:
predicted_r2 = x->index; // unordered load
r1 = x; // ordered load
r2 = r1->index;
do_something_with(a[predicted_r2]); // may be faster than waiting for r2's value to be available
Or even this:
predicted_r2 = x->index; // unordered load
predicted_a = a[predicted_r2]; // get the CPU loading it early on
r1 = x; // ordered load
r2 = r1->index; // ordered load
do_something_with(predicted_a);
If the compiler knows that do_something_with won't change the result of the loads for r1 or r2, then it can even hoist it all the way up:
do_something_with(a[x->index]); // completely unordered
r1 = x; // ordered
r2 = r1->index; // ordered
This allows the compiler a little more freedom in its optimization.
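For completeness, the standard's kill_dependency snippet can be fleshed out into something that compiles. This is only a sketch under my own assumptions: Record, table, shared, publish and lookup are hypothetical names standing in for the structure, a[], x and the surrounding code, none of which the standard spells out.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical structure with the field read in the example.
struct Record {
    int index;
};

int table[4] = {10, 20, 30, 40};       // plays the role of a[]
std::atomic<Record*> shared{nullptr};  // plays the role of x

// Writer side: initialise the record, then publish it with release ordering.
void publish(Record* r, int index) {
    r->index = index;
    shared.store(r, std::memory_order_release);
}

// Reader side: the read of r1->index carries a dependency from the load,
// but the table lookup does not need that ordering, so the chain is cut.
int lookup() {
    Record* r1 = shared.load(std::memory_order_consume);
    if (r1 == nullptr) {
        return -1;  // nothing published yet
    }
    int r2 = r1->index;  // dependency-ordered after the consume load
    // kill_dependency returns r2's value unchanged; only the dependency is dropped.
    return table[std::kill_dependency(r2)];
}
```

std::kill_dependency never changes the value it is given, so lookup() returns the same result with or without it. The only difference is that the compiler is permitted (not required) to stop tracking the dependency at that point, which is what opens the door to the reorderings shown above.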