Let's imagine that I have a few worker threads like the following:
    while (1) {
        do_something();
        if (flag_isset())
            do_something_else();
    }
The 'minimum amount of work' is an explicit memory barrier. The syntax depends on your compiler; on GCC you could do:
    void flag_set() {
        global_flag = 1;
        // Full barrier; GCC accepts the argument but currently ignores it
        __sync_synchronize(global_flag);
    }
    void flag_clear() {
        global_flag = 0;
        __sync_synchronize(global_flag);
    }
    int flag_isset() {
        int val;
        // Prevent the read from migrating backwards
        __sync_synchronize(global_flag);
        val = global_flag;
        // and prevent it from being propagated forwards as well
        __sync_synchronize(global_flag);
        return val;
    }
These memory barriers accomplish two important goals:
They force a compiler flush. Consider a loop like the following:
    for (int i = 0; i < 1000000000; i++) {
        flag_set(); // assume this is inlined
        local_counter += i;
    }
Without a barrier, a compiler might choose to optimize this to:
    for (int i = 0; i < 1000000000; i++) {
        local_counter += i;
    }
    flag_set();
Inserting a barrier forces the compiler to write the variable back immediately.
They force the CPU to order its writes and reads. This is not so much an issue with a single flag: most CPU architectures will eventually see a flag that's set, even without CPU-level barriers. However, the order in which writes become visible to other CPUs might change. Suppose we have two flags, and on thread A:
// start with only flag A set
flag_set_B();
flag_clear_A();
And on thread B:
a = flag_isset_A();
b = flag_isset_B();
assert(a || b); // can be false!
Some CPU architectures allow these writes to be reordered; you may see both flags being false (i.e., the write clearing flag A was moved ahead of the write setting flag B). This can be a problem if a flag indicates, say, that a pointer is valid. Memory barriers force an ordering on writes to protect against these problems.
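To make the pointer case concrete, here is a minimal sketch of a flag guarding a pointer, using the same GCC builtin as above. The names `struct message`, `shared_msg`, `msg_ready`, `publish` and `try_consume` are made up for illustration, and the flag is marked `volatile` on the assumption that the reader may be inlined into a polling loop.

```c
#include <stddef.h>

/* Hypothetical names for illustration; none of these are from the post. */
struct message { int payload; };

static struct message *shared_msg = NULL;  /* data guarded by the flag */
static volatile int msg_ready = 0;         /* the flag itself */

/* Writer: store the pointer first, then raise the flag. */
void publish(struct message *m) {
    shared_msg = m;
    __sync_synchronize();   /* pointer store is ordered before the flag store */
    msg_ready = 1;
}

/* Reader: returns the message if the flag is set, NULL otherwise. */
struct message *try_consume(void) {
    if (!msg_ready)
        return NULL;
    __sync_synchronize();   /* flag load is ordered before the pointer load */
    return shared_msg;
}
```

Without the barrier in `publish`, another CPU could observe `msg_ready == 1` while still seeing the old value of `shared_msg`, which is exactly the reordering described above.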
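Note also that on some CPUs, it's possible to use 'acquire-release' barrier semantics to further reduce overhead. Such a distinction makes little difference on x86, however, where ordinary loads and stores already carry acquire and release ordering. Older GCC versions need inline assembly for this; GCC 4.7 and later expose these orderings through the `__atomic` builtins.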
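As a rough sketch of what acquire-release semantics look like in code, the flag could be written with C11's `<stdatomic.h>`, assuming a compiler that supports it. This is an alternative to the `__sync` version above, not a drop-in from the original code; the function names and `c11_flag` are invented for the example.

```c
#include <stdatomic.h>

/* Hypothetical C11 variant of the flag; static storage starts at 0. */
static atomic_int c11_flag;

/* Release store: earlier writes by this thread become visible to any
   thread that later observes the new flag value with an acquire load. */
void flag_set_release(void) {
    atomic_store_explicit(&c11_flag, 1, memory_order_release);
}

void flag_clear_release(void) {
    atomic_store_explicit(&c11_flag, 0, memory_order_release);
}

/* Acquire load: reads after this call cannot be moved before it. */
int flag_isset_acquire(void) {
    return atomic_load_explicit(&c11_flag, memory_order_acquire);
}
```

Note that acquire and release are weaker than the full barriers used earlier: they are enough for the "flag protects some data" pattern, but they do not forbid every reordering that a full barrier does.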
A good overview of what memory barriers are and why they are needed can be found in the Linux kernel documentation directory. Finally, note that this code is enough for a single flag, but if you want to synchronize against any other values as well, you must tread very carefully. A lock is usually the simplest way to do things.
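For comparison, the lock-based approach might look something like the following sketch, assuming POSIX threads are available. `state_lock`, `locked_flag`, `locked_value`, `state_publish` and `state_poll` are hypothetical names; the point is only that the flag and the data it guards are read and written under the same mutex.

```c
#include <pthread.h>

/* Hypothetical shared state; the mutex guards both variables. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static int locked_flag = 0;
static int locked_value = 0;   /* extra data tied to the flag */

/* Writer: update the value and the flag as one unit. */
void state_publish(int value) {
    pthread_mutex_lock(&state_lock);
    locked_value = value;
    locked_flag = 1;
    pthread_mutex_unlock(&state_lock);
}

/* Reader: returns 1 and stores the value in *out if the flag is set,
   otherwise returns 0 and leaves *out untouched. */
int state_poll(int *out) {
    int set;
    pthread_mutex_lock(&state_lock);
    set = locked_flag;
    if (set)
        *out = locked_value;
    pthread_mutex_unlock(&state_lock);
    return set;
}
```

The mutex operations provide the necessary barriers themselves, so no explicit `__sync_synchronize` calls are needed, at the cost of taking the lock on every poll.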