Consider the following condensed code:
/* Compile: gcc -pthread -m32 -ansi x.c */
#include
#include <
The reading of the variable in 0x804855a and 0x804855f does not need to be atomic. Using the compare-and-swap instruction to increment looks like this in pseudocode:
oldValue = *dest; // non-atomic: tearing between the halves is unlikely but possible
do {
newValue = oldValue+1;
} while (!compare_and_swap(dest, &oldValue, newValue));
Since the compare-and-swap checks that *dest == oldValue before swapping, it will act as a safeguard - so that if the value in oldValue is incorrect, the loop will be tried again, so there's no problem if the non-atomic read resulted in an incorrect value.
The 64-bit access to *dest done by lock cmpxchg8b is atomic (as part of an atomic RMW of *dest). Any tearing in loading the 2 halves separately will be caught here. Or if a write from another core happened after the initial read, before lock cmpxchg8b: this is possible even with single-register-width cmpxchg-retry loops. (e.g. to implement atomic fetch_mul or an atomic float, or other RMW operations that x86's lock prefix doesn't let us do directly.)
Your second question was why the line oldValue = *dest is not inside the loop. This is because the compare_and_swap function will always replace the value of oldValue with the actual value of *dest. So it will essentially perform the line oldValue = *dest for you, and there's no point in doing it again. In the case of the cmpxchg8b instruction, it will put the contents of the memory operand in edx:eax when the comparison fails.
The pseudocode for compare_and_swap is:
bool compare_and_swap (int *dest, int *oldVal, int newVal)
{
do atomically {
if ( *oldVal == *dest ) {
*dest = newVal;
return true;
} else {
*oldVal = *dest;
return false;
}
}
}
By the way, in your code you need to ensure that v is aligned to 64 bits - otherwise it could be split between two cache lines and the cmpxchg8b instruction will not be performed atomically. You can use GCC's __attribute__((aligned(8))) for this.