I'm reading Joe Duffy's post about Volatile reads and writes, and timeliness, and I'm trying to understand something about the last code sample in the post:
There seems to be some comparison here with the Win32 API functions of the same name, but this thread is about the C# Interlocked class. By its very description, its operations are guaranteed to be atomic. I'm not sure how that translates to "full memory barriers" as mentioned in other answers here, but judge for yourself.
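For readers more at home in C, the contract of `Interlocked.CompareExchange(ref location, value, comparand)` can be sketched with C11 atomics. This is an illustration under assumptions, not the CLR's code: the helper name `interlocked_compare_exchange` is hypothetical, and the default sequentially consistent ordering of `atomic_compare_exchange_strong` stands in for the barrier semantics discussed in this thread.

```c
#include <stdatomic.h>

/* Hypothetical helper mirroring the shape of Interlocked.CompareExchange:
   if *location == comparand, store value; always return the original value. */
int interlocked_compare_exchange(atomic_int *location, int value, int comparand)
{
    int expected = comparand;
    /* On failure, C11 writes the value actually found into `expected`;
       on success `expected` still equals comparand, which was the old value. */
    atomic_compare_exchange_strong(location, &expected, value);
    return expected;
}
```

As in C#, a caller knows the exchange happened when the returned value equals `comparand`.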
On uniprocessor systems nothing special happens; there is just a single instruction:
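```asm
; fastcall: ecx = location, edx = exchange value, [esp+4] = comparand
FASTCALL_FUNC CompareExchangeUP,12
    _ASSERT_ALIGNED_4_X86 ecx      ; debug check that location is 4-byte aligned
    mov     eax, [esp+4]           ; Comparand
    cmpxchg [ecx], edx             ; if [ecx] == eax then [ecx] = edx, else eax = [ecx]
    retn    4                      ; pop the comparand; result (original value) in EAX
FASTCALL_ENDFUNC CompareExchangeUP
```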
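To make the single instruction concrete, here is a plain C model of what `cmpxchg [ecx], edx` computes, with EAX holding the comparand. It is deliberately non-atomic: written in C, the read, compare and write are separate steps that another thread could interleave. The reasoning for why the bare instruction is enough on a uniprocessor is that interrupts, and therefore thread switches, are only taken between instructions, so no other thread can run in the middle of it. The function name `cmpxchg_model` is hypothetical.

```c
#include <stdint.h>

/* Non-atomic model of x86 `cmpxchg [dest], src` where *eax plays the role of EAX.
   Returns 1 when the swap happened (ZF set), 0 otherwise. Either way *eax ends up
   holding the original value of *dest, which is why the stub returns EAX. */
int cmpxchg_model(int32_t *dest, int32_t src, int32_t *eax)
{
    int32_t old = *dest;      /* 1. read the destination */
    if (old == *eax) {        /* 2. compare with EAX (the comparand) */
        *dest = src;          /* 3a. equal: store the new value */
        return 1;
    }
    *eax = old;               /* 3b. not equal: load the current value into EAX */
    return 0;
}
```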
But on multiprocessor systems, a hardware lock (the `lock` prefix) is used to prevent other cores from accessing the data at the same time:
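```asm
; fastcall: ecx = location, edx = exchange value, [esp+4] = comparand
FASTCALL_FUNC CompareExchangeMP,12
    _ASSERT_ALIGNED_4_X86 ecx      ; debug check that location is 4-byte aligned
    mov     eax, [esp+4]           ; Comparand
    lock cmpxchg [ecx], edx        ; locked compare-exchange: atomic across all cores
    retn    4                      ; pop the comparand; result (original value) in EAX
FASTCALL_ENDFUNC CompareExchangeMP
```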
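In C, the `lock cmpxchg` above is usually reached through `<stdatomic.h>`: on x86, GCC and Clang typically compile a compare-exchange on an `int` to exactly that instruction. On x86 a locked instruction also acts as a full memory barrier; Intel's manuals state that loads and stores are not reordered with locked instructions. The sketch below uses a hypothetical `atomic_add_cas` helper to show the usual CAS retry loop, the same pattern people build on top of Interlocked.CompareExchange in C#. It ignores signed overflow for brevity.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically add `delta` using only compare-and-swap.
   Returns the new value, like Interlocked.Add. */
int atomic_add_cas(atomic_int *location, int delta)
{
    int observed = atomic_load(location);
    /* If another core changed *location between the load and the CAS (or the
       weak CAS fails spuriously), the CAS refreshes `observed` and we retry. */
    while (!atomic_compare_exchange_weak(location, &observed, observed + delta)) {
        /* `observed` now holds the latest value; try again. */
    }
    return observed + delta;
}
```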
This blog post on CompareExchange is an interesting read: it draws some wrong conclusions here and there, but all in all it is excellent on the subject.
As so often, the answer is "it depends". It appears that prior to 2.1, the Interlocked operations on ARM had only a half barrier. For the 2.1 release, this behavior was changed to a full barrier.
The current code can be found here, and the actual implementation of CompareExchange here. Discussions of the generated ARM assembly, as well as examples of the generated code, can be found in the aforementioned PR.
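The half-barrier versus full-barrier distinction for ARM can be illustrated with C11 memory orders. This is an analogy, not the runtime's ARM code, and the helper names are hypothetical: an acquire-only compare-exchange behaves roughly like a half barrier (later memory operations cannot be hoisted above it, but earlier ones may sink below it), while a sequentially consistent compare-exchange followed by a `seq_cst` fence approximates a full barrier.

```c
#include <stdatomic.h>

/* Hypothetical half-barrier flavor: acquire ordering only. Roughly, accesses
   after the CAS cannot move above it, but earlier accesses may move below it. */
int cas_half_barrier(atomic_int *location, int value, int comparand)
{
    int expected = comparand;
    atomic_compare_exchange_strong_explicit(location, &expected, value,
                                            memory_order_acquire,
                                            memory_order_acquire);
    return expected;  /* original value of *location */
}

/* Hypothetical full-barrier flavor: a seq_cst CAS followed by a seq_cst fence,
   intended to keep surrounding accesses from being reordered across it. */
int cas_full_barrier(atomic_int *location, int value, int comparand)
{
    int expected = comparand;
    atomic_compare_exchange_strong_explicit(location, &expected, value,
                                            memory_order_seq_cst,
                                            memory_order_seq_cst);
    atomic_thread_fence(memory_order_seq_cst);
    return expected;  /* original value of *location */
}
```

On x86 both compare-exchanges compile to the same `lock cmpxchg`, which is already a full barrier; the distinction matters on weakly ordered CPUs such as ARM, where the compiler may emit cheaper instructions for the acquire version.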