Can anyone provide a good explanation of the volatile keyword in C#? Which problems does it solve and which it doesn\'t? In which cases will it save me the use of locking?>
If you want to get slightly more technical about what the volatile keyword does, consider the following program (I'm using DevStudio 2005):
#include
void main()
{
int j = 0;
for (int i = 0 ; i < 100 ; ++i)
{
j += i;
}
for (volatile int i = 0 ; i < 100 ; ++i)
{
j += i;
}
std::cout << j;
}
Using the standard optimised (release) compiler settings, the compiler creates the following assembler (IA32):
void main()
{
00401000 push ecx
int j = 0;
00401001 xor ecx,ecx
for (int i = 0 ; i < 100 ; ++i)
00401003 xor eax,eax
00401005 mov edx,1
0040100A lea ebx,[ebx]
{
j += i;
00401010 add ecx,eax
00401012 add eax,edx
00401014 cmp eax,64h
00401017 jl main+10h (401010h)
}
for (volatile int i = 0 ; i < 100 ; ++i)
00401019 mov dword ptr [esp],0
00401020 mov eax,dword ptr [esp]
00401023 cmp eax,64h
00401026 jge main+3Eh (40103Eh)
00401028 jmp main+30h (401030h)
0040102A lea ebx,[ebx]
{
j += i;
00401030 add ecx,dword ptr [esp]
00401033 add dword ptr [esp],edx
00401036 mov eax,dword ptr [esp]
00401039 cmp eax,64h
0040103C jl main+30h (401030h)
}
std::cout << j;
0040103E push ecx
0040103F mov ecx,dword ptr [__imp_std::cout (40203Ch)]
00401045 call dword ptr [__imp_std::basic_ostream >::operator<< (402038h)]
}
0040104B xor eax,eax
0040104D pop ecx
0040104E ret
Looking at the output, the compiler has decided to use the ecx register to store the value of the j variable. For the non-volatile loop (the first) the compiler has assigned i to the eax register. Fairly straightforward. There are a couple of interesting bits though - the lea ebx,[ebx] instruction is effectively a multibyte nop instruction so that the loop jumps to a 16 byte aligned memory address. The other is the use of edx to increment the loop counter instead of using an inc eax instruction. The add reg,reg instruction has lower latency on a few IA32 cores compared to the inc reg instruction, but never has higher latency.
Now for the loop with the volatile loop counter. The counter is stored at [esp] and the volatile keyword tells the compiler the value should always be read from/written to memory and never assigned to a register. The compiler even goes so far as to not do a load/increment/store as three distinct steps (load eax, inc eax, save eax) when updating the counter value, instead the memory is directly modified in a single instruction (an add mem,reg). The way the code has been created ensures the value of the loop counter is always up-to-date within the context of a single CPU core. No operation on the data can result in corruption or data loss (hence not using the load/inc/store since the value can change during the inc thus being lost on the store). Since interrupts can only be serviced once the current instruction has completed, the data can never be corrupted, even with unaligned memory.
Once you introduce a second CPU to the system, the volatile keyword won't guard against the data being updated by another CPU at the same time. In the above example, you would need the data to be unaligned to get a potential corruption. The volatile keyword won't prevent potential corruption if the data cannot be handled atomically, for example, if the loop counter was of type long long (64 bits) then it would require two 32 bit operations to update the value, in the middle of which an interrupt can occur and change the data.
So, the volatile keyword is only good for aligned data which is less than or equal to the size of the native registers such that operations are always atomic.
The volatile keyword was conceived to be used with IO operations where the IO would be constantly changing but had a constant address, such as a memory mapped UART device, and the compiler shouldn't keep reusing the first value read from the address.
If you're handling large data or have multiple CPUs then you'll need a higher level (OS) locking system to handle the data access properly.