Is Interlocked.Increment(ref x) faster or slower than x++ for ints and longs on various platforms?
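In our experience, InterlockedIncrement() and related functions on Windows have a significant performance cost. In one case we were able to remove the interlocked calls and use plain ++/-- instead, and that change alone cut the run time from 140 seconds to 110 seconds. My original explanation was that the interlock forces a round trip to main memory (otherwise how could other cores see the value?). That is not quite what happens on modern x86 CPUs. A locked increment on ordinary cacheable memory normally completes in the core's own cache, and other cores see the new value through the cache-coherence protocol. The cost comes from the full memory barrier that the locked instruction implies and, when several threads update the same variable, from the cache line bouncing between cores. As a rough model, I still treat a plain increment as an L1 cache read/write (around 10 clock cycles) and a contended interlocked increment as something closer to a memory access (more like 100).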
In this case I estimated the number of increment/decrement operations at about 1 billion. On a 2 GHz CPU that is roughly 5 seconds for the plain ++/-- and 50 seconds for the interlocked versions. Spread that 45-second difference across several threads and it comes out close to the 30 seconds we actually saved.