Simplified question:
Is there a difference in the timing of memory cache coherency (or "flushing") caused by Interlocked operations compared to Memor
Short answer: CAS (Interlocked) operations have been (and most likely will remain) the quickest cache flushers.
Background:
- CAS operations are supported in hardware by a single uninterruptible instruction. By contrast, a thread that calls a memory barrier can be swapped out right after the barrier is placed but before it performs any reads or writes (the consistency guaranteed by the barrier is still met).
- CAS operations are the foundation of most (if not all) high-level synchronization constructs (mutexes, semaphores, locks; look at their implementations and you will find CAS operations). They would hardly be used if they did not guarantee immediate cross-thread state consistency, or if there were other, faster mechanisms.
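To illustrate the second point, here is a minimal sketch of a lock built on a single CAS. It is written in C11 `<stdatomic.h>` rather than .NET, on the assumption that `atomic_compare_exchange_strong` plays the same role here as an Interlocked compare-exchange. The names `cas_spinlock`, `cas_try_lock`, `cas_lock` and `cas_unlock` are hypothetical, and the sketch omits backoff and fairness, so it is not production-ready:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock: state 0 = unlocked, 1 = locked. */
typedef struct {
    atomic_int state;
} cas_spinlock;

#define CAS_SPINLOCK_INIT { 0 }

/* One CAS attempt: atomically flips 0 -> 1 only if the lock is free.
   Returns true if this thread acquired the lock. */
static bool cas_try_lock(cas_spinlock *l) {
    int expected = 0;
    /* seq_cst compare-and-swap: a single atomic read-modify-write. */
    return atomic_compare_exchange_strong(&l->state, &expected, 1);
}

/* Spin until the CAS succeeds, i.e. until the current holder releases. */
static void cas_lock(cas_spinlock *l) {
    while (!cas_try_lock(l)) {
        /* busy-wait; a real lock would back off or yield here */
    }
}

/* seq_cst store: releases the lock and publishes writes made while holding it. */
static void cas_unlock(cas_spinlock *l) {
    atomic_store(&l->state, 0);
}
```

A thread that loses the race simply sees the CAS fail and retries; ownership of the lock word only ever changes through the atomic compare-and-swap. On x86 such a sequentially consistent CAS typically compiles to a `LOCK CMPXCHG`, which also acts as a full memory barrier.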