Simplified question:
Is there a difference in the timing of memory cache coherency (or "flushing") caused by Interlocked operations compared to Memor
To understand C# interlocked operations, you need to understand Win32 interlocked operations.
The "pure" interlocked operations themselves only affect the freshness of the data directly referenced by the operation.
But in Win32, the interlocked operations have traditionally implied a full memory barrier. I believe this is mostly to avoid breaking old programs on newer hardware. So InterlockedAdd does two things: an interlocked add (very cheap, does not affect caches) and a full memory barrier (a rather heavy operation).
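To make the "two things" concrete, here is a minimal sketch in portable C++ rather than Win32. It assumes that `std::atomic` with relaxed ordering is a fair stand-in for the "pure" interlocked add and that `std::atomic_thread_fence` is a fair stand-in for the full barrier. The helper name `add_with_full_fence` is hypothetical and is not a Win32 or .NET API.

```cpp
#include <atomic>

// Hypothetical helper (not a Win32 API): models a "classic" interlocked add
// as an atomic add with no ordering of its own, wrapped in full fences.
long add_with_full_fence(std::atomic<long>& target, long value) {
    // Full barrier: earlier loads/stores may not be reordered past this point.
    std::atomic_thread_fence(std::memory_order_seq_cst);

    // The "pure" part: an atomic read-modify-write on this one location only.
    long previous = target.fetch_add(value, std::memory_order_relaxed);

    // Full barrier: later loads/stores may not be reordered before this point.
    std::atomic_thread_fence(std::memory_order_seq_cst);

    // Return the new value (fetch_add itself returns the old one).
    return previous + value;
}
```

On real hardware the split is not always this clean. On x86, for example, a locked instruction already acts as a full barrier, so the separate cost mostly shows up on weakly ordered architectures such as ARM.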
Later, Microsoft realized this is expensive and added versions of each operation that have no memory barrier or only a partial one.
So there are now (in the Win32 world) four versions of almost everything: e.g. InterlockedAdd (full fence), InterlockedAddAcquire (read fence), InterlockedAddRelease (write fence), and the pure InterlockedAddNoFence (no fence).
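The four flavors can be sketched with the C++ memory orders as a rough analogue. This is an illustration under the assumption that `seq_cst`, `acquire`, `release` and `relaxed` correspond approximately to the full, read, write and no-fence variants. The mapping is not exact, and the function names below are hypothetical, not the Win32 functions themselves.

```cpp
#include <atomic>

// Hypothetical wrappers, each returning the new value after the add.

// Roughly "InterlockedAdd": strongest ordering, no reordering across the op.
long add_full_fence(std::atomic<long>& target, long value) {
    return target.fetch_add(value, std::memory_order_seq_cst) + value;
}

// Roughly "...Acquire": later accesses cannot move before the operation.
long add_acquire(std::atomic<long>& target, long value) {
    return target.fetch_add(value, std::memory_order_acquire) + value;
}

// Roughly "...Release": earlier accesses cannot move after the operation.
long add_release(std::atomic<long>& target, long value) {
    return target.fetch_add(value, std::memory_order_release) + value;
}

// Roughly "...NoFence": atomic on this location, no ordering of other data.
long add_no_fence(std::atomic<long>& target, long value) {
    return target.fetch_add(value, std::memory_order_relaxed) + value;
}
```

All four give the same arithmetic result. They differ only in what they promise about the visibility of other memory accesses around the operation, which is where the cost difference comes from.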
In the C# world, there is only one version, and it matches the "classic" InterlockedAdd, which also does the full memory fence.