I understand that volatile informs the compiler that the value may be changed, but in order to accomplish this functionality, does the compiler need to introduc
I always use volatile in interrupt service routines, e.g. the ISR (often assembly code) modifies some memory location and the higher level code that runs outside of the interrupt context accesses the memory location through a pointer to volatile.
I do this for RAM as well as memory-mapped IO.
Based on the discussion here it seems this is still a valid use of volatile but doesn't have anything to do with multiple threads or CPUs. If the compiler for a microcontroller "knows" that there can't be any other accesses (e.g. everyting is on-chip, no cache and there's only one core) I would think that a memory fence isn't implied at all, the compiler just needs to prevent certain optimisations.
As we pile more stuff into the "system" that executes the object code almost all bets are off, at least that's how I read this discussion. How could a compiler ever cover all bases?