I understand that volatile informs the compiler that the value may be changed, but in order to accomplish this functionality, does the compiler need to introduc
It doesn't have to. Volatile is not a synchronization primitive. It just disables optimisations on the accesses it qualifies, i.e. within a single thread you get a predictable sequence of volatile reads and writes, in the same order as prescribed by the abstract machine. But reads and writes in different threads have no order in the first place, so it makes no sense to speak of preserving or not preserving their order. Order between threads can only be established by synchronization primitives; without them, conflicting unsynchronized accesses are a data race, which is UB.
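To make the distinction concrete, here is a minimal sketch of a one-shot handoff between two threads. The function name `handoff_with_atomics` and the variables are made up for illustration. The ordering between the threads comes from the `std::atomic` release/acquire pair; declaring `ready` as `volatile bool` instead would leave the accesses unsynchronized, i.e. a data race.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: hands one int from a producer thread to a consumer.
int handoff_with_atomics() {
    int payload = 0;                 // plain (non-atomic) data
    std::atomic<bool> ready{false};  // the synchronization primitive
    int seen = -1;

    std::thread consumer([&] {
        // The acquire load pairs with the release store in the producer.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // ordered after the producer's write to payload
    });

    std::thread producer([&] {
        payload = 42;                                  // write the data first
        ready.store(true, std::memory_order_release);  // then publish it
    });

    producer.join();
    consumer.join();
    assert(ready.load());  // the flag was published before both threads ended
    return seen;
}
```

Calling `handoff_with_atomics()` returns the value the producer wrote. Whether the compiler emits an actual barrier instruction for the release/acquire pair depends on the target.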
A bit of explanation regarding memory barriers. A typical CPU has several levels of memory access. There is a memory pipeline, several levels of cache, then RAM etc.
Membar instructions flush the pipeline. They don't change the order in which reads and writes are executed; they just force outstanding ones to complete at a given point. This is useful for multithreaded programs, but not much otherwise.
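In portable C++ you would normally not write a membar instruction by hand but ask for one through `std::atomic_thread_fence`. The sketch below is the same kind of handoff written with explicit fences and relaxed atomic accesses; `handoff_with_fences` is a made-up name. Whether a fence turns into a real membar is up to the compiler and the target: on x86, for example, acquire and release fences typically need no instruction at all and only restrict compiler reordering.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: same handoff, but the ordering comes from fences.
int handoff_with_fences() {
    int payload = 0;
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread consumer([&] {
        // The relaxed load alone imposes no ordering on payload.
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // The acquire fence orders the read of payload after the flag read.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = payload;
    });

    std::thread producer([&] {
        payload = 7;
        // The release fence orders the write to payload before the flag store.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    producer.join();
    consumer.join();
    assert(ready.load(std::memory_order_relaxed));
    return seen;
}
```

Note that the fences only establish ordering in combination with the atomic flag; a fence next to a plain or `volatile` flag would still leave a data race.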
Caches are normally kept coherent between CPUs automatically by the hardware. If one wants to make sure the cache is in sync with RAM, a cache flush is needed. That is a very different operation from a membar.