I understand that volatile informs the compiler that the value may be changed, but in order to accomplish this functionality, does the compiler need to introduc
What David is overlooking is the fact that the C++ standard specifies the behavior of several threads interacting only in specific situations and everything else results in undefined behavior. A race condition involving at least one write is undefined if you don't use atomic variables.
Consequently, the compiler is perfectly in its right to forego any synchronization instructions since your CPU will only notice the difference in a program that exhibits undefined behavior due to missing synchronization.