In a codebase I reviewed, I found the following idiom.
void notify(struct actor_t act) {
write(act.pipe, \"M\", 1);
}
// thread A sending data to thread
Memory barriers aren't just to prevent instruction reordering. Even if instructions aren't reordered it can still cause problems with cache coherence. As for the reordering - it depends on your compiler and settings. ICC is particularly agressive with reordering. MSVC w/ whole program optimization can be, too.
If your shared data variable is declared as volatile, even though it's not in the spec most compilers will generate a memory variable around reads and writes from the variable and prevent reordering. This is not the correct way of using volatile, nor what it was meant for.
(If I had any votes left, I'd +1 your question for the narration.)