This article, http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf (page 12), seems to distinguish between a lock and a memory barrier.
I would like to know w
A memory barrier (also known as a fence) is a hardware operation that
ensures the ordering of different reads and writes to the globally
visible store. On a typical modern processor, memory accesses are
pipelined, and may occur out of order. A memory barrier ensures that
this doesn't happen. A full memory barrier will ensure that all loads
and stores which precede it occur before any load or store which follows
it. (Many processors support partial barriers; e.g. on a Sparc, a
membar #StoreStore ensures that all stores which occur before it will
be visible to all other processors before any store which occurs after
it.)
That's all a memory barrier does. It doesn't block the thread, or anything.
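To make the ordering guarantee concrete, here is a minimal sketch using the C++11 `std::atomic_thread_fence`. This is a language-level fence, which the compiler turns into whatever hardware barrier the target needs (possibly none); it is not the Sparc instruction itself, and it is not code from the article. The names `payload`, `ready`, `producer`, `consumer` and `run_fence_demo` are made up for the illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                  // ordinary, non-atomic data
std::atomic<bool> ready{false};   // flag used to publish the data

void producer() {
    payload = 42;  // first store
    // Release fence: the store to payload above cannot be reordered
    // after the store to the flag below.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);  // second store
}

int consumer() {
    // The fence never blocks; any waiting is this explicit spin loop.
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Acquire fence: the read of payload below cannot be reordered
    // before the load of the flag above.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

int run_fence_demo() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    assert(ready.load());  // the producer has published its data
    return seen;
}
```

Calling `run_fence_demo()` should return 42: once the consumer sees the flag, the pair of fences guarantees it also sees the earlier write to `payload`. Neither thread is ever suspended by the fence itself.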
Mutexes and semaphores are higher level primitives, implemented in the operating system. A thread which requests a mutex that is already locked will block, and have its execution suspended by the OS, until that mutex is free. The kernel code in the OS will contain memory barrier instructions in order to implement a mutex, but it does much more. A memory barrier instruction stalls the hardware (the core executing it, not the other cores) until the necessary conditions have been met: a microsecond or so at the most, and usually far less. When you try to lock a mutex, and another thread already has it, the OS will suspend your thread (and only your thread; the processor continues to execute other threads) until whoever holds the mutex frees it, which could be seconds, minutes or even days. (Of course, if it's more than a few hundred milliseconds, it's probably a bug.)
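For contrast with a bare barrier, here is a small sketch of a mutex in use, written with the standard C++11 `std::mutex`. The names `counter_mutex`, `counter`, `add_many` and `run_mutex_demo` are hypothetical. How the library and OS implement the lock underneath (barriers, kernel calls) is not visible at this level.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical names, for illustration only.
std::mutex counter_mutex;
long counter = 0;

void add_many(int n) {
    for (int i = 0; i < n; ++i) {
        // If another thread holds the mutex, this thread waits here
        // until the holder releases it.
        std::lock_guard<std::mutex> lock(counter_mutex);
        ++counter;
    }  // lock_guard releases the mutex at the end of each iteration
}

long run_mutex_demo(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back(add_many, per_thread);
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

With the lock in place, `run_mutex_demo(4, 10000)` should return 40000. Without it, the increments from different threads could interleave and updates could be lost.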
Finally, there's not really much difference between semaphores and mutexes; a mutex can be considered a semaphore with a count of one.
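As a rough illustration of that last point, here is a sketch of a counting semaphore used with an initial count of one to get mutual exclusion. The `Semaphore` class is a hypothetical, minimal implementation built on `std::mutex` and `std::condition_variable` so that it compiles as C++11; C++20 provides `std::counting_semaphore` in `<semaphore>`, which would normally be used instead. The names `gate`, `shared_total`, `deposit` and `run_semaphore_demo` are likewise made up.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal counting semaphore, for illustration only.
class Semaphore {
public:
    explicit Semaphore(int initial) : count_(initial) {}

    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        // Wait until at least one unit is available, then take it.
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

    void release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();  // wake one waiter, if there is one
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

Semaphore gate(1);      // a count of one: at most one thread inside
long shared_total = 0;

void deposit(int n) {
    for (int i = 0; i < n; ++i) {
        gate.acquire();   // plays the role of lock()
        ++shared_total;
        gate.release();   // plays the role of unlock()
    }
}

long run_semaphore_demo(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    shared_total = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back(deposit, per_thread);
    }
    for (auto& th : pool) {
        th.join();
    }
    return shared_total;
}
```

Here `run_semaphore_demo(4, 5000)` should return 20000, just as it would with a mutex. One difference worth keeping in mind: a semaphore has no notion of an owner, so any thread may call `release()`, whereas a mutex is expected to be unlocked by the thread that locked it.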