Raymond Chen has been doing a huge series on lockfree algorithms. Beyond the simple cases of the InterlockedXxx functions, it seems like the prevailing pattern
Recently at JavaOne Russia, an Oracle employee (who specializes in Java performance and benchmarks) showed some measurements of operations per second under parallel access to a simple int counter, using CAS (lock-free, but in fact a high-level spinlock) and classic locks (java.util.concurrent.locks.ReentrantLock).
http://dl.dropbox.com/u/19116634/pics/lock-free-vs-locks.png (sorry, I can't paste images)
According to this, spinlocks perform better only while a small number of threads are trying to access the monitor.
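To make the two approaches concrete, here is a rough sketch of what such counters might look like. This is not the code from the talk: the class and method names (CounterComparison, CasCounter, LockCounter, runCas, runLock) are made up for illustration, and it only checks correctness, not throughput. A real benchmark would need a proper harness with warm-up.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example class, not taken from the JavaOne talk.
public class CounterComparison {

    // CAS-based counter: read, compute, then retry until compareAndSet succeeds.
    // Under contention the retry loop spins, which is why it behaves like a spinlock.
    static final class CasCounter {
        private final AtomicInteger value = new AtomicInteger();

        int increment() {
            while (true) {
                int current = value.get();
                int next = current + 1;
                if (value.compareAndSet(current, next)) {
                    return next;
                }
                // Another thread changed the value first; loop and try again.
            }
        }

        int get() {
            return value.get();
        }
    }

    // Lock-based counter: threads that cannot acquire the lock are parked
    // instead of spinning.
    static final class LockCounter {
        private final ReentrantLock lock = new ReentrantLock();
        private int value;

        int increment() {
            lock.lock();
            try {
                return ++value;
            } finally {
                lock.unlock();
            }
        }

        int get() {
            lock.lock();
            try {
                return value;
            } finally {
                lock.unlock();
            }
        }
    }

    // Starts the given number of threads running the same work and waits for all of them.
    static void runThreads(int threads, Runnable work) throws InterruptedException {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(work);
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
    }

    static int runCas(int threads, int perThread) throws InterruptedException {
        CasCounter counter = new CasCounter();
        runThreads(threads, () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment();
            }
        });
        return counter.get();
    }

    static int runLock(int threads, int perThread) throws InterruptedException {
        LockCounter counter = new LockCounter();
        runThreads(threads, () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment();
            }
        });
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("CAS:  " + runCas(4, 100_000));
        System.out.println("Lock: " + runLock(4, 100_000));
    }
}
```

Both variants end up with the same total (threads * perThread). The difference is only in what a thread does when it loses the race: the CAS version burns CPU retrying, while the ReentrantLock version can park the thread, which would be consistent with the lock doing better as the thread count grows.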