The other week, I wrote a little thread class and a one-way message pipe to allow communication between threads (two pipes per thread, obviously, for bidirectional communica
Herb Sutter seemed to simply suggest that any two variables should reside on separate cache lines. He does this in his concurrent queue with padding between his locks and node pointers.
Edit: If you're using the Intel compiler or GCC, you can use the atomic builtins, which seem to do their best to preempt the cache when possible.