I wrote some lock-free code that works fine with local reads, under most conditions.
Does local spinning on a memory read necessarily imply I have to ALWAYS insert a
I may well not properly have understood the question, but...
If you're spinning, one problem is the compiler optimizing your spin away. Volatile solves this.
The memory barrier, if you have one, will be issued by the writer to the spin lock, not the reader. The writer doesn't actually have to use one - doing so ensures the write is pushed out immediately, but it'll go out pretty soon anyway.
The barrier prevents for a thread executing that code re-ordering across it's location, which is its other cost.