you should be okay. It's volatile, so the optimizer shouldn't savage you, and it's a 32-bit value so it should be at least approximately atomic. The one possible surprise is if the instruction pipeline can get around that.
On the other hand, what's the additional cost of using the guarded routines?