There is a popular spin-lock mutex implementation that is widespread on the Internet and that one might also encounter in Anthony Williams's book (C++ Concurrency in Action). Here
As you said, test_and_set is an RMW (read-modify-write) operation. However, for the test itself it only matters that the correct value is read, that is, that the load half of the operation is ordered. Thus, memory_order_acquire seems sufficient.
See also the Constants table at http://en.cppreference.com/w/cpp/atomic/memory_order
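
To make the reasoning above concrete, here is a minimal sketch of such a spin lock. It is not necessarily the exact code from the question or the book; the names `spinlock_mutex` and `run_counter_demo` are only illustrative. The sketch assumes the usual pairing: `test_and_set` with `memory_order_acquire` in `lock()`, and `clear` with `memory_order_release` in `unlock()`, so that writes made inside the critical section are visible to the next thread that acquires the lock.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Illustrative spin lock (hypothetical name), built on std::atomic_flag.
class spinlock_mutex {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;

public:
    void lock() {
        // test_and_set is an RMW: it sets the flag and returns the old value.
        // Acquire ordering applies to the load half, which is what the test needs.
        while (flag.test_and_set(std::memory_order_acquire)) {
            // Busy-wait until the previous holder clears the flag.
        }
    }

    void unlock() {
        // Release ordering publishes the critical section's writes
        // to the next thread whose acquire reads this cleared flag.
        flag.clear(std::memory_order_release);
    }
};

// Hypothetical helper for demonstration: several threads increment a
// plain int, protected only by the spin lock, and the total is returned.
int run_counter_demo(int num_threads, int increments_per_thread) {
    spinlock_mutex m;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&m, &counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                m.lock();
                ++counter;
                m.unlock();
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter;
}
```

If the lock works as intended, `run_counter_demo(4, 10000)` should return 40000, since every increment happens inside the critical section.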