This is a bit of a two part question, all about the atomicity of std::shared_ptr:
1.
As far as I can tell, std::shared_ptr
I am preparing a talk on shared_ptr at work. I have been using a modified boost shared_ptr with avoid separate malloc (like what make_shared can do) and a template param for lock policy like shared_ptr_unsynchronized mentioned above. I am using the program from
http://flyingfrogblog.blogspot.hk/2011/01/boosts-sharedptr-up-to-10-slower-than.html
as a test, after cleaning up the unnecessary shared_ptr copies. The program uses the main thread only and the test argument is shown. The test env is a notebook running linuxmint 14. Here is the time taken in seconds:
test run setup boost(1.49) std with make_shared modified boost mt-unsafe(11) 11.9 9/11.5(-pthread on) 8.4 atomic(11) 13.6 12.4 13.0 mt-unsafe(12) 113.5 85.8/108.9(-pthread on) 81.5 atomic(12) 126.0 109.1 123.6
Only the 'std' version uses -std=cxx11, and the -pthread likely switches lock_policy in g++ __shared_ptr class.
From these numbers, I see the impact of atomic instructions on code optimization. The test case does not use any C++ containers, but vector is likely to suffer if the object doesn't need the thread protection. Boost suffers less probably because the additional malloc is limiting the amount of inlining and code optimizaton.
I have yet to find a machine with enough cores to stress test the scalability of atomic instructions, but using std::shared_ptr only when necessary is probably better.