I often read that unique_ptr would be preferred in most situations over shared_ptr because unique_ptr is non-copyable and has move semantics; shared_ptr would add an overhead
UPDATED on Jan 01, 2014
I know this question is pretty old, but the results are still valid on g++ 4.7.0 and libstdc++ 4.7, so I tried to find out why.
What you're benchmarking here is dereferencing performance at -O0, and, looking at the implementations of unique_ptr and shared_ptr, your results are actually correct.
unique_ptr stores the pointer and the deleter in a ::std::tuple, while shared_ptr stores a naked pointer handle directly. So when you dereference the pointer (using *, ->, or get()), unique_ptr makes an extra call to ::std::get<0>(), whereas shared_ptr returns the pointer directly.

On gcc-4.7, even when optimized and inlined, ::std::get<0>() is a bit slower than the direct pointer. gcc-4.8.1, on the other hand, removes the overhead of ::std::get<0>() entirely when optimizing and inlining: on my machine, compiling with -O3 produces exactly the same assembly code for both, which means they are literally the same.
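To make the difference concrete, here is a stripped-down sketch of the two storage layouts. It is not the actual libstdc++ source, and the class names `tuple_handle` and `direct_handle` are hypothetical; the sketch only mimics the idea that one type reaches its pointer through `std::get<0>()` on a `std::tuple`, while the other reads a plain data member.

```cpp
#include <cassert>
#include <memory>
#include <tuple>

// Hypothetical simplification of the unique_ptr layout described above:
// the pointer and the deleter are stored together in a std::tuple.
template <typename T, typename Deleter = std::default_delete<T>>
class tuple_handle {
public:
    explicit tuple_handle(T* p) : storage_(p, Deleter()) {}
    ~tuple_handle() { std::get<1>(storage_)(std::get<0>(storage_)); }
    tuple_handle(const tuple_handle&) = delete;             // non-copyable
    tuple_handle& operator=(const tuple_handle&) = delete;

    // Every access goes through std::get<0>(); without inlining
    // (e.g. at -O0) this is an extra function call per dereference.
    T* get() const { return std::get<0>(storage_); }
    T& operator*() const { return *get(); }
    T* operator->() const { return get(); }

private:
    std::tuple<T*, Deleter> storage_;
};

// Hypothetical simplification of the shared_ptr side: the raw pointer is a
// direct data member, so get() is a plain member read. Reference counting
// is omitted, so this sketch does not own the object it points to.
template <typename T>
class direct_handle {
public:
    explicit direct_handle(T* p) : ptr_(p) {}
    T* get() const { return ptr_; }
    T& operator*() const { return *ptr_; }
    T* operator->() const { return ptr_; }

private:
    T* ptr_;
};
```

Both types expose the same dereferencing interface; the only difference is the extra hop through `std::get<0>()`, which the optimizer has to see through before the two compile to the same code.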
All in all, with the current implementation, shared_ptr is slower at creation, moving, copying and reference counting, but just as fast *at dereferencing*.
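The extra cost on copying comes from the reference count kept in shared_ptr's control block, which unique_ptr simply does not have. A minimal C++11 illustration, using the hypothetical helper names `shared_copy_use_count` and `unique_move_transfers`:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Copying a shared_ptr bumps the shared reference count held in the control
// block (typically with an atomic operation); destroying each copy
// decrements it again.
long shared_copy_use_count() {
    std::shared_ptr<int> sp1 = std::make_shared<int>(1);
    std::shared_ptr<int> sp2 = sp1;  // copy: use_count goes from 1 to 2
    return sp1.use_count();
}

// unique_ptr cannot be copied; ownership is only transferred by a move,
// which hands over the raw pointer and leaves the source empty.
// There is no control block and no counter to update.
bool unique_move_transfers() {
    std::unique_ptr<int> up1(new int(1));
    std::unique_ptr<int> up2 = std::move(up1);
    return up1 == nullptr && *up2 == 1;
}
```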
NOTE: print() is empty in the question, so the compiler omits the loops when optimizing. I therefore changed the code slightly to observe the optimization results correctly:
```cpp
#include <chrono>
#include <iostream>
#include <memory>
#include <vector>
using namespace std;

class Print {
public:
    // Unlike the empty version in the question, print() now has a side effect.
    void print() { i++; }
    int i{ 0 };
};

void test() {
    typedef vector<shared_ptr<Print>> sh_vec;
    typedef vector<unique_ptr<Print>> u_vec;

    sh_vec shvec;
    u_vec uvec;

    // can't use initializer_list with unique_ptr
    for (int var = 0; var < 100; ++var) {
        shvec.push_back(make_shared<Print>());
        uvec.emplace_back(new Print());
    }

    // steady_clock is monotonic, which suits measuring intervals, and
    // duration_cast makes sure the printed value really is in microseconds.

    //-------------test shared_ptr-------------------------
    auto time_sh_1 = std::chrono::steady_clock::now();
    for (auto var = 0; var < 1000; ++var) {
        for (auto it = shvec.begin(), end = shvec.end(); it != end; ++it) {
            (*it)->print();
        }
    }
    auto time_sh_2 = std::chrono::steady_clock::now();
    cout << "test shared_ptr : "
         << std::chrono::duration_cast<std::chrono::microseconds>(time_sh_2 - time_sh_1).count()
         << " microseconds." << endl;

    //-------------test unique_ptr-------------------------
    auto time_u_1 = std::chrono::steady_clock::now();
    for (auto var = 0; var < 1000; ++var) {
        for (auto it = uvec.begin(), end = uvec.end(); it != end; ++it) {
            (*it)->print();
        }
    }
    auto time_u_2 = std::chrono::steady_clock::now();
    cout << "test unique_ptr : "
         << std::chrono::duration_cast<std::chrono::microseconds>(time_u_2 - time_u_1).count()
         << " microseconds." << endl;
}

int main() { test(); }
```
NOTE: This is not a fundamental problem; it could easily be fixed by dropping the use of ::std::tuple in the current libstdc++ implementation.
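One way such a fix could look, as a sketch only: the hypothetical `direct_unique_ptr` below keeps the raw pointer as a direct member and stores the deleter as a private base class, relying on the empty base optimization so that `std::default_delete` costs no space and `get()` is a plain member read. Unlike `std::tuple`, this simple approach only works for class-type deleters that are not `final` (a function-pointer deleter cannot be a base class), which is presumably part of why libstdc++ uses `std::tuple` in the first place.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical unique_ptr-like type: raw pointer as a direct member, deleter
// as a private (empty) base, so no std::get<0>() is needed to reach the pointer.
template <typename T, typename Deleter = std::default_delete<T>>
class direct_unique_ptr : private Deleter {
public:
    explicit direct_unique_ptr(T* p = nullptr) noexcept : ptr_(p) {}
    direct_unique_ptr(T* p, Deleter d) noexcept : Deleter(std::move(d)), ptr_(p) {}

    // Move-only, like std::unique_ptr.
    direct_unique_ptr(direct_unique_ptr&& other) noexcept
        : Deleter(std::move(static_cast<Deleter&>(other))), ptr_(other.ptr_) {
        other.ptr_ = nullptr;
    }
    direct_unique_ptr& operator=(direct_unique_ptr&& other) noexcept {
        if (this != &other) {
            reset(other.ptr_);  // deletes our current object, takes theirs
            other.ptr_ = nullptr;
            static_cast<Deleter&>(*this) = std::move(static_cast<Deleter&>(other));
        }
        return *this;
    }
    direct_unique_ptr(const direct_unique_ptr&) = delete;
    direct_unique_ptr& operator=(const direct_unique_ptr&) = delete;

    ~direct_unique_ptr() { reset(); }

    // Dereferencing is a plain member read, as in shared_ptr.
    T* get() const noexcept { return ptr_; }
    T& operator*() const { return *ptr_; }
    T* operator->() const noexcept { return ptr_; }

    void reset(T* p = nullptr) noexcept {
        T* old = ptr_;
        ptr_ = p;
        if (old) static_cast<Deleter&>(*this)(old);
    }

private:
    T* ptr_;
};
```

With `std::default_delete` as the deleter, the empty base occupies no storage on mainstream compilers, so the object stays the size of a single raw pointer.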