shared_ptr: horrible speed

Submitted by 早过忘川 on 2019-11-27 10:33:52

shared_ptr is the most complicated type of pointer ever:

  • Ref counting takes time
  • Multiple allocations (there are 3 parts: the object, the counter, the deleter)
  • A number of virtual methods (in the counter and the deleter) for type erasure
  • Works across multiple threads (thus synchronization)

There are 2 ways to make them faster:

  • use make_shared to allocate them, because (unfortunately) the normal constructor allocates two different blocks: one for the object and one for the counter and deleter.
  • don't copy them if you don't need to: methods should accept shared_ptr<T> const&

But there are also many ways NOT to use them.

Looking at your code it looks like you're doing a LOT of memory allocation, and I can't help but wonder if you couldn't find a better strategy. I must admit I didn't get the full picture, so I may be heading straight into a wall but...

Usually code is much simpler if you have an owner for each of the objects. Therefore, shared_ptr should be a last resort measure, employed when you can't get a single owner.

Anyway, we're comparing apples and oranges here: the original code is buggy. You take care of deleting the memory (good), but you forgot that these objects were also referenced from other points in the program (e1->setNextEdge(e21)), which now hold pointers to destructed objects (in a freed memory zone). Therefore I guess that in case of an exception you just wipe out the entire list? (Or somehow bet on undefined behavior to play nice.)

So it's hard to judge performance, since the former doesn't recover well from exceptions while the latter does.

Finally: Have you thought about using intrusive_ptr? It could give you some boost (hehe) if you don't synchronize them (single thread), and you would avoid a lot of the work performed by shared_ptr as well as gain on locality of reference.

I always recommend using std::shared_ptr<> instead of relying on manual memory lifetime management. Automatic lifetime management costs something, but usually nothing significant.

In your case you noticed that shared_ptr<> is significant and, as others have said, you should make sure that you don't unnecessarily copy a shared pointer, as each copy forces an addref/release.

But there's another question in the background: Do you really need to rely on new/delete in the first place? new/delete uses malloc/free which are not tuned for allocations of small objects.

A library that has helped me a lot before is boost::object_pool.

At some stage I wanted to create graphs very fast. Nodes and edges are naturally dynamically allocated and I get two costs from doing that.

  1. malloc/free
  2. Memory lifetime management

boost::object_pool helps reduce both of these costs, at the cost of not being as general as malloc/free.

So as an example let's say we have a simple node like this:

   struct node
   {
      node * left;
      node * right;
   };

So instead of allocating nodes with new, I use boost::object_pool. But boost::object_pool also tracks all instances allocated with it, so at the end of my calculation I destroyed the object_pool and didn't need to track each node, thus simplifying my code and improving the performance.

I did some performance testing (I wrote my own pool class just for fun, but boost::object_pool should give the same performance or better).

10,000,000 nodes created and destroyed:

  1. Plain new/delete: 2.5 secs
  2. shared_ptr: 5 secs
  3. boost::object_pool: 0.15 secs

So if boost::object_pool works for you it might help reduce the memory allocation overhead significantly.

By default, if you create your shared pointers the naïve way (i.e. shared_ptr<type> p( new type )) you incur two memory allocations: one for the actual object and an extra allocation for the reference count. You can avoid the extra allocation by using the make_shared template, which performs a single allocation for both the object and the reference count and then constructs the object in place.

The rest of the extra costs are quite small compared with doubling the calls to malloc, like incrementing and decrementing the count (both atomic operations) and testing for deletion. If you can provide some code in how you are using the pointers/shared pointers you might get a better insight as to what is actually going on in the code.

Try it in "release" mode and see if you get closer benchmarks. Debug mode tends to turn on lots of assertions in the STL which slow lots of things down.

shared_ptr are noticeably slower than raw pointers. That's why they should only be used if you actually need shared ownership semantics.

Otherwise, there are several other smart pointer types available. scoped_ptr and auto_ptr (C++03) or unique_ptr (C++0x) both have their uses. And often, the best solution is not to use a pointer of any kind, and just write your own RAII class instead.

A shared_ptr has to increment/decrement/read the reference counter, and depending on the implementation and how it is instantiated, the ref counter may be allocated separately, causing potential cache misses. And it has to access the ref counter atomically, which adds additional overhead.

It's impossible to answer this without more data. Have you profiled the code to accurately identify the source of the slowdown in the shared_ptr version? Using the container will certainly add overhead but I'd be surprised if it makes it 10x slower.

VSTS has nice perf tools that will attribute the CPU usage exactly if you can run this for 30 secs or so. If you don't have access to the VS Performance Tools or other profiling toolset, then run the shared_ptr code in the debugger and break into it 10 or 15 times to get a brute force sample of where it's spending all its time. This is surprisingly and counter-intuitively effective, I have found.

[EDIT] Do not pass your shared_ptr by value in that variant of the code; use a reference to const. If this function is called a lot, passing by value will have a measurable negative impact.

It's slow because it uses atomic instructions for the reference inc/dec operations, which makes it horribly slow. If you really need GC in C++, don't use naive RC GC; use some more developed RC strategy, or a tracing GC. http://www.hboehm.info/gc/ is nice for non-speed-critical tasks (and a lot better than the naive RC of "smart pointers").
