I am currently in the middle of a project where performance is of vital importance. The following are some of the questions I have regarding this issue.
Question 1
If regular pointers are faster and I already have shared pointers, what options do I have to call a method that the shared pointer points to?
operator-> within boost::shared_ptr has an assertion:
typename boost::detail::sp_member_access< T >::type operator-> () const
{
    BOOST_ASSERT( px != 0 );
    return px;
}
So, first of all, be sure that you have NDEBUG defined (in release builds this is usually done automatically):
#define NDEBUG
I have made an assembler comparison between dereferencing a boost::shared_ptr and a raw pointer:
template <int tag, typename T>
NOINLINE void test(const T &p)
{
    volatile auto anti_opti = 0;
    ASM_MARKER<tag>();
    anti_opti = p->data;
    anti_opti = p->data;
    ASM_MARKER<tag + 1>();
    (void)anti_opti;
}
test<1000>(new Foo);
ASM code of test when T is Foo* (don't be scared, I have a diff below):
_Z4testILi1000EP3FooEvRKT0_:
.LFB4088:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq %rdi, %rbx
subq $16, %rsp
.cfi_def_cfa_offset 32
movl $0, 12(%rsp)
call _Z10ASM_MARKERILi1000EEvv
movq (%rbx), %rax
movl (%rax), %eax
movl %eax, 12(%rsp)
movl %eax, 12(%rsp)
call _Z10ASM_MARKERILi1001EEvv
movl 12(%rsp), %eax
addq $16, %rsp
.cfi_def_cfa_offset 16
popq %rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
test<2000>(boost::make_shared<Foo>());
ASM code of test when T is boost::shared_ptr<Foo>:
_Z4testILi2000EN5boost10shared_ptrI3FooEEEvRKT0_:
.LFB4090:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq %rdi, %rbx
subq $16, %rsp
.cfi_def_cfa_offset 32
movl $0, 12(%rsp)
call _Z10ASM_MARKERILi2000EEvv
movq (%rbx), %rax
movl (%rax), %eax
movl %eax, 12(%rsp)
movl %eax, 12(%rsp)
call _Z10ASM_MARKERILi2001EEvv
movl 12(%rsp), %eax
addq $16, %rsp
.cfi_def_cfa_offset 16
popq %rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
Here is the output of the diff -U 0 foo_p.asm shared_ptr_foo_p.asm command:
--- foo_p.asm Fri Apr 12 10:38:05 2013
+++ shared_ptr_foo_p.asm Fri Apr 12 10:37:52 2013
@@ -1,2 +1,2 @@
-_Z4testILi1000EP3FooEvRKT0_:
-.LFB4088:
+_Z4testILi2000EN5boost10shared_ptrI3FooEEEvRKT0_:
+.LFB4090:
@@ -11 +11 @@
-call _Z10ASM_MARKERILi1000EEvv
+call _Z10ASM_MARKERILi2000EEvv
@@ -16 +16 @@
-call _Z10ASM_MARKERILi1001EEvv
+call _Z10ASM_MARKERILi2001EEvv
As you can see, the difference is only in the function signature and the tag non-type template argument value; the rest of the code is IDENTICAL.
In general, copying a shared_ptr is costly: its reference counting is synchronized between threads (usually via atomic operations). If you use boost::intrusive_ptr instead, you can implement your own increment/decrement without thread synchronization, which speeds up reference counting.
If you can afford to use unique_ptr or move semantics (via Boost.Move or C++11), there will be no reference counting at all, which is faster still.
LIVE DEMO WITH ASM OUTPUT
#define NDEBUG
#include <boost/shared_ptr.hpp>
#include <boost/make_shared.hpp>
#define NOINLINE __attribute__ ((noinline))
template <int tag>
NOINLINE void ASM_MARKER()
{
    volatile auto anti_opti = 11;
    (void)anti_opti;
}

struct Foo
{
    int data;
};

template <int tag, typename T>
NOINLINE void test(const T &p)
{
    volatile auto anti_opti = 0;
    ASM_MARKER<tag>();
    anti_opti = p->data;
    anti_opti = p->data;
    ASM_MARKER<tag + 1>();
    (void)anti_opti;
}
int main()
{
    {
        auto p = new Foo;
        test<1000>(p);
        delete p;
    }
    {
        test<2000>(boost::make_shared<Foo>());
    }
}
Question 2
I have instance methods that are called rapidly and create a std::vector on the stack every time.
Generally, it is a good idea to reuse a vector's capacity in order to prevent costly re-allocations. For instance, it is better to replace:
{
    for(/*...*/)
    {
        std::vector<int> temp;
        // do work on temp
    }
}
with:
{
    std::vector<int> temp;
    for(/*...*/)
    {
        // do work on temp
        temp.clear();
    }
}
But it looks like, judging by the std::map type, you are trying to perform some kind of memoization.
As already suggested, instead of std::map, which has O(log N) lookup/insert, you may try boost::unordered_map/std::unordered_map, which has O(1) average and O(N) worst-case complexity for lookup/insert, plus better locality/compactness (it is more cache-friendly).
Also, consider trying Boost.Flyweight:
Flyweights are small-sized handle classes granting constant access to shared common data, thus allowing for the management of large amounts of entities within reasonable memory limits. Boost.Flyweight makes it easy to use this common programming idiom by providing the class template flyweight, which acts as a drop-in replacement for const T.