I am currently in the middle of a project where performance is of vital importance. The following are some of the questions I have regarding this issue.
Question 1
If regular pointers are faster and I already have shared pointers, what options do I have to call a method that the shared pointer points to?
operator-> within boost::shared_ptr has an assertion:
typename boost::detail::sp_member_access< T >::type operator-> () const
{
    BOOST_ASSERT( px != 0 );
    return px;
}
So, first of all, be sure that you have NDEBUG defined (in release builds this is usually done automatically):
#define NDEBUG
I have made an assembler comparison between dereferencing a boost::shared_ptr and a raw pointer:
template <int tag, typename T>
NOINLINE void test(const T &p)
{
    volatile auto anti_opti = 0;
    ASM_MARKER<tag>();
    anti_opti = p->data;
    anti_opti = p->data;
    ASM_MARKER<tag + 1>();
    (void)anti_opti;
}
test<1000>(new Foo);
ASM code of test when T is Foo* (don't be scared, I have a diff below):
_Z4testILi1000EP3FooEvRKT0_:
.LFB4088:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq %rdi, %rbx
subq $16, %rsp
.cfi_def_cfa_offset 32
movl $0, 12(%rsp)
call _Z10ASM_MARKERILi1000EEvv
movq (%rbx), %rax
movl (%rax), %eax
movl %eax, 12(%rsp)
movl %eax, 12(%rsp)
call _Z10ASM_MARKERILi1001EEvv
movl 12(%rsp), %eax
addq $16, %rsp
.cfi_def_cfa_offset 16
popq %rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
test<2000>(boost::make_shared<Foo>());
ASM code of test when T is boost::shared_ptr<Foo>:
_Z4testILi2000EN5boost10shared_ptrI3FooEEEvRKT0_:
.LFB4090:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq %rdi, %rbx
subq $16, %rsp
.cfi_def_cfa_offset 32
movl $0, 12(%rsp)
call _Z10ASM_MARKERILi2000EEvv
movq (%rbx), %rax
movl (%rax), %eax
movl %eax, 12(%rsp)
movl %eax, 12(%rsp)
call _Z10ASM_MARKERILi2001EEvv
movl 12(%rsp), %eax
addq $16, %rsp
.cfi_def_cfa_offset 16
popq %rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
Here is the output of the diff -U 0 foo_p.asm shared_ptr_foo_p.asm command:
--- foo_p.asm Fri Apr 12 10:38:05 2013
+++ shared_ptr_foo_p.asm Fri Apr 12 10:37:52 2013
@@ -1,2 +1,2 @@
-_Z4testILi1000EP3FooEvRKT0_:
-.LFB4088:
+_Z4testILi2000EN5boost10shared_ptrI3FooEEEvRKT0_:
+.LFB4090:
@@ -11 +11 @@
-call _Z10ASM_MARKERILi1000EEvv
+call _Z10ASM_MARKERILi2000EEvv
@@ -16 +16 @@
-call _Z10ASM_MARKERILi1001EEvv
+call _Z10ASM_MARKERILi2001EEvv
As you can see, the difference is only in the function signature and the tag non-type template argument value; the rest of the code is IDENTICAL.
In general, copying a shared_ptr is costly: its reference counting is synchronized between threads (usually via atomic operations). If you use boost::intrusive_ptr instead, you can implement your own increment/decrement without thread synchronization, which speeds up reference counting.
If you can afford to use unique_ptr or move semantics (via Boost.Move or C++11), there will be no reference counting at all, which is faster still.
LIVE DEMO WITH ASM OUTPUT
#define NDEBUG
#include <boost/shared_ptr.hpp>
#include <boost/make_shared.hpp>
#define NOINLINE __attribute__ ((noinline))
template <int tag>
NOINLINE void ASM_MARKER()
{
    volatile auto anti_opti = 11;
    (void)anti_opti;
}

struct Foo
{
    int data;
};

template <int tag, typename T>
NOINLINE void test(const T &p)
{
    volatile auto anti_opti = 0;
    ASM_MARKER<tag>();
    anti_opti = p->data;
    anti_opti = p->data;
    ASM_MARKER<tag + 1>();
    (void)anti_opti;
}
int main()
{
    {
        auto p = new Foo;
        test<1000>(p);
        delete p;
    }
    {
        test<2000>(boost::make_shared<Foo>());
    }
}
Question 2
I have instance methods that are called rapidly and create a std::vector on the stack every time.
Generally, it is a good idea to reuse a vector's capacity in order to prevent costly re-allocations. For instance, it is better to replace:
{
    for(/*...*/)
    {
        std::vector<int> temp;
        // do work on temp
    }
}
with:
{
    std::vector<int> temp;
    for(/*...*/)
    {
        // do work on temp
        temp.clear();
    }
}
But it looks like, judging by the std::map type, you are trying to perform some kind of memoization.
As already suggested, instead of std::map, which has O(log N) lookup/insert, you may try boost::unordered_map/std::unordered_map, which has O(1) average and O(N) worst-case complexity for lookup/insert, plus better locality/compactness (it is more cache-friendly).
Also, consider trying Boost.Flyweight:
Flyweights are small-sized handle classes granting constant access to shared common data, thus allowing for the management of large amounts of entities within reasonable memory limits. Boost.Flyweight makes it easy to use this common programming idiom by providing the class template flyweight, which acts as a drop-in replacement for const T.