Consider the following code:
class A
{
B* b; // an A object owns a B object
A() : b(NULL) { } // we don\'t know what b will be when constructing A
A quick test of Martin York's assertion that this is a premature optimisation, and that new/delete are optimised well beyond the ability of mere programmers to improve. Obviously the questioner will have to time his own code to see whether avoiding new/delete helps him, but it seems to me that for certain classes and uses it will make a big difference:
#include
#include
int g_construct = 0;
int g_destruct = 0;
struct A {
std::vector vec;
A (int a, int b) : vec((a*b) % 2) { ++g_construct; }
~A() {
++g_destruct;
}
};
int main() {
const int times = 10*1000*1000;
#if DYNAMIC
std::cout << "dynamic\n";
A *x = new A(1,3);
for (int i = 0; i < times; ++i) {
delete x;
x = new A(i,3);
}
#else
std::cout << "automatic\n";
char x[sizeof(A)];
A* yzz = new (x) A(1,3);
for (int i = 0; i < times; ++i) {
yzz->~A();
new (x) A(i,3);
}
#endif
std::cout << g_construct << " constructors and " << g_destruct << " destructors\n";
}
$ g++ allocperf.cpp -oallocperf -O3 -DDYNAMIC=0 -g && time ./allocperf
automatic
10000001 constructors and 10000000 destructors
real 0m7.718s
user 0m7.671s
sys 0m0.030s
$ g++ allocperf.cpp -oallocperf -O3 -DDYNAMIC=1 -g && time ./allocperf
dynamic
10000001 constructors and 10000000 destructors
real 0m15.188s
user 0m15.077s
sys 0m0.047s
This is roughly what I expected: the GMan-style (destruct/placement new) code takes twice as long, and is presumably doing twice as much allocation. If the vector member of A is replaced with an int, then the GMan-style code takes a fraction of a second. That's GCC 3.
$ g++-4 allocperf.cpp -oallocperf -O3 -DDYNAMIC=1 -g && time ./allocperf
dynamic
10000001 constructors and 10000000 destructors
real 0m5.969s
user 0m5.905s
sys 0m0.030s
$ g++-4 allocperf.cpp -oallocperf -O3 -DDYNAMIC=0 -g && time ./allocperf
automatic
10000001 constructors and 10000000 destructors
real 0m2.047s
user 0m1.983s
sys 0m0.000s
This I'm not so sure about, though: now the delete/new takes three times as long as the destruct/placement new version.
[Edit: I think I've figured it out - GCC 4 is faster on the 0-sized vectors, in effect subtracting a constant time from both versions of the code. Changing (a*b)%2 to (a*b)%2+1 restores the 2:1 time ratio, with 3.7s vs 7.5]
Note that I've not taken any special steps to correctly align the stack array, but printing the address shows it's 16-aligned.
Also, -g doesn't affect the timings. I left it in accidentally after I was looking at the objdump to check that -O3 hadn't completely removed the loop. That pointers called yzz because searching for "y" didn't go quite as well as I'd hoped. But I've just re-run without it.