How much of a bottleneck is memory allocation/deallocation in typical real-world programs? Answers from any type of program where performance typically matters are welcome.
According to MicroQuill SmartHeap Technical Specification, "a typical application [...] spends 40% of its total execution time on managing memory". You can take this figure as an upper bound, i personally feel that a typical application spends more like 10-15% of execution time allocating/deallocating memory. It rarely is a bottleneck in single-threaded application.
In multithreaded C/C++ applications standard allocators become an issue due to lock contention. This is where you start to look for more scalable solutions. But keep in mind Amdahl's Law.