C++ question here. I have a system where I'm going to have hundreds of mini-subclasses of a given superclass. They all will have a "foo" method that does something. Or
To the other answers here I would add two more.
1) It is more difficult, and less common, for a compiler to perform classic optimizations (including enregistration, i.e. keeping values in registers) across a virtual function call interface than across the case-labeled statements of a switch statement in a single function.
2) Any performance difference in the dispatch is highly dependent on the processor's branch prediction hardware. Even a virtual function call's target address (and return) may be correctly predicted and have negligible performance overhead in the pipeline of a modern out-of-order processor.
If the performance of this operation really matters, you really have to try it both ways and measure it, in the context of the real system.
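One way to "try it both ways" is a small harness like the following sketch. `make_tags` and `time_ns` are hypothetical helpers, not part of any library: the first builds a sequence of type tags that is either sorted (long runs, easy for a branch predictor) or shuffled (hard to predict), so it can also probe the branch-prediction point; the second times a callable with `std::chrono::steady_clock`.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: n tags in [0, kinds). Unshuffled, they come in long
// sorted runs (predictable branches); shuffled, their order is random.
std::vector<int> make_tags(std::size_t n, int kinds, bool shuffled) {
    std::vector<int> tags(n);
    for (std::size_t i = 0; i < n; ++i)
        tags[i] = static_cast<int>(i * static_cast<std::size_t>(kinds) / n);
    if (shuffled) {
        std::mt19937 rng(42);  // fixed seed so runs are repeatable
        std::shuffle(tags.begin(), tags.end(), rng);
    }
    return tags;
}

// Hypothetical helper: run `work` `reps` times and return the elapsed time
// in nanoseconds, measured with a monotonic clock.
template <typename F>
std::int64_t time_ns(F&& work, int reps) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

You would call `time_ns` once with a lambda that walks the tags through the switch and once through virtual calls, for both tag orders, writing each result to an observable sink so the optimizer cannot discard the work. Numbers from a microbenchmark like this are only indicative; the real system is the final judge.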
Happy hacking!
If I have it as a switch statement, I can put the commonly occurring foos up at the top of the switch statement and the less common ones at the bottom, hopefully shortcutting the comparison.
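For concreteness, here is a minimal sketch of the two designs being compared. All names (`Base`, `AddOne`, `Twice`, `Kind`, `sum_virtual`, `sum_switch`) are hypothetical, and each `foo` is deliberately trivial:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Design 1 (hypothetical names): one small subclass per behaviour,
// dispatched through the vtable.
struct Base {
    virtual ~Base() = default;
    virtual int foo(int x) const = 0;
};
struct AddOne : Base { int foo(int x) const override { return x + 1; } };
struct Twice : Base { int foo(int x) const override { return x * 2; } };

int sum_virtual(const std::vector<std::unique_ptr<Base>>& objs, int x) {
    int total = 0;
    for (const auto& p : objs) total += p->foo(x);  // indirect call via vtable
    return total;
}

// Design 2: a type tag plus one switch that contains every "foo".
enum class Kind { AddOne, Twice };

int foo(Kind k, int x) {
    switch (k) {
        case Kind::AddOne: return x + 1;
        case Kind::Twice:  return x * 2;
    }
    return x;  // not reached for valid tags; silences missing-return warnings
}

int sum_switch(const std::vector<Kind>& kinds, int x) {
    int total = 0;
    for (Kind k : kinds) total += foo(k, x);  // direct call, may be inlined
    return total;
}
```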
A switch statement is generally compiled to a jump table (at least when the case values are dense) rather than the block of if-else conditionals your question implies. In practice, the virtual table and the switch jump table should have similar performance, though test if you're really concerned.
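Conceptually, a jump table is just an array indexed by the case value. The sketch below is a hand-written analogue, assuming dense tags 0..2 and hypothetical handler names; a compiler-generated table stores jump targets inside the function rather than function pointers, but the O(1) indexed lookup is the same idea.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical per-case handlers.
int case_add_one(int x) { return x + 1; }
int case_twice(int x)   { return x * 2; }
int case_negate(int x)  { return -x; }

// A hand-rolled "jump table": index by the dense tag, then make one
// indirect call through the stored function pointer.
using Handler = int (*)(int);
constexpr std::array<Handler, 3> kTable = {case_add_one, case_twice, case_negate};

int dispatch(std::size_t tag, int x) {
    assert(tag < kTable.size());  // an out-of-range tag would be undefined behaviour
    return kTable[tag](x);
}
```

Note that this version also ends in an indirect call, much like a virtual call, which is one way to see why the two should perform similarly.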
There's been some research on this topic in the field of virtual machine design. Generally, a switch statement is going to be faster; a lot of virtual machines use switch semantics as opposed to virtual lookup. Theoretically, one would assume that a virtual table, being a constant-time algorithm, will be faster, but we have to examine how the hardware sees a virtual table.
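The switch-based dispatch loop that many interpreters use looks roughly like this minimal sketch of a hypothetical four-opcode stack machine. `Op`, `run`, and the bytecode format are made up for illustration, and there is no stack-underflow or bounds checking:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes for a tiny stack machine.
enum Op : std::uint8_t { PUSH, ADD, MUL, HALT };

// Switch-dispatch interpreter loop: fetch an opcode, switch on it, repeat.
int run(const std::vector<std::uint8_t>& code) {
    std::vector<int> stack;
    std::size_t pc = 0;
    for (;;) {
        switch (code[pc++]) {
            case PUSH: stack.push_back(code[pc++]); break;  // operand follows opcode
            case ADD: { int b = stack.back(); stack.pop_back(); stack.back() += b; break; }
            case MUL: { int b = stack.back(); stack.pop_back(); stack.back() *= b; break; }
            case HALT: return stack.back();
            default: assert(false && "bad opcode"); return 0;
        }
    }
}
```

For example, the program `PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT` evaluates `(2 + 3) * 4`.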
A switch statement is easier for the compiler to inline. This is a huge consideration: the actual act of calling a virtual function is cheap, but a full stack frame must still be set up and torn down, because the compiler has no idea which function will be called at run time and so cannot inline it.
Branch prediction and hardware prefetching should be easier with a switch statement, although modern architectures are getting better at predicting virtual calls.
A lot of code that uses virtual dispatch requires heap-based allocation schemes. Dynamic memory allocation is a bottleneck in a lot of C++ applications.
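The allocation point can be seen in how the two styles are usually stored. In this sketch (all type and function names hypothetical), polymorphic objects sit behind `std::unique_ptr`, so each element is typically its own heap allocation, while the tagged value type used with a switch lives contiguously in one vector:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Virtual dispatch: different dynamic types are stored behind pointers,
// so each element is typically a separate heap allocation.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};
struct Square : Shape {
    explicit Square(double side) : side_(side) {}
    double area() const override { return side_ * side_; }
    double side_;
};

double total_virtual(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double sum = 0;
    for (const auto& p : shapes) sum += p->area();
    return sum;
}

// Switch dispatch: a tagged value type stored by value, with no
// per-element allocation.
struct TaggedShape {
    enum Kind { kSquare, kRect } kind;
    double a, b;  // kSquare uses a; kRect uses a and b
};

double total_switch(const std::vector<TaggedShape>& shapes) {
    double sum = 0;
    for (const auto& t : shapes) {
        switch (t.kind) {
            case TaggedShape::kSquare: sum += t.a * t.a; break;
            case TaggedShape::kRect:   sum += t.a * t.b; break;
        }
    }
    return sum;
}
```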
A vtable call should be faster in nearly all cases, but if performance is so critical, the right question to ask is by how much.
A vtable call is a triple indirection (three memory accesses to get the target CALL address). Cache misses should not be an issue if there are many calls. So a vtable call costs roughly as much as 2-3 switch label comparisons (though the comparisons offer even less chance of a CPU cache miss, they make less efficient use of the pipeline).
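One way to see where a count of three loads can come from is a hand-rolled model of a vtable. This is only a model, assuming the common single-inheritance layout (the C++ standard does not mandate any vtable layout), and all names are hypothetical; a real compiler may hoist or combine some of these loads.

```cpp
#include <cassert>

// Each "object" starts with a pointer to a shared table of function pointers.
struct Obj;
struct VTable { int (*foo)(const Obj*, int); };
struct Obj { const VTable* vptr; int data; };

int foo_add(const Obj* self, int x) { return x + self->data; }
int foo_mul(const Obj* self, int x) { return x * self->data; }

constexpr VTable kAddTable{foo_add};
constexpr VTable kMulTable{foo_mul};

// Calling through an array of object pointers touches three memory
// locations before the indirect call.
int call_foo(Obj* const* objs, int i, int x) {
    const Obj* o = objs[i];      // load 1: object pointer
    const VTable* vt = o->vptr;  // load 2: vtable pointer
    auto fn = vt->foo;           // load 3: function pointer
    return fn(o, x);             // indirect call
}
```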
You should of course not rely on anything I said here, and test it all with true performance measurements on your target architecture.
The compiler determines how switch statements are handled, but there are a few basic techniques compilers use.
The order of the case statements within the switch statement makes no difference in either case.
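As an illustration, when case values are sparse a dense jump table would be mostly empty, so compilers commonly lower the switch to a balanced tree of comparisons instead. The second function below is a hand-written version of one such lowering, under that assumption; the exact strategy depends on the compiler and optimization flags, and the function names are hypothetical.

```cpp
#include <cassert>

// A switch with sparse case values: too spread out for a compact jump table.
int sparse_switch(int v) {
    switch (v) {
        case 1:       return 10;
        case 1000:    return 20;
        case 50000:   return 30;
        case 1000000: return 40;
        default:      return 0;
    }
}

// One lowering a compiler might choose: a balanced comparison tree,
// O(log n) rather than a linear if-else chain. The source order of the
// case labels does not affect this shape.
int sparse_tree(int v) {
    if (v < 50000) {
        if (v == 1) return 10;
        if (v == 1000) return 20;
        return 0;
    }
    if (v == 50000) return 30;
    if (v == 1000000) return 40;
    return 0;
}
```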
Virtual functions have an overhead compared to a direct call: an additional offset and pointer lookup. For all but the most extreme performance considerations this cost is negligible. When compared with a switch, the overhead is not in the virtual lookup but in the function call itself. So a switch statement that simply calls a function in each case will perform basically the same as virtual functions.
So essentially the "dispatch semantics" of a switch statement (with a jump table) compared to a virtual function call are nearly irrelevant. If all your "foo" methods are relatively small and can be inlined, the switch statement will start to perform better. The other advantage of a switch is that you can put common code before it and get better register/stack optimizations.
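A minimal sketch of "common code before the switch", with hypothetical names and a made-up per-item prologue (adding a bias). Because the shared work and every small case body sit in one function, the compiler has the opportunity to keep `v` and `bias` in registers across all cases and to inline each case, which it cannot do across separate virtual `foo` calls:

```cpp
#include <cassert>
#include <vector>

enum class Kind { Scale, Offset, Clamp };

struct Item { Kind kind; int value; };

int process(const std::vector<Item>& items, int bias) {
    int total = 0;
    for (const Item& it : items) {
        int v = it.value + bias;  // common code, written once before the switch
        switch (it.kind) {
            case Kind::Scale:  v *= 3; break;
            case Kind::Offset: v += 10; break;
            case Kind::Clamp:  v = v > 100 ? 100 : v; break;
        }
        total += v;
    }
    return total;
}
```

With subclasses, that prologue would either be repeated in every `foo` or done by the caller before each opaque virtual call.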
However, there is a significant maintenance overhead, and this should be your primary concern at this point. Why? Because the performance bottleneck in your code is likely not the switching logic, or even the function calls, but something else. Until you fix that something else, there is no point in addressing these low-level performance issues. So stick with whichever provides more maintainable code for now.