Before reading the question:
This question is not about how useful it is to use dynamic_cast. It's just about its performance.
Performance is meaningless without comparing equivalent functionality. Most people say dynamic_cast is slow without comparing to equivalent behavior. Call them out on this. Put another way:
If 'works' isn't a requirement, I can write code that fails faster than yours.
There are various ways to implement dynamic_cast, and some are faster than others. Stroustrup published a paper about using primes to improve dynamic_cast, for example. Unfortunately it's unusual to control how your compiler implements the cast, but if performance really matters to you, then you do have control over which compiler you use.
However, not using dynamic_cast will always be faster than using it; so if you don't actually need dynamic_cast, don't use it! If you do need dynamic lookup, then there will be some overhead, and you can then compare the various strategies.
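For instance, when the real question is "what should this object do?", a virtual function often provides the equivalent functionality with no cast at all. A minimal sketch, with invented class names, of the kind of alternative you would compare against:

    // One "equivalent functionality" alternative: instead of downcasting,
    // give the base class a virtual hook and let the derived class decide.
    struct Shape {
        virtual ~Shape() = default;
        virtual double area() const = 0;     // what the caller actually wanted
    };

    struct Circle : Shape {
        explicit Circle(double r) : radius(r) {}
        double area() const override { return 3.14159265358979 * radius * radius; }
        double radius;
    };

    // No cast at all: the virtual call replaces "dynamic_cast, then call".
    double total_area(const Shape* s) { return s ? s->area() : 0.0; }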
Sorry to say this, but your test is virtually useless for determining whether the cast is slow or not. Microsecond resolution is nowhere near good enough. We're talking about an operation that, even in the worst case scenario, shouldn't take more than, say, 100 clock ticks, or less than 50 nanoseconds on a typical PC.
There's no doubt that the dynamic cast will be slower than a static cast or a reinterpret cast, because, on the assembly level, the latter two will amount to an assignment (really fast, order of 1 clock tick), and the dynamic cast requires the code to go and inspect the object to determine its real type.
I can't say off-hand how slow it really is; that would probably vary from compiler to compiler, and I'd need to see the assembly code generated for that line of code. But, like I said, 50 nanoseconds per call is the upper limit of what I'd expect to be reasonable.
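If you want to see that difference for yourself, one option is to compare the assembly the compiler emits for each kind of cast, e.g. with g++ -O2 -S or an online tool such as Compiler Explorer. A minimal sketch (class names invented):

    struct B { virtual ~B() = default; };
    struct D : B { int x = 0; };

    // static_cast / reinterpret_cast typically compile to little more than
    // moving (or adjusting) the pointer; dynamic_cast compiles to a call into
    // the runtime (e.g. __dynamic_cast in the Itanium ABI used by g++/clang).
    D* via_static(B* p)      { return static_cast<D*>(p); }
    D* via_reinterpret(B* p) { return reinterpret_cast<D*>(p); }
    D* via_dynamic(B* p)     { return dynamic_cast<D*>(p); }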
Here are a few benchmarks:
http://tinodidriksen.com/2010/04/14/cpp-dynamic-cast-performance/
http://www.nerdblog.com/2006/12/how-slow-is-dynamiccast.html
According to them, dynamic_cast is 5-30 times slower than reinterpret_cast, and the best alternative performs almost the same as reinterpret_cast.
I'll quote the conclusion from the first article:
- dynamic_cast is slow for anything but casting to the base type; that particular cast is optimized out
- the inheritance level has a big impact on dynamic_cast
- member variable + reinterpret_cast is the fastest reliable way to determine type; however, that has a lot higher maintenance overhead when coding (a sketch of this pattern follows below)
Absolute numbers are on the order of 100 ns for a single cast. Values like 74 msec don't seem close to reality.
Your mileage may vary, to understate the situation.
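Here is a minimal sketch of that "member variable + reinterpret_cast" technique from the article's conclusion, using an invented single-inheritance hierarchy and hand-assigned type ids (the article's own code differs):

    struct Node {
        explicit Node(int t) : type_id(t) {}
        virtual ~Node() = default;
        int type_id;                          // each concrete class stores its own id
    };

    struct TextNode : Node {
        static constexpr int kTypeId = 1;
        TextNode() : Node(kTypeId) {}
    };

    // Check the member, then cast to the known type; no RTTI lookup involved.
    inline TextNode* to_text_node(Node* n) {
        return (n && n->type_id == TextNode::kTypeId)
                   ? reinterpret_cast<TextNode*>(n)
                   : nullptr;
    }

The maintenance overhead the article mentions is visible here: every concrete class has to be assigned and kept in sync with its own id by hand.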
The performance of dynamic_cast depends a great deal on what you are doing, and can even depend on what the names of the classes are (and comparing time relative to reinterpret_cast seems odd, since in most cases that takes zero instructions for practical purposes, as does e.g. a cast from unsigned to int).
I've been looking into how it works in clang/g++. Assuming that you are dynamic_casting from a B* to a D*, where B is a (direct or indirect) base of D, and disregarding multiple-base-class complications, it seems to work by calling a library function which does something like this:
    // for dynamic_cast<D*>( p ), where p is a B*
    type_info const * curr_typ = &typeid( *p );
    while (true) {
        if ( *curr_typ == typeid(D) ) return static_cast<D*>(p);    // success
        if ( *curr_typ == typeid(B) ) return nullptr;               // failed
        curr_typ = get_direct_base_type_of( *curr_typ );            // magic internal operation
    }
So, yes, it's pretty fast when *p is actually a D; just one successful type_info compare. The worst case is when the cast fails and there are a lot of steps from D to B; in that case there are a lot of failed type comparisons.
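To make those two extremes concrete, here is a hypothetical hierarchy (names invented); the cost per cast depends on the dynamic type of the object being inspected, not just on the cast expression:

    struct B  { virtual ~B() = default; };    // common base
    struct D  : B  {};                        // the type we cast to

    struct A1 : B  {};                        // a separate, deep chain under B
    struct A2 : A1 {};
    struct A3 : A2 {};
    struct A4 : A3 {};
    struct X  : A4 {};

    D* try_downcast(B* p) { return dynamic_cast<D*>(p); }

    // try_downcast on an object whose dynamic type is D succeeds after a single
    // type_info compare. On an object whose dynamic type is X, the search
    // sketched above climbs X, A4, A3, A2, A1 and only then reaches B and
    // returns nullptr, with a failed compare against typeid(D) at every level.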
How long does a type comparison take? It does this, on clang/g++:
    bool compare_eq( type_info const & a, type_info const & b ) {
        if ( &a == &b ) return true;                 // same type_info object
        return strcmp( a.name(), b.name() ) == 0;    // otherwise compare mangled names
    }
The strcmp is needed since it's possible to have two different type_info objects representing the same type (although I'm pretty sure this only happens when one is in a shared library and the other is not in that library). But in most cases, when types are actually equal, they reference the same type_info object; thus most successful type comparisons are very fast. The name() method just returns a pointer to a fixed string containing the mangled name of the class.
So there's another factor: if many of the classes on the way from D to B have names starting with MyAppNameSpace::AbstractSyntaxNode<, then the failing compares are going to take longer than usual; the strcmp won't fail until it reaches a difference in the mangled type names.
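If you are curious what strings that strcmp is actually comparing, you can print them; the output is implementation-defined, but on g++/clang it is the mangled name, and instantiations of the same template share a long common prefix (the namespace and template below are just illustrative):

    #include <cstdio>
    #include <typeinfo>

    namespace MyAppNameSpace {
        template <class T>
        struct AbstractSyntaxNode { virtual ~AbstractSyntaxNode() = default; };
    }

    int main() {
        // On g++/clang these print mangled names sharing a long common prefix
        // (namespace plus template name), differing only near the end, so a
        // failing strcmp scans most of the string before reporting "not equal".
        std::puts(typeid(MyAppNameSpace::AbstractSyntaxNode<int>).name());
        std::puts(typeid(MyAppNameSpace::AbstractSyntaxNode<long>).name());
    }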
And, of course, since the operation as a whole is traversing a bunch of linked data structures representing the type hierarchy, the time will depend on whether those things are fresh in the cache or not. So the same cast done repeatedly is likely to show an average time which doesn't necessarily represent the typical performance for that cast.
Firstly, you need to measure the performance over a lot more than just a few iterations, as your results will be dominated by the resolution of the timer. Try e.g. 1 million+, in order to build up a representative picture. Also, this result is meaningless unless you compare it against something, i.e. doing the equivalent but without the dynamic casting.
Secondly, you need to ensure the compiler isn't giving you false results by optimising away multiple dynamic casts on the same pointer (so use a loop, but use a different input pointer each time).
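A minimal sketch of such a measurement, assuming a throwaway two-class hierarchy: it uses a million objects of mixed dynamic type so the compiler can't hoist or fold the casts, and reports the average cost per cast. For a baseline, you would repeat the same loop with the equivalent non-dynamic_cast strategy and compare.

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <memory>
    #include <vector>

    struct B { virtual ~B() = default; };
    struct D : B {};
    struct E : B {};

    int main() {
        // A mix of D and E objects, so every cast sees a different input and
        // roughly half of the casts fail.
        std::vector<std::unique_ptr<B>> objects;
        for (int i = 0; i < 1000000; ++i) {
            if (i % 2) objects.emplace_back(new D);
            else       objects.emplace_back(new E);
        }

        std::size_t hits = 0;
        auto t0 = std::chrono::steady_clock::now();
        for (auto& p : objects)
            if (dynamic_cast<D*>(p.get())) ++hits;   // the operation under test
        auto t1 = std::chrono::steady_clock::now();

        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        std::printf("%zu hits, %.1f ns per cast\n", hits, double(ns) / objects.size());
    }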
Dynamic casting will be slower, because it needs to access the RTTI (run-time type information) table for the object and check that the cast is valid. Then, in order to use it properly, you will need to add error-handling code that checks whether the returned pointer is NULL. All of this takes up cycles.
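In practice that error handling looks something like this (sketch with invented names; nullptr is the modern spelling of that NULL check):

    struct Widget { virtual ~Widget() = default; };
    struct Button : Widget { void click() {} };

    void handle(Widget* w) {
        if (Button* b = dynamic_cast<Button*>(w)) {
            b->click();     // cast succeeded, safe to treat w as a Button
        } else {
            // cast returned NULL/nullptr: w is not a Button, take another path
        }
    }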
I know you didn't want to talk about this, but "a design where dynamic_cast is used a lot" is probably an indicator that you're doing something wrong...