C++ 14 Apr 2010 20:41:21
C++ dynamic_cast Performance
(Updated 2010-10-27: Re-ran the tests with the latest clang++ from Subversion)
A comparison of the speed of dynamic_cast operations in C++.
The idea comes from http://www.nerdblog.com/2006/12/how-slow-is-dynamiccast.html, whose author did not provide any source, so I wrote my own, more extensive tests.
Sources
- Ticks counted via cycle.h (local mirror)
- Source: speed-dynamic-cast.cpp
Things Tested
- reinterpret_cast on a known type
- virtual function call + reinterpret_cast
- member variable access + reinterpret_cast
- successful dynamic_cast to its own type
- successful dynamic_cast from the derived levels to lower levels
- failed dynamic_cast from the derived levels to an unrelated type
Observations
- dynamic_cast is slow for anything but casting to the base type; that particular cast is optimized out
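- the inheritance level has a big impact on the cost of dynamic_cast; with g++, a failed cast to an unrelated type costs 4.50x the baseline from the base level but 16.55x from level 3
- member variable + reinterpret_cast is the fastest reliable way to determine the type; however, it carries a much higher maintenance overhead when coding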
Linux: GNU g++ 4.4.1
- Compiler: GNU g++ 4.4.1 -std=c++0x -O3
- Arch: Linux 2.6.27 x86_64, 2.66GHz Xeon, 8 GiB RAM
g++ 4.4.1 | Ticks | Relative Factor |
---|---|---|
reinterpret_cast known-type | 6008160 | 1.00 |
virtual function + reinterpret_cast | 11017248 | 1.83 |
member variable + reinterpret_cast | 8013216 | 1.33 |
dynamic_cast same-type-base success | 7009392 | 1.17 |
dynamic_cast same-type-level1 success | 30931008 | 5.15 |
dynamic_cast same-type-level2 success | 30442688 | 5.07 |
dynamic_cast same-type-level3 success | 30478432 | 5.07 |
dynamic_cast level1-to-base success | 6008176 | 1.00 |
dynamic_cast level2-to-base success | 7009504 | 1.17 |
dynamic_cast level3-to-base success | 6008176 | 1.00 |
dynamic_cast level2-to-level1 success | 36055776 | 6.00 |
dynamic_cast level3-to-level1 success | 43178320 | 7.19 |
dynamic_cast level3-to-level2 success | 36056256 | 6.00 |
dynamic_cast onebase-to-twobase fail | 27042336 | 4.50 |
dynamic_cast onelevel1-to-twobase fail | 33051232 | 5.50 |
dynamic_cast onelevel2-to-twobase fail | 39096128 | 6.51 |
dynamic_cast onelevel3-to-twobase fail | 99445824 | 16.55 |
Linux: LLVM clang++
- Compiler: clang++ 2.9 (trunk 117443) -O3
- Arch: Linux 2.6.27 x86_64, 2.66GHz Xeon, 8 GiB RAM
clang++ 2.9 (trunk 117443) | Ticks | Relative Factor |
---|---|---|
reinterpret_cast known-type | 7009400 | 1.00 |
virtual function + reinterpret_cast | 11020952 | 1.57 |
member variable + reinterpret_cast | 7015880 | 1.00 |
dynamic_cast same-type-base success | 7010048 | 1.00 |
dynamic_cast same-type-level1 success | 58121912 | 8.29 |
dynamic_cast same-type-level2 success | 66142280 | 9.44 |
dynamic_cast same-type-level3 success | 120793856 | 17.23 |
dynamic_cast level1-to-base success | 7012176 | 1.00 |
dynamic_cast level2-to-base success | 7013016 | 1.00 |
dynamic_cast level3-to-base success | 7009168 | 1.00 |
dynamic_cast level2-to-level1 success | 67142984 | 9.58 |
dynamic_cast level3-to-level1 success | 72154336 | 10.29 |
dynamic_cast level3-to-level2 success | 72149048 | 10.29 |
dynamic_cast onebase-to-twobase fail | 29059824 | 4.15 |
dynamic_cast onelevel1-to-twobase fail | 37107328 | 5.29 |
dynamic_cast onelevel2-to-twobase fail | 44092664 | 6.29 |
dynamic_cast onelevel3-to-twobase fail | 51084688 | 7.29 |
Windows: MSVC++ 2010 Express
- Compiler: MSVC++ 2010 Express _SECURE_SCL=0
- Arch: Windows 7 64 bit, 1.83GHz Core2Duo, 4 GiB RAM
VC++ 2010 | Ticks | Relative Factor |
---|---|---|
reinterpret_cast known-type | 6004526 | 1.00 |
virtual function + reinterpret_cast | 11008470 | 1.83 |
member variable + reinterpret_cast | 6004570 | 1.00 |
dynamic_cast same-type-base success | 6004548 | 1.00 |
dynamic_cast same-type-level1 success | 70743101 | 11.78 |
dynamic_cast same-type-level2 success | 101736525 | 16.94 |
dynamic_cast same-type-level3 success | 135299296 | 22.53 |
dynamic_cast level1-to-base success | 7005790 | 1.17 |
dynamic_cast level2-to-base success | 6004273 | 1.00 |
dynamic_cast level3-to-base success | 6004229 | 1.00 |
dynamic_cast level2-to-level1 success | 117123820 | 19.51 |
dynamic_cast level3-to-level1 success | 157488947 | 26.23 |
dynamic_cast level3-to-level2 success | 149076609 | 24.83 |
dynamic_cast onebase-to-twobase fail | 84984306 | 14.15 |
dynamic_cast onelevel1-to-twobase fail | 104994120 | 17.49 |
dynamic_cast onelevel2-to-twobase fail | 130418354 | 21.72 |
dynamic_cast onelevel3-to-twobase fail | 151142552 | 25.17 |
on 16 Apr 2010 at 14:53:31 1.Jorn said …
It would be nice to see Windows tested on a comparable machine.
on 16 Apr 2010 at 18:12:56 2.Tino Didriksen said …
I agree, but that would be a lot of extra work…
Right now, I can develop and run the tests simultaneously on my own Windows computer and via SSH on the Linux machine. Having to reboot to Linux just to re-run the tests would be painful. There are also the Cygwin and MinGW options, but while they are nice, they cannot really be called a representative picture of how g++ performs.
Plus, the tests I do are not about compiler vs. compiler; they’re about general methods and which ones reliably perform well cross-platform.
on 01 Oct 2010 at 17:06:19 3.Steven Lubars said …
Although to be avoided when possible, applications sometimes need to perform explicit RTTI. Several techniques exist to improve upon the performance of using dynamic_cast for this purpose:
1) Write a wrapper around type_info that allows comparison, assignment, etc. (see Alexandrescu).
2) I recently encountered another technique at http://ciaranm.wordpress.com/2010/05/24/runtime-type-checking-in-c-without-rtti/ that assigns a unique integer to each type:
```cpp
#include <iostream>
using namespace std;

// Hands out the next unused id each time it is called.
short TypeId() {
    static short typeId = 0;
    return typeId++;
}

// One static per instantiation, so each type T gets its id on first use.
template <typename T>
short TypeId() {
    static short typeId = TypeId();
    return typeId;
}

int main() {
    cout << TypeId<int>() << endl;
    cout << TypeId<short>() << endl;
    cout << TypeId<float>() << endl;
    cout << TypeId<int>() << endl;
}
```

Output:

```
0
1
2
0
```
Of course this works for complex types as well.
I think it would be interesting to compare the performance of these techniques to dynamic_cast for the purpose of explicit RTTI.
on 27 Oct 2010 at 09:16:25 4.Cyril said …
I think it is worth noting that reinterpret_cast is a null-cost operation from the CPU point of view since it does not translate to any actual machine code after compilation. What you measure here is the cost of the benchmark loop itself as well as the call to getType(), which you don’t need to do if you know the actual type of the object.
Where did I lose the point?
on 27 Oct 2010 at 13:43:04 5.Tino Didriksen said …
Part of the point was that it is much much faster to code around dynamic_cast any way you can, including designing the program so that you will always know the type.
So you did not lose the point. This is just to show new coders how horribly slow dynamic_cast is and why they should design to avoid it.
on 27 Oct 2010 at 13:52:08 6.Cyril said …
Indeed.
Looks like on MSVC, dynamic_casts are implemented with ugly recursive strcmps. Btw, Qt users should always resort to qobject_cast instead, which only does a pointer comparison by inheritance level.
Thanks for your answer and your nice article!
on 08 Jun 2018 at 07:35:38 7.Martin said …
Just one little thing: in this context reinterpret_cast is a recipe for disaster.
Try it with multiple inheritance and you’ll see what I mean ;-)
I bet that static_cast has the same performance in simple cases and just makes an ADD or SUB when multiple inheritance is involved.
on 08 Apr 2020 at 05:24:54 8.freealibi said …
I only have one simple question: is a fat pointer bound to be faster than dynamic_cast?
The context is that one project I was told to maintain has a 16-line switch..case to determine the real type of a fat pointer.
Given that dynamic_cast can incur only one (L3 maybe?) cache miss, and this 16-line switch..case function may well cause at least an L1i$ miss, maybe the performance difference is negligible?
Also, maintaining 16 lines of switch..case is a pain in the ass. I might just use dynamic_cast instead no matter what.
on 05 Jun 2024 at 01:52:45 9.Mateusz Płóciennik said …
I have rewritten `speed_dynamic_cast.cpp` using Google Benchmark.
See: https://quick-bench.com/q/AFFjW622BSd4Zd1HqQXpB-2bdok