C++ 14 Apr 2010 20:41:21

C++ dynamic_cast Performance

(Updated 2010-10-27: Re-ran the tests with the latest clang++ from Subversion.)

A comparison of how fast dynamic_cast is in C++, measured against other ways of determining an object's type.

The idea comes from http://www.nerdblog.com/2006/12/how-slow-is-dynamiccast.html, whose author did not provide any source code, so I wrote my own, more extensive tests.

Sources

Things Tested

  • reinterpret_cast on a known type
  • virtual function call + reinterpret_cast
  • member variable access + reinterpret_cast
  • successful dynamic_cast to its own type
  • successful dynamic_cast from a derived level to a level closer to the base
  • failed dynamic_cast from the derived levels to an unrelated type
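The test sources are not reproduced on this page, so the following is only a minimal sketch of the three ways of recovering a derived type that the list above compares. The names `Base`, `Level1`, `Kind`, `type` and the three helper functions are assumptions made for illustration (only `getType()` is mentioned in the comments below). The sketch also swaps in static_cast for the reinterpret_cast used in the tests, because static_cast is the well-defined way to downcast once the type is known.

```cpp
#include <cassert>

// Hypothetical hierarchy; names are illustrative, not taken from the test sources.
enum class Kind { Base, Level1 };

struct Base {
    Kind type;  // tag read by the "member variable" approach
    explicit Base(Kind k = Kind::Base) : type(k) {}
    virtual ~Base() = default;
    // tag returned by the "virtual function" approach
    virtual Kind getType() const { return Kind::Base; }
};

struct Level1 : Base {
    int payload = 42;
    Level1() : Base(Kind::Level1) {}
    Kind getType() const override { return Kind::Level1; }
};

// 1. Virtual function call, then an unchecked cast.
Level1* viaVirtual(Base* b) {
    assert(b != nullptr);
    return b->getType() == Kind::Level1 ? static_cast<Level1*>(b) : nullptr;
}

// 2. Member variable access, then an unchecked cast.
Level1* viaMember(Base* b) {
    assert(b != nullptr);
    return b->type == Kind::Level1 ? static_cast<Level1*>(b) : nullptr;
}

// 3. dynamic_cast: the runtime walks the RTTI data; yields nullptr on failure.
Level1* viaDynamicCast(Base* b) {
    return dynamic_cast<Level1*>(b);
}
```

Note that the two tag-based helpers in this sketch only match the exact type. A class derived from `Level1` would need its own tag handling, whereas dynamic_cast would still succeed, which is part of the maintenance cost of the hand-rolled approaches.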

Observations

  • dynamic_cast is slow for anything but casting to the base type; that particular cast is optimized out
  • the inheritance level has a big impact on dynamic_cast: the deeper the hierarchy, the more ticks a cast generally takes
  • member variable + reinterpret_cast is the fastest reliable way to determine type; however, it carries a much higher maintenance overhead when coding

Linux: GNU g++ 4.4.1

  • Compiler: GNU g++ 4.4.1 -std=c++0x -O3
  • Arch: Linux 2.6.27 x86_64, 2.66GHz Xeon, 8 GiB RAM
| g++ 4.4.1 | Ticks | Relative Factor |
|---|---|---|
| reinterpret_cast known-type | 6008160 | 1.00 |
| virtual function + reinterpret_cast | 11017248 | 1.83 |
| member variable + reinterpret_cast | 8013216 | 1.33 |
| dynamic_cast same-type-base success | 7009392 | 1.17 |
| dynamic_cast same-type-level1 success | 30931008 | 5.15 |
| dynamic_cast same-type-level2 success | 30442688 | 5.07 |
| dynamic_cast same-type-level3 success | 30478432 | 5.07 |
| dynamic_cast level1-to-base success | 6008176 | 1.00 |
| dynamic_cast level2-to-base success | 7009504 | 1.17 |
| dynamic_cast level3-to-base success | 6008176 | 1.00 |
| dynamic_cast level2-to-level1 success | 36055776 | 6.00 |
| dynamic_cast level3-to-level1 success | 43178320 | 7.19 |
| dynamic_cast level3-to-level2 success | 36056256 | 6.00 |
| dynamic_cast onebase-to-twobase fail | 27042336 | 4.50 |
| dynamic_cast onelevel1-to-twobase fail | 33051232 | 5.50 |
| dynamic_cast onelevel2-to-twobase fail | 39096128 | 6.51 |
| dynamic_cast onelevel3-to-twobase fail | 99445824 | 16.55 |
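The article does not spell out how the Relative Factor column is derived, but the numbers are consistent with dividing each row's ticks by the ticks of the first row (reinterpret_cast known-type) and rounding to two decimals. A small sketch of that arithmetic, where the function name `relativeFactor` is an assumption made for illustration:

```cpp
#include <cassert>
#include <cmath>

// Ratio of a measurement to the baseline, rounded to two decimals.
// Assumed formula: it reproduces the published column but is not
// confirmed by the article itself.
double relativeFactor(long long ticks, long long baselineTicks) {
    assert(baselineTicks > 0);
    const double ratio =
        static_cast<double>(ticks) / static_cast<double>(baselineTicks);
    return std::round(ratio * 100.0) / 100.0;
}
```

For the g++ figures, `relativeFactor(30931008, 6008160)` gives 5.15 for the same-type-level1 row, and `relativeFactor(99445824, 6008160)` gives 16.55 for the onelevel3-to-twobase failure.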

Linux: LLVM clang++

| clang++ 1.5 (trunk 117443) | Ticks | Relative Factor |
|---|---|---|
| reinterpret_cast known-type | 7009400 | 1.00 |
| virtual function + reinterpret_cast | 11020952 | 1.57 |
| member variable + reinterpret_cast | 7015880 | 1.00 |
| dynamic_cast same-type-base success | 7010048 | 1.00 |
| dynamic_cast same-type-level1 success | 58121912 | 8.29 |
| dynamic_cast same-type-level2 success | 66142280 | 9.44 |
| dynamic_cast same-type-level3 success | 120793856 | 17.23 |
| dynamic_cast level1-to-base success | 7012176 | 1.00 |
| dynamic_cast level2-to-base success | 7013016 | 1.00 |
| dynamic_cast level3-to-base success | 7009168 | 1.00 |
| dynamic_cast level2-to-level1 success | 67142984 | 9.58 |
| dynamic_cast level3-to-level1 success | 72154336 | 10.29 |
| dynamic_cast level3-to-level2 success | 72149048 | 10.29 |
| dynamic_cast onebase-to-twobase fail | 29059824 | 4.15 |
| dynamic_cast onelevel1-to-twobase fail | 37107328 | 5.29 |
| dynamic_cast onelevel2-to-twobase fail | 44092664 | 6.29 |
| dynamic_cast onelevel3-to-twobase fail | 51084688 | 7.29 |

Windows: MSVC++ 2010 Express

  • Compiler: MSVC++ 2010 Express _SECURE_SCL=0
  • Arch: Windows 7 64 bit, 1.83GHz Core2Duo, 4 GiB RAM
| VC++ 2010 | Ticks | Relative Factor |
|---|---|---|
| reinterpret_cast known-type | 6004526 | 1.00 |
| virtual function + reinterpret_cast | 11008470 | 1.83 |
| member variable + reinterpret_cast | 6004570 | 1.00 |
| dynamic_cast same-type-base success | 6004548 | 1.00 |
| dynamic_cast same-type-level1 success | 70743101 | 11.78 |
| dynamic_cast same-type-level2 success | 101736525 | 16.94 |
| dynamic_cast same-type-level3 success | 135299296 | 22.53 |
| dynamic_cast level1-to-base success | 7005790 | 1.17 |
| dynamic_cast level2-to-base success | 6004273 | 1.00 |
| dynamic_cast level3-to-base success | 6004229 | 1.00 |
| dynamic_cast level2-to-level1 success | 117123820 | 19.51 |
| dynamic_cast level3-to-level1 success | 157488947 | 26.23 |
| dynamic_cast level3-to-level2 success | 149076609 | 24.83 |
| dynamic_cast onebase-to-twobase fail | 84984306 | 14.15 |
| dynamic_cast onelevel1-to-twobase fail | 104994120 | 17.49 |
| dynamic_cast onelevel2-to-twobase fail | 130418354 | 21.72 |
| dynamic_cast onelevel3-to-twobase fail | 151142552 | 25.17 |

8 Responses to “C++ dynamic_cast Performance”

  1. on 16 Apr 2010 at 14:53:31 Jorn said …

    It would be nice to see Windows tested on a comparable machine.

  2. on 16 Apr 2010 at 18:12:56 Tino Didriksen said …

    I agree, but that would be a lot of extra work…

    Right now, I can develop and run the tests simultaneously on my own Windows computer and via SSH on the Linux machine. Having to reboot to Linux just to re-run the tests would be painful. There are also the Cygwin and MinGW options, but while they are nice, they cannot really be called representative of how g++ performs.

    Plus, the tests I do are not about compiler vs. compiler; they’re about general methods and which ones reliably perform well cross-platform.

  3. on 01 Oct 2010 at 17:06:19 Steven Lubars said …

    Although to be avoided when possible, applications sometimes need to perform explicit RTTI. Several techniques exist to improve upon the performance of using dynamic_cast for this purpose:

    1) Write a wrapper around type_info that allows comparison, assignment, etc. (see Alexandrescu).

    2) I recently encountered another technique at http://ciaranm.wordpress.com/2010/05/24/runtime-type-checking-in-c-without-rtti/ that assigns a unique integer to each type:


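    #include <iostream>
    using namespace std;

    // Shared counter: every call hands out the next unused id.
    short TypeId() {
        static short typeId = 0;
        return typeId++;
    }

    // One instantiation per type T. Its static is initialised once,
    // on first use, so each type keeps the id it was first given.
    template <typename T>
    short TypeId() {
        static short typeId = TypeId();
        return typeId;
    }

    int main() {
        cout << TypeId<int>() << endl;    // first use of int
        cout << TypeId<short>() << endl;  // first use of short
        cout << TypeId<float>() << endl;  // first use of float
        cout << TypeId<int>() << endl;    // int again: same id as before
    }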


    0
    1
    2
    0

    Of course this works for complex types as well.

    I think it would be interesting to compare the performance of these techniques to dynamic_cast for the purpose of explicit RTTI.
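To make the suggestion above concrete, here is one hedged sketch of how such per-type integers could stand in for dynamic_cast. The class names (`Node`, `Circle`, `Square`) and the helpers `nextTypeId`, `typeIdOf`, `is` and `as` are assumptions made for illustration and are not taken from the linked post; the id functions are renamed versions of the two `TypeId` overloads.

```cpp
#include <cassert>

// Shared counter behind the per-type ids (not thread-safe as written).
inline short nextTypeId() {
    static short counter = 0;
    return counter++;
}

// Each type T gets its id on first use and keeps it afterwards.
template <typename T>
short typeIdOf() {
    static const short id = nextTypeId();
    return id;
}

// Hypothetical base class that stores the id of the most-derived type.
class Node {
public:
    virtual ~Node() = default;

    // True only for an exact type match; no hierarchy walk is done.
    template <typename T>
    bool is() const { return id_ == typeIdOf<T>(); }

    // Checked downcast: an integer comparison followed by static_cast.
    template <typename T>
    T* as() { return is<T>() ? static_cast<T*>(this) : nullptr; }

protected:
    explicit Node(short id) : id_(id) {}

private:
    short id_;
};

struct Circle : Node {
    double radius = 1.0;
    Circle() : Node(typeIdOf<Circle>()) {}
};

struct Square : Node {
    double side = 2.0;
    Square() : Node(typeIdOf<Square>()) {}
};
```

Given a `Node*` that points at a `Circle`, `as<Circle>()` returns the object and `as<Square>()` returns a null pointer. Unlike dynamic_cast, this sketch matches the exact type only, so a cast to an intermediate base class would need extra bookkeeping.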

  4. on 27 Oct 2010 at 09:16:25 Cyril said …

    I think it is worth noting that reinterpret_cast is a null-cost operation from the CPU point of view since it does not translate to any actual machine code after compilation. What you measure here is the cost of the benchmark loop itself as well as the call to getType(), which you don’t need to do if you know the actual type of the object.
    Where did I lose the point?

  5. on 27 Oct 2010 at 13:43:04 Tino Didriksen said …

    Part of the point was that it is much much faster to code around dynamic_cast any way you can, including designing the program so that you will always know the type.

    So you did not lose the point. This is just to show new coders how horribly slow dynamic_cast is and why they should design to avoid it.

  6. on 27 Oct 2010 at 13:52:08 Cyril said …

    Indeed.
    Looks like on MSVC, dynamic_casts are implemented with ugly recursive strcmps. Btw, Qt users should always resort to qobject_cast instead, which only does a pointer comparison by inheritance level.

    Thanks for your answer and your nice article!

  7. on 08 Jun 2018 at 07:35:38 Martin said …

    Just one little thing: in this context reinterpret_cast is a recipe for disaster.

    Try it with multiple inheritance and you’ll see what I mean ;-)

    I bet that static_cast has the same performance in simple cases and just makes an ADD or SUB when multiple inheritance is involved.
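The multiple-inheritance hazard described in this comment can be illustrated with a short sketch. The types `Left`, `Right` and `Both` and the two helper functions are hypothetical names chosen for the example, and the exact object layout is left to the compiler.

```cpp
#include <cassert>

struct Left  { int a = 1; virtual ~Left() = default; };
struct Right { int b = 2; virtual ~Right() = default; };
struct Both : Left, Right { int c = 3; };

// static_cast knows the class layout, so it adjusts the address to
// point at the Right subobject inside Both.
Right* toRightStatic(Both* p) {
    return static_cast<Right*>(p);
}

// reinterpret_cast keeps the address unchanged and only relabels the
// pointer type. The result is returned as an address for comparison
// and deliberately never dereferenced.
const void* toRightReinterpretAddress(Both* p) {
    return reinterpret_cast<Right*>(p);
}
```

The pointer from `toRightStatic` can be used safely and converts back to the original `Both*`. The address from `toRightReinterpretAddress` is still the address of the whole object, so on layouts where `Right` does not sit at offset zero (typically the case when it is the second base), reading `b` through that pointer would touch the wrong bytes.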

  8. on 08 Apr 2020 at 05:24:54 freealibi said …

    I only have one simple question: is a fat pointer bound to be faster than dynamic_cast?

    The context is that one project I was told to maintain has these 16 lines of switch..case to determine the real type of a fat pointer.

    Given that dynamic_cast may incur only one (L3 maybe?) cache miss, and this 16-line switch..case function may well cause at least an L1i$ miss, maybe the performance difference is negligible?

    Also maintaining 16 lines of switch..case is a pain in the ass. I might just use dynamic_cast instead no matter what.
