C++ 02 Feb 2010 16:38:56
C++ Find Unique Elements Speed
A performance benchmark of which method is faster of finding all unique elements of a vector. The goal is having an ordered container with just the unique elements, where the tested methods are std::sort()+std::unique()+erase(), versus inserting into a separate std::set, versus inserting into std::unordered_set and then into std::set, and finally inserting into boost::unordered_set and then into std::set.
Source for the integer version is speed-find-unique-int.cpp with cycle.h.
The compilers are Microsoft Visual C++ 2010 Express as VC10 with _SECURE_SCL disabled, and GNU g++ 4.3.2.
Tests were run with 10000, 100000, 1000000, and 10000000 elements, of which 1000 are unique. For my own amusement I also tried 10000 and 32767 unique elements, but relative speeds were roughly the same.
The set.insert() column describes how much time was spent in relation to the same operation with sort()+unique()+erase(). For integers, this number is above 100% for all tests, meaning it is slower to use std::set.
<int> | sort()+unique() | set.insert() | hash+set | boost hash+set |
---|---|---|---|---|
VC10 n=10k | 100.00% | 273.51% | 376.30% | 97.39% |
VC10 n=100k | 100.00% | 287.09% | 274.24% | 29.41% |
VC10 n=1m | 100.00% | 335.19% | 329.24% | 25.60% |
VC10 n=10m | 100.00% | 343.77% | 350.25% | 20.63% |
- | ||||
g++ 4.3.2 n=10k | 100.00% | 160.28% | 47.49% | 55.51% |
g++ 4.3.2 n=100k | 100.00% | 153.75% | 22.54% | 28.49% |
g++ 4.3.2 n=1m | 100.00% | 147.02% | 27.61% | 36.09% |
g++ 4.3.2 n=10m | 100.00% | 138.61% | 29.63% | 35.38% |
And then there is the std::string version, source speed-find-unique-string.cpp with cycle.h.
For VC10, all is as expected from the integer results. However, for g++ it is considerably faster to use std::set.
I tried a test for g++ where I force a full string copy to get around g++’s copy-on-write implementation, but it didn’t change much.
<string> | sort()+unique() | set.insert() | hash+set | boost hash+set |
---|---|---|---|---|
VC10 n=10k | 100.00% | 140.72% | 91.90% | 33.12% |
VC10 n=100k | 100.00% | 163.90% | 106.33% | 20.82% |
VC10 n=1m | 100.00% | 153.36% | 101.79% | 16.35% |
VC10 n=10m | 100.00% | 158.99% | 107.24% | 16.62% |
- | ||||
g++ 4.3.2 n=10k | 100.00% | 62.53% | 19.44% | 22.69% |
g++ 4.3.2 n=100k | 100.00% | 61.05% | 12.06% | 11.88% |
g++ 4.3.2 n=1m | 100.00% | 29.23% | 5.43% | 3.99% |
g++ 4.3.2 n=10m | 100.00% | 18.55% | 3.64% | 2.54% |