I have a single line of code that consumes 25% - 30% of the runtime of my application. It is a less-than comparator for an std::set (the set is implemented with a Red-Black-Tree).
I have a hard time believing that:
a) The comparison function runs 180 million times in 30 seconds
and
b) The comparison function uses 25% of the cpu time
are both true. Even a Core 2 Duo should easily be able to run 180 million comparisons in less than a second (after all, the claim is that it can do something like 12,000 MIPS, if that actually means anything). So I'm inclined to believe that there is something else being lumped in with the comparison by the profiling software. (Allocating memory for new elements, for example.)
However, you should at least consider the possibility that a std::set is not the data structure you're looking for. If you do millions of inserts before you actually need the sorted values (or maximum value, even), then you may well be better off putting the values into a vector, which is a much cheaper data structure both in time and space, and sorting it on demand.
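A rough sketch of that pattern follows; the Entry struct and its second field are assumptions, not the actual definition from the question:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the real struct; only the two comparison
// fields mentioned in the question are assumed here.
struct Entry {
    float cost;
    long  id;
};

bool operator<(const Entry& a, const Entry& b) {
    // The same ordering a set comparator would impose.
    return a.cost < b.cost || (a.cost == b.cost && a.id < b.id);
}

int main() {
    std::vector<Entry> entries;
    entries.reserve(10'000'000);               // cheap, contiguous storage
    // ... millions of inexpensive push_back calls ...
    std::sort(entries.begin(), entries.end()); // one O(n log n) pass, on demand
}
```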
If you actually need the set because you're worried about collisions, then you might consider an unordered_set instead, which is slightly cheaper than the set but not as cheap as a vector (precisely because a vector cannot guarantee uniqueness). But honestly, looking at that structure definition, I have a hard time believing that uniqueness is important to you.
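If uniqueness really does matter, a minimal unordered_set sketch would look something like this (again, the struct fields and the hash are assumptions):

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

struct Entry {
    float cost;   // assumed fields; the real struct may differ
    long  id;
};

struct EntryEq {
    bool operator()(const Entry& a, const Entry& b) const {
        return a.cost == b.cost && a.id == b.id;
    }
};

struct EntryHash {
    std::size_t operator()(const Entry& e) const {
        // Any reasonable combination of the key fields will do.
        return std::hash<float>()(e.cost) ^ (std::hash<long>()(e.id) << 1);
    }
};

int main() {
    std::unordered_set<Entry, EntryHash, EntryEq> entries;
    entries.insert({1.5f, 42});   // average O(1) insert, no ordering maintained
}
```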
"Benchmark"
On my little Core i5 laptop, which I suppose is not in the same league as OP's machine, I ran a few tests inserting 10 million random unique Entry objects (with just the two comparison fields) into a std::set and into a std::vector. At the end of this, I sorted the vector.
I did this twice: once with a random generator that produces probably-unique costs, and once with a generator that produces exactly two distinct costs (which should make the compare slower). Ten million inserts result in slightly more comparisons than reported by OP.
               unique cost         discrete cost
           compares    time       compares    time
set       243002508   14.7s      241042920   15.6s
vector    301036818    2.0s      302225452    2.3s
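A sketch of one way to reproduce this kind of measurement, with a comparator that counts its own calls (the Entry struct and the random generator are assumptions, not the exact harness used above):

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <set>
#include <vector>

struct Entry {
    float cost;   // assumed fields
    long  id;
};

static long long g_compares = 0;   // incremented by every comparison

struct LessEntry {
    bool operator()(const Entry& a, const Entry& b) const {
        ++g_compares;
        return a.cost < b.cost || (a.cost == b.cost && a.id < b.id);
    }
};

int main() {
    const int N = 10'000'000;
    std::mt19937 rng(12345);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);

    std::vector<Entry> input(N);
    for (int i = 0; i < N; ++i) input[i] = Entry{dist(rng), i};

    // set: every insert walks the red-black tree, comparing as it goes.
    g_compares = 0;
    auto t0 = std::chrono::steady_clock::now();
    std::set<Entry, LessEntry> s(input.begin(), input.end());
    auto t1 = std::chrono::steady_clock::now();
    std::printf("set:    %lld compares  %.1fs\n", g_compares,
                std::chrono::duration<double>(t1 - t0).count());

    // vector: copy everything, then sort once at the end.
    g_compares = 0;
    t0 = std::chrono::steady_clock::now();
    std::vector<Entry> v(input.begin(), input.end());
    std::sort(v.begin(), v.end(), LessEntry{});
    t1 = std::chrono::steady_clock::now();
    std::printf("vector: %lld compares  %.1fs\n", g_compares,
                std::chrono::duration<double>(t1 - t0).count());
}
```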
In an attempt to further isolate the comparison times, I redid the vector benchmarks using both std::sort and std::partial_sort, the latter with 10 elements (essentially a top-10 selection) and with 10% of the elements (that is, one million). The results of the larger partial_sort surprised me -- who would have thought that sorting 10% of a vector would be slower than sorting all of it? -- but they show that algorithm costs are a lot more significant than comparison costs:
                     unique cost            discrete cost
                   compares   time        compares   time
partial sort 10    10000598   0.6s        10000619   1.1s
partial sort 1M    77517081   2.3s        77567396   2.7s
full sort         301036818   2.0s       302225452   2.3s
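The partial sorts are just different calls on the same vector; a minimal sketch, using the same hypothetical Entry as above:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Entry {
    float cost;   // assumed fields
    long  id;
};

bool less_entry(const Entry& a, const Entry& b) {
    return a.cost < b.cost || (a.cost == b.cost && a.id < b.id);
}

void sort_variants(const std::vector<Entry>& v) {
    std::vector<Entry> a = v, b = v, c = v;   // work on copies

    // Top-10 only: the 10 smallest elements end up sorted at the front.
    std::partial_sort(a.begin(), a.begin() + 10, a.end(), less_entry);

    // First 10% sorted; the heap-based algorithm can lose to a full sort.
    std::partial_sort(b.begin(),
                      b.begin() + static_cast<std::ptrdiff_t>(b.size() / 10),
                      b.end(), less_entry);

    // Full sort, for comparison.
    std::sort(c.begin(), c.end(), less_entry);
}

int main() {
    std::vector<Entry> v(1000);
    for (std::size_t i = 0; i < v.size(); ++i) v[i] = Entry{float(i % 7), long(i)};
    sort_variants(v);
}
```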
Conclusion: the slower compare is visible, but the container manipulation dominates. The cost of ten million set inserts is certainly noticeable in a total of 52 seconds of compute time; the cost of ten million vector inserts is quite a bit less so.
Small note, for what it's worth:
The one thing I got from that bit of assembly code is that you're not saving anything by making the cost a float. It's actually allocating eight bytes for the float, so you're not saving any memory, and your cpu does not do a single float comparison any faster than a single double comparison. Just sayin' (i.e., beware of premature optimization).
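To illustrate the padding point, assuming the float sits next to an eight-byte field (one plausible reading of the eight bytes seen in the assembly; the layouts below are hypothetical):

```cpp
#include <cstdint>

// With an 8-byte neighbour, the float is padded out to eight bytes anyway
// on typical 64-bit ABIs, so the smaller type buys nothing.
struct WithFloat  { float  cost; std::int64_t id; };  // 4 + 4 (padding) + 8
struct WithDouble { double cost; std::int64_t id; };  // 8 + 8

static_assert(sizeof(WithFloat) == sizeof(WithDouble),
              "same size on typical 64-bit ABIs");

int main() {}
```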