std:sort vs inserting into an std::set

问题

I am reading some line segments from cin. Each line segment is represented by a start and end point. 2D. X and Y.

The input is not sorted. It is in random order. (Update:But I need them sorted first by X and then by Y)

I can read in all the segments, store them in a vector and then call std::sort. On the other hand, I can create an empty std::set and insert each segment as it arrives. The set will automatically maintain sorted order. Which of the two approaches is more efficient?

Update: The total size of the input (number of segments) is known in advance.

回答1:

You should measure the performance of both approaches to be sure, but it's a safe bet to assume that std::sort on an std::vector is way faster than inserting into an std::set due to locality effects and the large constants hiding in the tree insertion algorithm. Also, the subsequent lookups and iteration will be faster.

(However, std::set is better suited for supporting a mixed series of insertions and deletions/lookups/iterations. Maintaining order in vector is expensive, as each insertion will take linear time on average.)

回答2:

As a good rule of thumb, the stricter guarantees are offered, the worse performance you'll get.

Inserting into a std::set guarantees that the sequence is sorted after every insertion.

Inserting into a std::vector, and calling std::sort once after all insertions have been done guarantees that the sequence is sorted once all manipulations on the vector have been done. It doesn't require the vector to be sorted during all the intermediate insertions.

A std::vector also exhibits better spatial locality, and requires fewer memory allocations. So I would assume the vector approach to be faster, but if performance matters to you, then it matters enough to be measured.

If you don't care to measure what is faster in your case for your data sets with your code in your application, then you don't care which is faster.

回答3:

Use the container that has the appropriate semantics for your needs. Efficiency generally follows on automatically from that choice.

If you then experience performance bottlenecks, do some benchmarking.

回答4:

It indeed does depend, but it's certain that std::set is intended for random inserts and deletes. In this case you are only inserting. Go with std::vector. Also, perhaps more importantly, if you know beforehand how many segments there are, you only have to allocate the vector only once, it will not reallocate memory everytime it doubles in size.

来源：https://stackoverflow.com/questions/15638024/stdsort-vs-inserting-into-an-stdset

标签

c++

sorting

stl

set

std