Where is the performance gain of the erase-remove idiom coming from

问题

I need to erase all elements from a vector which fulfill a certain criteria.

My first approach would be to loop through the vector and call vector::erase on all elements which fulfill the criteria.

As far as I understand, vector::erase has a bad performance for this use case, because it removes the item from the underlying array, and moves the rest of the vector forward by one element (or more if you erase a range of elements). When you remove multiple elements, the rear elements will be shifted at each removal.

The remove algorithm takes all the elements to be removed, and moves them to the end of the vector, so you only have to remove that rear part of the vector, which involves no shifting.

But why is this faster than erasing? (is it even faster?)

Doesn't moving an element to the end imply moving all the following elements forward like in vector::erase?

How comes, that remove only has a complexity of O(n)?

回答1:

The performance issue here is not about erasing the elements that are to be removed, or moving them to the end (which does not happen actually), but about moving the elements that are to be kept.

If you use erase on each element you want to remove, you need to move all the elements after these... for each call to erase. Typically, if you want to remove k elements, you will move the elements after the latest one (in the vector) k times instead of only one.

But if you call remove, you will only move these once (see example below).

A small example to better understand how these two methods work:

Let's say you have a vector of size 1000 and the elements you want to remove are at position 17 and 37.

With erase acting on the two elements to be removed:

when you call erase() for the 17th element, you need to move elements 18 to 999, 982 elements.
when you call erase() for the 36th element (it's the 36th one now!), you need to move elements 37 to 998, 962 elements.

In total, you have moved 962 + 982 = 1944 elements, 962 of these have been moved twice for nothing.

With remove, what happens is as follows:

element 0 does not change;
element 1 does not change;
...
element 17 is "discarded";
element 18 is moved at position 17;
element 19 is moved at position 18;
...
element 36 is moved at position 35;
element 37 is "discarded";
element 38 is moved at position 36;
...
element 999 is moved at position 997.

In total, you have moved 998 elements (1000 minus the two you removed), which is much better than the 1943 elements of the previous methods. This is even better if you have more than 2 elements to remove.

You can have a look at the possible implementation on en.cppreference.com to better understand how std::remove works.

回答2:

The advantage comes in that std::remove doesn't just remove one element at a time. For example, if a call to std::remove results in removing the first 10 elements of your vector, it will move the 11th element directly to the 1st position, 12th element directly to the 2nd position etc... Whereas, if you erased the first 10 elements one at a time, it would move every element after the one you erase back by 1. And then you would erase the next one, every element would have to be moved again. And this would repeat for every element erased.

Also, the removed elements don't have to be sequential for this advantage to happen. For example, if the call to remove results in every other element, starting at the first, being removed. First, the 2nd element will be moved to the 1st position, and this will leave a gap of two elements until the next keepable element. Then the 4th element will be moved directly to the 2nd position, leaving a gap of 3 elements and so on.

Also, a slight correction:

The remove algorithm takes all the elements to be removed, and moves them to the end of the vector

The remove algorithm doesn't do that. It doesn't care what happens to the elements that are to be removed. They are simply replaced by the elements that are to remain. The value of the elements at the end after a call to remove are unspecified. The algorithm you're describing is partition (with a reversed comparison function).

来源：https://stackoverflow.com/questions/45551241/where-is-the-performance-gain-of-the-erase-remove-idiom-coming-from

标签

c++

vector

stl

erase-remove-idiom