问题
Edit: I will add some benchmark results. To about a 1000 - 5000 items in the list, IList
and RemoveAt
beats ISet
and Remove
, but that's not something to worry about since the differences are marginal. The real fun begins when collection size extends to 10000 and more. I'm posting only those data
I was answering a question here last night and faced a bizarre situation.
First a set of simple methods:
static Random rnd = new Random();
public static int GetRandomIndex<T>(this ICollection<T> source)
{
return rnd.Next(source.Count);
}
public static T GetRandom<T>(this IList<T> source)
{
return source[source.GetRandomIndex()];
}
------------------------------------------------------------------------------------------------------------------------------------
Let's say I'm removing N number of items from a collection randomly. I would write this function:
public static void RemoveRandomly1<T>(this ISet<T> source, int countToRemove)
{
int countToRemain = source.Count - countToRemove;
var inList = source.ToList();
int i = 0;
while (source.Count > countToRemain)
{
source.Remove(inList.GetRandom());
i++;
}
}
or
public static void RemoveRandomly2<T>(this IList<T> source, int countToRemove)
{
int countToRemain = source.Count - countToRemove;
int j = 0;
while (source.Count > countToRemain)
{
source.RemoveAt(source.GetRandomIndex());
j++;
}
}
As you can see the first function is written for an ISet
and the second for normal IList
. In the first function I'm removing by item from ISet
and by index in IList
, both of which I believe are O(1)
. Why is the second function performing so much worse than the first, especially when the lists get bigger?
Odds (my take):
1) In the first function the ISet
is converted to an IList
(to get the random item from the IList
), where as there is no such thing performed in the second function.
Advantage IList.
2) In the first function a call to GetRandomItem
is made, where as in the second, a call to GetRandomIndex
is made, that's one step less again.
Though trivial, advantage IList.
3) In the first function, the random item is got from a separate list, so the obtained item might be already removed from ISet
. This leads in more iterations in the while
loop in the first function. In the second function, the random index is got from the source that is being iterated on, hence there are never repetitive iterations. I have tested this and verified this.
i > j always, advantage
IList
.
I thought the reason for this behaviour is that a List
would need constant resizing when items are added or removed. But apparently no in some other testing. I ran:
public static void Remove1(this ISet<int> set)
{
int count = set.Count;
for (int i = 0; i < count; i++)
{
set.Remove(i + 1);
}
}
public static void Remove2(this IList<int> lst)
{
for (int i = lst.Count - 1; i >= 0; i--)
{
lst.RemoveAt(i);
}
}
and found that the second function runs faster.
Test bed:
var f = Enumerable.Range(1, 100000);
var s = new HashSet<int>(f);
var l = new List<int>(f);
Benchmark(() =>
{
//some examples...
s.RemoveRandomly1(2500);
l.RemoveRandomly2(2500);
s.Remove1();
l.Remove2();
}, 1);
public static void Benchmark(Action method, int iterations = 10000)
{
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < iterations; i++)
method();
sw.Stop();
MsgBox.ShowDialog(sw.Elapsed.TotalMilliseconds.ToString());
}
Just trying to know what's with the two structures.. Thanks..
Result:
var f = Enumerable.Range(1, 10000);
s.RemoveRandomly1(7500); => 5ms
l.RemoveRandomly2(7500); => 20ms
var f = Enumerable.Range(1, 100000);
s.RemoveRandomly1(7500); => 7ms
l.RemoveRandomly2(7500); => 275ms
var f = Enumerable.Range(1, 1000000);
s.RemoveRandomly1(75000); => 50ms
l.RemoveRandomly2(75000); => 925000ms
For most typical needs a list would do though..!
回答1:
First off, IList and ISet aren't implementations of anything. I can write an IList or an ISet implementation that will run very differently, so the concrete implementations are what is important (List and HashSet in your case).
Accessing a List item by index is O(1) but not removing by RemoveAt which is O(n).
List removing from the end will be fast because it doesn't have to copy anything, it just decrements its internal counter that stores how many items it has until the number of empty spots in the underlying array goes below a threshold, at which point it will copy the array to a smaller one. Once you hit the max capacity of the underlying array it creates a new array double the size and copies the elements over. If you go below a certain threshold it will create an array half the size and copy the elements over. It tracks how large it is with a length property, so that unused slots appear like they aren't there.
Randomly removing from a list means that it will have to copy all the array entries that come after the index so that they slide down one spot, which is inherently pretty slow, particularly as the size of the list gets bigger. If you have a List with 1 million entries, and you remove something at index 500,000, it has to copy the second half of the array down a spot.
来源:https://stackoverflow.com/questions/13544089/why-is-removing-by-index-from-an-ilist-performing-so-much-worse-than-removing-by