C# yield return performance

Question

How much space is reserved for the underlying collection behind a method that uses yield return when I call ToList() on it? Is there a chance it will reallocate, and therefore perform worse than the standard approach where I create a list with a predefined capacity?

The two scenarios:

    public IEnumerable<T> GetList1()
    {
        foreach( var item in collection )
            yield return item.Property;
    }

    public IEnumerable<T> GetList2()
    {
        List<T> outputList = new List<T>( collection.Count() );
        foreach( var item in collection )
            outputList.Add( item.Property );

        return outputList;
    }

Answer 1:

yield return does not create an array that has to be resized, the way List does; instead, the compiler generates a state machine that implements IEnumerable.

For instance, let's take this method:

public static IEnumerable<int> Foo()
{
    Console.WriteLine("Returning 1");
    yield return 1;
    Console.WriteLine("Returning 2");
    yield return 2;
    Console.WriteLine("Returning 3");
    yield return 3;
}

Now let's call it and assign that enumerable to a variable:

var elems = Foo();

None of the code in Foo has executed yet. Nothing will be printed on the console. But if we iterate over it, like this:

foreach(var elem in elems)
{
    Console.WriteLine( "Got " + elem );
}

On the first iteration of the foreach loop, the Foo method will be executed until the first yield return. Then, on the second iteration, the method will "resume" from where it left off (right after the yield return 1), and execute until the next yield return. Same for all subsequent elements.
At the end of the loop, the console will look like this:

Returning 1
Got 1
Returning 2
Got 2
Returning 3
Got 3
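The same pause-and-resume behaviour can be observed without foreach by driving the enumerator by hand. The following is a minimal sketch; the DeferredDemo class and its Log list are hypothetical names introduced only for illustration, and the log replaces Console.WriteLine so the order of events can be inspected afterwards.

```csharp
using System.Collections.Generic;

public static class DeferredDemo
{
    // Hypothetical helper: records side effects instead of printing them.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Foo()
    {
        Log.Add("Returning 1");
        yield return 1;
        Log.Add("Returning 2");
        yield return 2;
    }

    // Advances the sequence by one element and reports how many
    // log entries exist once that element has been produced.
    public static int LogCountAfterOneStep(IEnumerator<int> enumerator)
    {
        enumerator.MoveNext();
        return Log.Count;
    }
}
```

Calling DeferredDemo.Foo() alone should leave Log empty; each MoveNext() call then runs the method body only as far as the next yield return.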

This means you can write methods like this:

public static IEnumerable<int> GetAnswers()
{
    while( true )
    {
        yield return 42;
    }
}

You can call the GetAnswers method, and every time you request an element, it'll give you 42; the sequence never ends. You couldn't do this with a List, because lists have to have a finite size.
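As a small sketch of how such an endless sequence is consumed safely, the caller can bound it with LINQ's Take before materializing it. The InfiniteDemo class and FirstThree method are hypothetical names used only for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class InfiniteDemo
{
    public static IEnumerable<int> GetAnswers()
    {
        while (true)
            yield return 42;
    }

    // Take(3) stops asking for elements after three of them, so the
    // infinite loop inside GetAnswers is simply never resumed again.
    public static List<int> FirstThree()
    {
        return GetAnswers().Take(3).ToList();
    }
}
```

Calling ToList() directly on GetAnswers(), without Take, would never complete normally, because ToList() keeps requesting elements until the sequence ends.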

Answer 2:

How much space is reserved to the underlying collection behind a method using yield return syntax?

There's no underlying collection.

There's an object, but it isn't a collection. Just how much space it will take up depends on what it needs to keep track of.

There's a chance it will reallocate

No.

And thus decrease performance if compared to the standard approach where I create a list with predefined capacity?

It will almost certainly take up less memory than creating a list with a predefined capacity.

Let's try a manual example. Say we had the following code:

public static IEnumerable<int> CountToTen()
{
  for(var i = 1; i != 11; ++i)
    yield return i;
}

Iterating over this with foreach yields the numbers 1 to 10 inclusive.

Now let's do this the way we would have to if yield did not exist. We'd do something like:

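// Requires: using System; using System.Collections; using System.Collections.Generic;
// The two private classes below are meant to be nested inside the class that declares CountToTen().
private class CountToTenEnumerator : IEnumerator<int>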
{
  private int _current;
  public int Current
  {
    get
    {
      if(_current == 0)
        throw new InvalidOperationException();
      return _current;
    }
  }
  object IEnumerator.Current
  {
    get { return Current; }
  }
  public bool MoveNext()
  {
    if(_current == 10)
      return false;
    _current++;
    return true;
  }
  public void Reset()
  {
    throw new NotSupportedException();
    // We *could* just set _current back, but the object produced by
    // yield won't do that, so we'll match that.
  }
  public void Dispose()
  {
  }
}
private class CountToTenEnumerable : IEnumerable<int>
{
  public IEnumerator<int> GetEnumerator()
  {
    return new CountToTenEnumerator();
  }
  IEnumerator IEnumerable.GetEnumerator()
  {
    return GetEnumerator();
  }
}
public static IEnumerable<int> CountToTen()
{
  return new CountToTenEnumerable();
}

Now, for a variety of reasons this is quite different from the code you're likely to get from the version using yield, but the basic principle is the same. As you can see, two objects are allocated (the same number as if we had a collection and then did a foreach over it), plus the storage of a single int. In practice we can expect yield to store a few more bytes than that, but not a lot.

Edit: yield actually does a trick where the first GetEnumerator() call on the same thread that obtained the object returns that same object, so it does double service as both enumerable and enumerator. Since this covers over 99% of use cases, yield actually does one allocation rather than two.
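The reuse trick can be checked with a reference comparison. This is a rough sketch that relies on an implementation detail of the Microsoft C# compiler's generated iterator class, not on anything the language specification guarantees; ReuseDemo and its method names are hypothetical.

```csharp
using System.Collections.Generic;

public static class ReuseDemo
{
    public static IEnumerable<int> CountToTen()
    {
        for (var i = 1; i != 11; ++i)
            yield return i;
    }

    // Assumption (compiler implementation detail): the first GetEnumerator()
    // call on the creating thread returns the enumerable object itself.
    public static bool FirstEnumeratorIsSameObject()
    {
        IEnumerable<int> seq = CountToTen();
        IEnumerator<int> first = seq.GetEnumerator();
        bool same = object.ReferenceEquals(seq, first);
        first.Dispose();
        return same;
    }

    // While the first enumerator is still in use, a second request
    // cannot share it, so a separate object has to be handed out.
    public static bool SecondEnumeratorIsDifferentObject()
    {
        IEnumerable<int> seq = CountToTen();
        IEnumerator<int> first = seq.GetEnumerator();
        IEnumerator<int> second = seq.GetEnumerator();
        bool different = !object.ReferenceEquals(first, second);
        first.Dispose();
        second.Dispose();
        return different;
    }
}
```

If the first check holds on a given compiler, a single foreach over the result of CountToTen() costs one heap allocation for the iterator rather than two.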

Now let's look at:

public IEnumerable<T> GetList1()
{
  foreach( var item in collection )
    yield return item.Property;
}

While this uses more memory than just return collection would, it won't use a lot more; the only thing the generated enumerator really needs to keep track of is the enumerator obtained by calling GetEnumerator() on collection, which it wraps.

This uses massively less memory than the wasteful second approach you mention, and it is much faster to get going.
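To make the "wrapping" concrete, here is a hand-written sketch of roughly what such an enumerator has to hold. It is not the code the compiler emits: the ProjectingEnumerator name and the selector delegate (standing in for item.Property) are assumptions made for illustration, and the real generated class also stores a state field and the current value.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for the enumerator behind GetList1.
public sealed class ProjectingEnumerator<TSource, TResult> : IEnumerator<TResult>
{
    // The only per-enumeration state: the wrapped enumerator.
    private readonly IEnumerator<TSource> _inner;
    private readonly Func<TSource, TResult> _selector;

    public ProjectingEnumerator(IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        _inner = source.GetEnumerator();
        _selector = selector;
    }

    // Computed on demand from the wrapped enumerator's current item.
    public TResult Current { get { return _selector(_inner.Current); } }

    object IEnumerator.Current { get { return Current; } }

    public bool MoveNext() { return _inner.MoveNext(); }

    public void Reset() { throw new NotSupportedException(); }

    public void Dispose() { _inner.Dispose(); }
}
```

The memory cost is independent of how many items the source holds: two references here, versus a full backing array for a list.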

Edit:

You've changed your question to include "syntax WHEN I PERFORM a ToList() on it", which is worth considering.

Now, here we need to add a third possibility: Knowledge of the collection's size.

Here, there is the possibility that using new List<T>(capacity) will prevent reallocations of the internal array while the list is being built. That can indeed be a considerable saving.
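The saving can be made visible by watching List<T>.Capacity while items are added. The sketch below uses hypothetical names (CapacityDemo, CountReallocations, Numbers) and treats every change of Capacity as one replacement of the backing array.

```csharp
using System.Collections.Generic;

public static class CapacityDemo
{
    // Adds every item to the target list and counts how often its
    // Capacity changes, i.e. how often a new backing array was allocated.
    public static int CountReallocations(List<int> target, IEnumerable<int> items)
    {
        int reallocations = 0;
        int lastCapacity = target.Capacity;
        foreach (int item in items)
        {
            target.Add(item);
            if (target.Capacity != lastCapacity)
            {
                reallocations++;
                lastCapacity = target.Capacity;
            }
        }
        return reallocations;
    }

    // A lazy source whose length is not known to the consumer up front.
    public static IEnumerable<int> Numbers(int count)
    {
        for (int i = 0; i < count; i++)
            yield return i;
    }
}
```

With a list created as new List<int>(100) and 100 items, the count stays at zero; with new List<int>() it is greater than zero, because the array has to grow several times along the way.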

If the object that has ToList called on it implements ICollection<T> then ToList will end up first doing a single allocation of an internal array of T and then calling ICollection<T>.CopyTo().

This would mean that your GetList2 would result in a faster ToList() than your GetList1.

However, your GetList2 has already wasted time and memory doing what ToList() will do with the results of GetList1 anyway!

What it should have done here was just return new List<T>(collection); and be done with it.
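A short sketch of the distinction: a List<T> implements ICollection<T>, so a copy can be sized exactly up front, whereas the object produced by a yield-based method does not and has to be consumed item by item. ToListPathDemo and its members are hypothetical names for illustration.

```csharp
using System.Collections.Generic;

public static class ToListPathDemo
{
    // A pass-through iterator; the compiler-generated object behind it
    // implements IEnumerable<int> but not ICollection<int>.
    public static IEnumerable<int> Lazy(IEnumerable<int> source)
    {
        foreach (int item in source)
            yield return item;
    }

    // True when the sequence can report its size before being enumerated.
    public static bool KnowsItsCount(IEnumerable<int> sequence)
    {
        return sequence is ICollection<int>;
    }

    // The direct copy suggested above: one list, built in one step.
    public static List<int> Copy(IEnumerable<int> sequence)
    {
        return new List<int>(sequence);
    }
}
```

For a non-empty List<int> source, Copy produces a list whose Capacity equals its Count, since the constructor allocates the array once and copies into it.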

If, though, we need to actually do something inside GetList1 or GetList2 (e.g. convert elements, filter elements, track averages, and so on), then GetList1 is going to be faster and lighter on memory. Much lighter if we never call ToList() on it, and slightly lighter if we do call ToList(), because again, the faster and lighter ToList() is offset by GetList2 being slower and heavier in the first place by exactly the same amount.

Source: https://stackoverflow.com/questions/29702468/c-sharp-yield-return-performance
