LINQ performance - deferred v/s immediate execution

谁说我不能喝 提交于 2021-02-07 08:53:55

问题


I have seen that sometimes the performance of LINQ to Objects queries can be improved significantly if they forced to execute immediately by using .ToArray(), but can't quite understand why. For example, in the sample below, the execution of the function Deferred() is much slower than the function Immediate(), and grows exponentially with the value of limit (perhaps it was exponential in both functions, but execution time with Immediate() was too little for me to say definitively).

public void Deferred()
{
    var all = Range(limit);
    var even = from e in EvenRange(limit) where all.Contains(e) select e;
    var odd = from o in OddRange(limit) where !even.Contains(o) select o;

    var query = from q in odd select q;

    foreach(var i in query) { var j = i+1; }
}

public void Immediate()
{
    var all = Range(limit);
    var even = (from e in EvenRange(limit) where all.Contains(e) select e)
        .ToArray();
    var odd = (from o in OddRange(limit) where !even.Contains(o) select o)
        .ToArray();

    var query = (from q in odd select q).ToArray();

    foreach(var i in query) { var j = i+1; }
}

public static IEnumerable<int> OddRange(int stop)
{
    for (int i = 1; i < stop; i+=2) yield return i;
}

public static IEnumerable<int> EvenRange(int stop)
{
    for (int i = 2; i < stop; i+=2) yield return i;
}

public static IEnumerable<int> Range(int stop)
{
    for (int i = 0; i < stop; ++i) yield return i;
}

What causes this slowdown? Is it just the extra code that the compiler generates and has to run every time I access the enumerable, or does Deferred() also go into some extra loops, changing the "Big O" complexity of the query?


回答1:


As far as I know IEnumerable doesn't cache the result, so your clause involving even.Contains (which must check all elements of even) forces a complete re-enumeration of even, thus showing the performance behavior you've noticed. Transforming even in an array or in a List should show you good performance when enumerating odd. Other .NET languages (such as F#) include methods to cache the results of an enumeration (if you're interested, you may look at F#'s Seq.cache method)




回答2:


As emaster70 explained, the Deferred version recalcuates the even collection for every call to even.Contains. That makes it degrade very quickly for higher values of limit.

The Immediate version is better, and if you also make the all collection immediate, like this:

var all = Range(limit).ToArray();

it gets about three times faster.

However, the Contains method for an array is an O(n) operation, so that doesn't scale well either. If the condition is equality, you should use a join instead of looking up each value using Contains.

If you need to make the lookup, you should use a HashSet instead of an array, as the Contains method for that is an O(1) operation.

For a limit of 10000, this is 100 times faster than the Immediate version:

public void Joined() {
    var all = Range(limit);
    var even = from e in EvenRange(limit) join a in all on e equals a select e;
    var evenSet = new HashSet<int>(even);
    var odd = from o in OddRange(limit) where !evenSet.Contains(o) select o;

    var query = from q in odd select q;

    foreach (var i in query) { var j = i + 1; }
}



回答3:


In addition to what emaster70 said, when reading values from an array an entire cache line is fetched into the system cache meaning that when looking into the arrays it goes much faster than if you had to fetch from main memory.




回答4:


In this code

var all = Range(limit);    
var even = from e in EvenRange(limit) where all.Contains(e) select e;    
var odd = from o in OddRange(limit) where !even.Contains(o) select o;

since IEnumerables constructed from iterator blocks or LINQ queries are lazy, computing each element of 'odd' requires recomputing the entire 'even' sequence. As a result, the big-O of this code is awful. It may be instructive to use a very small 'limit' and put a Console.WriteLine() inside the loop in Range(), and watch the behavior.



来源:https://stackoverflow.com/questions/872697/linq-performance-deferred-v-s-immediate-execution

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!