Is the order of execution of Linq the reason for this catch?

Posted by 放肆的年华 on 2019-12-11 05:56:15

Question


I have this function to repeat a sequence:

public static List<T> Repeat<T>(this IEnumerable<T> lst, int count)
{
    if (count < 0)
        throw new ArgumentOutOfRangeException("count");

    var ret = Enumerable.Empty<T>();

    for (var i = 0; i < count; i++)
        ret = ret.Concat(lst);

    return ret.ToList();
}

Now if I do:

var d = Enumerable.Range(1, 100);
var f = d.Select(t => new Person()).Repeat(10); 
int i = f.Distinct().Count();

I expect i to be 100, but it's giving me 1000! My question, strictly, is why this is happening. Shouldn't LINQ be smart enough to figure out that it's the first selected 100 persons I need to concatenate with the variable ret? I have a feeling that Concat is being given preference over Select when the query is executed at ret.ToList().

Edit:

If I do this I get the correct result as expected:

var f = d.Select(t => new Person()).ToList().Repeat(10); 
int i = f.Distinct().Count(); //prints 100

Edit again:

I have not overridden Equals. I'm just trying to get 100 unique persons (by reference, of course). My question is: can someone explain why LINQ is not doing the Select operation first and then the concatenation (at the time of execution, of course)?


Answer 1:


The problem is that unless you call ToList, d.Select(t => new Person()) is re-enumerated each time Repeat goes through the sequence, and every pass creates a new set of Person objects. This behavior is known as deferred execution.

In general, LINQ does not assume that each time it enumerates a sequence it would get the same sequence, or even a sequence of the same length. If this effect is not desirable, you can always "materialize" the sequence inside your Repeat method by calling ToList right away, like this:

public static List<T> Repeat<T>(this IEnumerable<T> lstEnum, int count) {
    if (count < 0)
        throw new ArgumentOutOfRangeException("count");

    var lst = lstEnum.ToList(); // Enumerate only once
    var ret = Enumerable.Empty<T>();

    for (var i = 0; i < count; i++)
        ret = ret.Concat(lst);

    return ret.ToList();
}
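
To make the re-enumeration visible, here is a minimal sketch that counts how often the selector runs. The class name DeferredExecutionDemo and the method CountSelectorCalls are made up for this illustration and are not part of the original question; the query stands in for d.Select(t => new Person()).

```csharp
using System;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Hypothetical helper: enumerates the same Select query `passes` times
    // and returns how many times the selector lambda was invoked in total.
    public static int CountSelectorCalls(int passes)
    {
        var calls = 0;

        // Building the query runs nothing; the lambda only runs on enumeration.
        var query = Enumerable.Range(1, 100).Select(n =>
        {
            calls++;
            return new object();
        });

        for (var i = 0; i < passes; i++)
            query.ToList(); // each ToList() enumerates the query from scratch

        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountSelectorCalls(0));  // 0: the query was never enumerated
        Console.WriteLine(CountSelectorCalls(1));  // 100
        Console.WriteLine(CountSelectorCalls(10)); // 1000
    }
}
```

Ten passes over the query run the selector 1000 times, which is where the 1000 distinct Person references in the question come from.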



Answer 2:


I could break my problem down to something simpler:

var d = Enumerable.Range(1, 100);
var f = d.Select(t => new Person());

Now essentially I am doing this:

f = f.Concat(f);

Mind you, the query hasn't been executed up to this point. At the time of execution, f is still the unexecuted d.Select(t => new Person()). So at the time of execution the last statement can be broken down to:

f = f.Concat(f); 
//which is 
f = d.Select(t => new Person()).Concat(d.Select(t => new Person()));

which obviously creates 100 + 100 = 200 new Person instances. So

f.Distinct().ToList(); //yields 200, not 100

which is the correct behaviour.

Edit: I could rewrite the extension method as simply as:

public static IEnumerable<T> Repeat<T>(this IEnumerable<T> source, int times)
{
    source = source.ToArray();
    return Enumerable.Range(0, times).SelectMany(_ => source);
}

I used dasblinkenlight's suggestion to fix the issue.
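
For comparison, here is a small sketch that puts a lazy variant next to the materializing one. The names RepeatLazy, RepeatMaterialized and RepeatDemo are invented for this example, and the empty Person class is only a stand-in for the type from the question, assuming it does not override Equals or GetHashCode.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the question's Person type (assumption: no Equals/GetHashCode
// override, so Distinct() compares instances by reference).
public class Person { }

public static class RepeatDemo
{
    // Lazy variant: the source query is re-enumerated once per repetition.
    public static IEnumerable<T> RepeatLazy<T>(this IEnumerable<T> source, int times)
    {
        return Enumerable.Range(0, times).SelectMany(_ => source);
    }

    // Materializing variant: the source is enumerated exactly once into an array,
    // and the repetitions reuse the same elements.
    public static IEnumerable<T> RepeatMaterialized<T>(this IEnumerable<T> source, int times)
    {
        var buffer = source.ToArray();
        return Enumerable.Range(0, times).SelectMany(_ => buffer);
    }

    public static void Main()
    {
        var people = Enumerable.Range(1, 100).Select(_ => new Person());

        Console.WriteLine(people.RepeatLazy(10).Distinct().Count());         // 1000
        Console.WriteLine(people.RepeatMaterialized(10).Distinct().Count()); // 100
    }
}
```

Both variants return 1000 elements in total; only the materializing one repeats the same 100 references.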




Answer 3:


Each Person object is a separate object. All 1000 are distinct.

What is the definition of equality for the Person type? If you don't override it, that definition will be reference equality, meaning all 1000 objects are distinct.



Source: https://stackoverflow.com/questions/13500641/is-the-order-of-execution-of-linq-the-reason-for-this-catch
