LINQ to Objects and improved perf with an Index?

Posted by 随声附和 on 2019-12-03 07:49:25

There's already a project which I believe does exactly that - i4o. I can't say I've used it myself, but it sounds like the kind of thing you want... you may need to juggle your existing code a bit, but it's certainly worth looking at.

If that doesn't help, you could at least write your own extension methods on SortedList<TKey, TValue>. You probably wouldn't be able to easily use your actual where clause, but you could use your own methods taking a minimum and a maximum value. You might also want to make them apply to IList<T> where you assert that you've already sorted the values appropriately (according to some comparer).

For example (completely untested):

public static IEnumerable<T> Between<T, TKey>(this IList<T> source,
                                              Func<T, TKey> projection,
                                              TKey minKeyInclusive,
                                              TKey maxKeyExclusive,
                                              IComparer<TKey> comparer)
{
    comparer = comparer ?? Comparer<TKey>.Default;

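    // Binary search for the first index whose key is >= minKeyInclusive.
    // Assumes source is already sorted by the projected key under comparer.
    int low = 0;
    int high = source.Count;
    while (low < high)
    {
        int mid = low + (high - low) / 2;
        if (comparer.Compare(projection(source[mid]), minKeyInclusive) < 0)
        {
            low = mid + 1;
        }
        else
        {
            high = mid;
        }
    }
    int index = low; // Minimum index in range (source.Count if none)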

    while (index < source.Count &&
           comparer.Compare(projection(source[index]), maxKeyExclusive) < 0)
    {
        yield return source[index];
        index++;
    }
}

(If you only have List<T> instead of IList<T>, you could use List<T>.BinarySearch, although you'd need to build a custom IComparer<T>.)
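As a rough sketch of that List<T>.BinarySearch route: the Person type, the AgeComparer class and the AgeRange.Between helper below are hypothetical names made up for illustration, and the list is assumed to be sorted by Age already. BinarySearch returns the bitwise complement of the insertion point when there is no exact match, which is what makes it usable for finding a lower bound.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical element type for illustration.
public sealed class Person
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
}

// Compares people by Age only, so BinarySearch can probe with a dummy Person.
public sealed class AgeComparer : IComparer<Person>
{
    public int Compare(Person x, Person y) => x.Age.CompareTo(y.Age);
}

public static class AgeRange
{
    // Returns people with minAge <= Age < maxAge.
    // Assumes 'sorted' is already ordered by Age ascending.
    public static List<Person> Between(List<Person> sorted, int minAge, int maxAge)
    {
        var comparer = new AgeComparer();
        int index = sorted.BinarySearch(new Person { Age = minAge }, comparer);
        if (index < 0)
        {
            // No exact match: ~index is the first element with a larger Age.
            index = ~index;
        }
        else
        {
            // With duplicates, BinarySearch may land on any match; walk back to the first.
            while (index > 0 && sorted[index - 1].Age == minAge)
            {
                index--;
            }
        }

        var result = new List<Person>();
        for (; index < sorted.Count && sorted[index].Age < maxAge; index++)
        {
            result.Add(sorted[index]);
        }
        return result;
    }
}
```

The walk-back step is linear in the number of people sharing the minimum age, so a hand-written lower-bound search may be preferable if there are many duplicates.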

Also, have a look at SortedSet<T> in .NET 4.
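A minimal sketch of how SortedSet<T> could serve a range query, using its GetViewBetween method (both bounds are inclusive). The (Age, Id) tuple layout and the SortedSetRange helper are assumptions for illustration; the Id component is there only because a set drops elements that compare equal, so two people of the same age need a tie-breaker.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedSetRange
{
    // Returns the ids of entries whose age lies in [minAge, maxAge], both inclusive.
    // Tuples compare by Age first, then Id, so the probe bounds cover every Id.
    public static List<int> IdsBetween(SortedSet<(int Age, int Id)> byAge, int minAge, int maxAge)
    {
        var view = byAge.GetViewBetween((minAge, int.MinValue), (maxAge, int.MaxValue));
        return view.Select(e => e.Id).ToList();
    }
}
```

For the "older than 18 and younger than 21" query this would be called with minAge 19 and maxAge 20.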

You're right that the query you wrote will enumerate the whole list as obviously LINQ can't assume anything about your data.

If you have a SortedList, you can exploit that using the SkipWhile/TakeWhile LINQ methods:

 // sortedList is your SortedList keyed by age
 var x = sortedList.SkipWhile(kv => kv.Key <= 18).TakeWhile(kv => kv.Key < 21);

EDIT

@Davy8 is right, of course, that in the worst case this still has the same O(n) performance, because SkipWhile scans linearly from the start. See the other answers for a way to find the first value more quickly.

If you need to do this operation more than once for different age ranges then you can probably also speed it up by grouping on age:

var byAge = people.GroupBy(p => p.Age);

var x = from grp in byAge 
        where grp.Key > 18 && grp.Key < 21
        from person in grp
        select person;
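One caveat: GroupBy is lazily evaluated, so the grouping is redone each time byAge is enumerated. If the goal is to reuse the grouping across several queries, materializing it with ToLookup is one option. The sketch below assumes people are represented as hypothetical (Name, Age) tuples, and the AgeIndex helper is made up for illustration; it probes each age in the range rather than scanning every group, which only makes sense for small integer ranges.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AgeIndex
{
    // Builds the index once; ToLookup executes immediately, unlike GroupBy.
    public static ILookup<int, string> Build(IEnumerable<(string Name, int Age)> people)
        => people.ToLookup(p => p.Age, p => p.Name);

    // Names of people with minAgeExclusive < age < maxAgeExclusive,
    // mirroring the bounds of the grouped query.
    public static List<string> Between(ILookup<int, string> byAge,
                                       int minAgeExclusive,
                                       int maxAgeExclusive)
    {
        var result = new List<string>();
        for (int age = minAgeExclusive + 1; age < maxAgeExclusive; age++)
        {
            // A lookup returns an empty sequence for a missing key.
            result.AddRange(byAge[age]);
        }
        return result;
    }
}
```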

The LINQ query syntax actually uses any extension method named Where that matches the signature, so you can always write your own that handles your specific type the way you want.

    public static IEnumerable<Person> Where(this IEnumerable<Person> collection, Func<Person, bool> condition )
    {
        Console.WriteLine("My Custom 'Where' method called");
        return System.Linq.Enumerable.Where(collection, condition);
    }

...

        var x = from t in people
                where t.Age > 18 && t.Age < 21
                select t; //Will print "My Custom 'Where' method called"

Then you can apply any logic you want. I believe the normal method overload rules apply for determining which Where extension method would be called.

In a pre-sorted container, the efficiency is achieved by finding the first element quickly. Once you find the first element, just linearly retrieve the following elements until you find the end of your range.

Assuming your SortedList is sorted by Person.Age, you can find the first element of the range using SortedList.IndexOfKey, which performs a binary search; therefore, this method is an O(log n) operation.

(I don't think you can change your code so the Enumerable.Where suddenly becomes more intelligent and finds the range start by using binary search.)

--- EDIT ---

Actually, what you really need is List.BinarySearch or Array.BinarySearch. SortedList.IndexOfKey won't give you the index of the closest match when an exact match does not exist. Or you can just implement the binary search yourself.
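For the "implement it yourself" route, here is a hedged sketch that runs a lower-bound binary search over SortedList<TKey, TValue>.Keys, which is an IList<TKey> with indexed access. The SortedListRange class and its method names are hypothetical. Note that a SortedList holds unique keys, so this only fits when each key (age) maps to a single value.

```csharp
using System;
using System.Collections.Generic;

public static class SortedListRange
{
    // Index of the first key >= target, or keys.Count if there is none.
    public static int LowerBound<TKey>(IList<TKey> keys, TKey target,
                                       IComparer<TKey> comparer = null)
    {
        comparer = comparer ?? Comparer<TKey>.Default;
        int low = 0;
        int high = keys.Count;
        while (low < high)
        {
            int mid = low + (high - low) / 2;
            if (comparer.Compare(keys[mid], target) < 0)
            {
                low = mid + 1;
            }
            else
            {
                high = mid;
            }
        }
        return low;
    }

    // Values whose key is in [minInclusive, maxExclusive).
    public static IEnumerable<TValue> Between<TKey, TValue>(
        SortedList<TKey, TValue> list, TKey minInclusive, TKey maxExclusive)
    {
        IComparer<TKey> comparer = list.Comparer;
        IList<TKey> keys = list.Keys;
        IList<TValue> values = list.Values;

        // O(log n) to find the start, then linear in the size of the range.
        for (int i = LowerBound(keys, minInclusive, comparer);
             i < keys.Count && comparer.Compare(keys[i], maxExclusive) < 0;
             i++)
        {
            yield return values[i];
        }
    }
}
```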
