C# fastest intersection of 2 sets of sorted numbers

傲寒 2020-12-28 10:00

I'm calculating the intersection of 2 sets of sorted numbers in a time-critical part of my application. This calculation is the biggest bottleneck of the whole application so I

5 answers
  •  太阳男子
    2020-12-28 10:02

    If you have two sets which are both sorted, you can implement a faster intersection than anything provided out of the box with LINQ.

    Basically, keep two IEnumerator<T> cursors open, one for each set. At each step, advance whichever cursor has the smaller value. If the values match, yield the value and advance both cursors. Continue until you reach the end of either iterator.

    The nice thing about this is that you only need to iterate over each set once, and you can do it in O(1) memory.

    Here's a sample implementation - untested, but it does compile :) It assumes that both of the incoming sequences are duplicate-free and sorted, both according to the comparer provided (pass in Comparer<T>.Default):

    (There's more text at the end of the answer!)

    static IEnumerable<T> IntersectSorted<T>(this IEnumerable<T> sequence1,
        IEnumerable<T> sequence2,
        IComparer<T> comparer)
    {
        using (var cursor1 = sequence1.GetEnumerator())
        using (var cursor2 = sequence2.GetEnumerator())
        {
            if (!cursor1.MoveNext() || !cursor2.MoveNext())
            {
                yield break;
            }
            var value1 = cursor1.Current;
            var value2 = cursor2.Current;
    
            while (true)
            {
                int comparison = comparer.Compare(value1, value2);
                if (comparison < 0)
                {
                    if (!cursor1.MoveNext())
                    {
                        yield break;
                    }
                    value1 = cursor1.Current;
                }
                else if (comparison > 0)
                {
                    if (!cursor2.MoveNext())
                    {
                        yield break;
                    }
                    value2 = cursor2.Current;
                }
                else
                {
                    yield return value1;
                    if (!cursor1.MoveNext() || !cursor2.MoveNext())
                    {
                        yield break;
                    }
                    value1 = cursor1.Current;
                    value2 = cursor2.Current;
                }
            }
        }
    }
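If both inputs are already in arrays, the same two-cursor walk can be written with plain indexes, which avoids the enumerator and comparer calls. The following is only a sketch under assumptions not stated in the answer: the inputs are `int[]`, ascending and duplicate-free, and the class and method names (`SortedIntersection`, `IntersectSortedArrays`) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SortedIntersection
{
    // Hypothetical helper: intersects two ascending, duplicate-free int arrays.
    public static List<int> IntersectSortedArrays(int[] a, int[] b)
    {
        // The intersection can never be larger than the smaller input.
        var result = new List<int>(Math.Min(a.Length, b.Length));
        int i = 0, j = 0;

        while (i < a.Length && j < b.Length)
        {
            if (a[i] < b[j])
            {
                i++;                // a is behind: advance a
            }
            else if (a[i] > b[j])
            {
                j++;                // b is behind: advance b
            }
            else
            {
                result.Add(a[i]);   // match: record it and advance both
                i++;
                j++;
            }
        }
        return result;
    }
}
```

For example, calling `SortedIntersection.IntersectSortedArrays(new[] { 1, 3, 5, 7, 9 }, new[] { 3, 4, 5, 6, 9, 10 })` returns a list containing 3, 5 and 9. Whether this is measurably faster than the iterator version depends on the data and runtime, so it is worth benchmarking before committing to it.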
    

    EDIT: As noted in comments, in some cases you may have one input which is much larger than the other, in which case you could potentially save a lot of time by using a binary search for each element of the smaller set within the larger set. This requires random access to the larger set, however (it's just a prerequisite of binary search).

    You can even make it slightly better than a naive binary search by using the match from the previous result to give a lower bound to the next binary search. So suppose you were looking for values 1000, 2000 and 3000 in a set with every integer from 0 to 19,999. In the first iteration, you'd need to look across the whole set - your starting lower/upper indexes would be 0 and 19,999 respectively. After you'd found a match at index 1000, however, the next step (where you're looking for 2000) can start with a lower index of 1001. As you progress, the range in which you need to search gradually narrows. Whether this is worth the extra implementation cost is a different matter, however.
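The binary-search idea with a narrowing lower bound could look roughly like the sketch below. It assumes both inputs are ascending, duplicate-free `int[]` arrays; the names `SkewedIntersection` and `IntersectByBinarySearch` are hypothetical. It relies on the documented behaviour of `Array.BinarySearch`, which returns the bitwise complement of the index of the next larger element when the value is not found.

```csharp
using System;
using System.Collections.Generic;

public static class SkewedIntersection
{
    // Hypothetical helper: 'small' and 'large' are ascending and duplicate-free.
    public static List<int> IntersectByBinarySearch(int[] small, int[] large)
    {
        var result = new List<int>(small.Length);
        int lower = 0; // first index of 'large' that can still hold a match

        foreach (int value in small)
        {
            if (lower >= large.Length)
            {
                break; // nothing left to search in the larger array
            }

            // Search only the part of 'large' at or after the lower bound.
            int index = Array.BinarySearch(large, lower, large.Length - lower, value);
            if (index >= 0)
            {
                result.Add(value);
                lower = index + 1; // the next value must lie after this match
            }
            else
            {
                // Not found: ~index is the position of the first larger element.
                lower = ~index;
            }
        }
        return result;
    }
}
```

With the example from the text (looking for 1000, 2000 and 3000 in an array holding 0 to 19,999), the first search covers the whole array, the second starts at index 1001 and the third at index 2001.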
