Fast intersection of two sorted integer arrays

说谎 2021-02-04 11:02

I need to find the intersection of two sorted integer arrays and do it very fast.

Right now, I am using the following code:

int i = 0, j = 0;

while (i …

5 Answers
  •  萌比男神i
    2021-02-04 11:51

    UPDATE

    The fastest I got was about 200 ms for two arrays of 10 million elements each, using the unsafe version (the last piece of code).

    The test I ran:

    var arr1 = new int[10000000];
    var arr2 = new int[10000000];
    
    for (var i = 0; i < 10000000; i++)
    {
        arr1[i] = i;
        arr2[i] = i * 2;
    }
    
    var sw = Stopwatch.StartNew();
    
    var result = arr1.IntersectSorted(arr2);
    
    sw.Stop();
    
    Console.WriteLine(sw.Elapsed); // 00:00:00.1926156
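
    To make clear what this benchmark actually intersects, here is a scaled-down sketch of the same data shape. The class name `BenchmarkShape` and the helper `Build` are made up for illustration, and LINQ's `Intersect` serves only as a slow reference. With this data, every even value below the array length is a match, so half of `arr1` ends up in the result.

```csharp
using System;
using System.Linq;

public static class BenchmarkShape
{
    // Hypothetical helper: builds the benchmark's data shape, scaled down to n elements.
    public static (int[] First, int[] Second) Build(int n)
    {
        var arr1 = new int[n];
        var arr2 = new int[n];
        for (var i = 0; i < n; i++)
        {
            arr1[i] = i;      // 0, 1, 2, ...
            arr2[i] = i * 2;  // 0, 2, 4, ...
        }
        return (arr1, arr2);
    }

    public static void Main()
    {
        var (arr1, arr2) = Build(10);

        // LINQ's Intersect is used here only as a reference implementation.
        var expected = arr1.Intersect(arr2).ToArray();

        Console.WriteLine(string.Join(", ", expected)); // 0, 2, 4, 6, 8
    }
}
```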
    

    Full Post:

    I've tested various ways to do it and found this to be very good:

    public static List<int> IntersectSorted(this int[] source, int[] target)
    {
        // Set initial capacity to a "full-intersection" size
        // This prevents multiple re-allocations
        var ints = new List<int>(Math.Min(source.Length, target.Length));
    
        var i = 0;
        var j = 0;
    
        while (i < source.Length && j < target.Length)
        {
            // Compare only once and let compiler optimize the switch-case
            switch (source[i].CompareTo(target[j]))
            {
                case -1:
                    i++;
    
                    // Saves us a JMP instruction
                    continue;
                case 1:
                    j++;
    
                    // Saves us a JMP instruction
                    continue;
                default:
                    ints.Add(source[i++]);
                    j++;
    
                    // Saves us a JMP instruction
                    continue;
            }
        }
    
        // Free unused memory (sets capacity to actual count)
        ints.TrimExcess();
    
        return ints;
    }
    

    For a further improvement you can remove the ints.TrimExcess(); call, which also makes a noticeable difference, but consider whether you need that unused memory back.
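
    A small sketch of the trade-off: the list is pre-sized for the worst case, so a sparse intersection leaves most of the capacity unused until `TrimExcess` reallocates it. The class name `TrimExcessDemo` and the helper `Measure` are illustrative only.

```csharp
using System;
using System.Collections.Generic;

public static class TrimExcessDemo
{
    // Hypothetical helper: returns the list's capacity before and after TrimExcess.
    public static (int Before, int After) Measure(int capacity, int count)
    {
        var ints = new List<int>(capacity);
        for (var i = 0; i < count; i++)
        {
            ints.Add(i);
        }

        var before = ints.Capacity;

        // Reallocates and copies when a large share of the capacity is unused.
        ints.TrimExcess();

        return (before, ints.Capacity);
    }

    public static void Main()
    {
        var (before, after) = Measure(1000, 10);
        Console.WriteLine($"{before} -> {after}"); // 1000 -> 10
    }
}
```

    Skipping the call saves the reallocation and copy, at the cost of holding on to the oversized buffer for as long as the list lives.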

    Also, if callers may break out early from the loop that consumes the intersection, and the results do not have to be an array or list, change the implementation to an iterator:

    public static IEnumerable<int> IntersectSorted(this int[] source, int[] target)
    {
        var i = 0;
        var j = 0;
    
        while (i < source.Length && j < target.Length)
        {
            // Compare only once and let compiler optimize the switch-case
            switch (source[i].CompareTo(target[j]))
            {
                case -1:
                    i++;
    
                    // Saves us a JMP instruction
                    continue;
                case 1:
                    j++;
    
                    // Saves us a JMP instruction
                    continue;
                default:
                    yield return source[i++];
                    j++;
    
                    // Saves us a JMP instruction
                    continue;
            }
        }
    }
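
    The benefit of the iterator shows up when the caller stops early. The sketch below mirrors the iterator above under a made-up name, `IntersectSortedLazy`, and adds a `Steps` counter purely for demonstration. Taking only the first three matches walks just a handful of elements instead of both arrays.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyIntersectDemo
{
    // Counts loop iterations; exists only to make the laziness visible.
    public static int Steps;

    // Hypothetical name; same two-pointer walk as the iterator version.
    public static IEnumerable<int> IntersectSortedLazy(int[] source, int[] target)
    {
        int i = 0, j = 0;

        while (i < source.Length && j < target.Length)
        {
            Steps++;

            switch (source[i].CompareTo(target[j]))
            {
                case -1:
                    i++;
                    continue;
                case 1:
                    j++;
                    continue;
                default:
                    yield return source[i++];
                    j++;
                    continue;
            }
        }
    }

    public static void Main()
    {
        var a = Enumerable.Range(0, 1000).ToArray();                     // 0, 1, ..., 999
        var b = Enumerable.Range(0, 1000).Select(x => x * 2).ToArray();  // 0, 2, ..., 1998

        // Take(3) stops pulling from the iterator after the third match.
        var firstThree = IntersectSortedLazy(a, b).Take(3).ToList();

        Console.WriteLine(string.Join(", ", firstThree)); // 0, 2, 4
        Console.WriteLine(Steps < 1000);                  // True
    }
}
```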
    

    Another improvement is to use unsafe code:

    public static unsafe List<int> IntersectSorted(this int[] source, int[] target)
    {
        var ints = new List<int>(Math.Min(source.Length, target.Length));
    
        fixed (int* ptSrc = source)
        {
            var maxSrcAdr = ptSrc + source.Length;
    
            fixed (int* ptTar = target)
            {
                var maxTarAdr = ptTar + target.Length;
    
                var currSrc = ptSrc;
                var currTar = ptTar;
    
                while (currSrc < maxSrcAdr && currTar < maxTarAdr)
                {
                    switch ((*currSrc).CompareTo(*currTar))
                    {
                        case -1:
                            currSrc++;
                            continue;
                        case 1:
                            currTar++;
                            continue;
                        default:
                            ints.Add(*currSrc);
                            currSrc++;
                            currTar++;
                            continue;
                    }
                }
            }
        }
    
        ints.TrimExcess();
        return ints;
    }
    

    In summary, the biggest performance hit was in the if-else chain. Turning it into a switch-case made a huge difference (about 2 times faster).
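
    For comparison, the if-else form that this summary refers to might look like the sketch below. It is not the asker's exact code (the question's snippet is cut off), and `IntersectSortedIfElse` is a made-up name. The switch-versus-if-else gap depends on the runtime and JIT, so it is worth timing both versions on your own setup before relying on the 2x figure.

```csharp
using System;
using System.Collections.Generic;

public static class IfElseBaseline
{
    // Hypothetical name; plain two-pointer intersection using if-else comparisons.
    public static List<int> IntersectSortedIfElse(int[] source, int[] target)
    {
        var ints = new List<int>(Math.Min(source.Length, target.Length));
        int i = 0, j = 0;

        while (i < source.Length && j < target.Length)
        {
            if (source[i] < target[j])
            {
                i++;                    // source is behind, advance it
            }
            else if (source[i] > target[j])
            {
                j++;                    // target is behind, advance it
            }
            else
            {
                ints.Add(source[i]);    // equal values: part of the intersection
                i++;
                j++;
            }
        }

        return ints;
    }

    public static void Main()
    {
        var result = IntersectSortedIfElse(new[] { 1, 3, 4, 7, 9 }, new[] { 3, 4, 5, 9, 10 });
        Console.WriteLine(string.Join(", ", result)); // 3, 4, 9
    }
}
```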
