How to find pair with kth largest sum?

深忆病人 2020-12-02 12:29

Given two sorted arrays of numbers, we want to find the pair with the kth largest possible sum. (A pair is one element from the first array and one element from the second array.)

6 answers
  •  没有蜡笔的小新
    2020-12-02 13:11

    tl;dr: If you look ahead and look behind at each iteration, you can start with the end (which is highest) and work back in O(K) time.

    Although the insight underlying this approach is, I believe, sound, the code below is not quite correct at present (see comments).


    Let's see: first of all, the arrays are sorted. So, if the arrays are a and b with lengths M and N, sorted ascending so that the largest items are in slots M and N respectively (counting from 1), the largest pair will always be a[M]+b[N].

    Now, what's the second largest pair? It's going to have perhaps one of {a[M],b[N]} (it can't have both, because that's just the largest pair again), and at least one of {a[M-1],b[N-1]}. BUT, we also know that if we choose a[M-1]+b[N-1], we can make one of the operands larger by choosing the higher number from the same list, so it will have exactly one number from the last column, and one from the penultimate column.

    Consider the following two arrays: a = [1, 2, 53]; b = [66, 67, 68]. Our highest pair is 53+68. If we lose the smaller of those two, our pair is 2+68; if we lose the larger, it's 53+67. So, we have to look ahead to decide what our next pair will be. The simplest lookahead strategy is simply to calculate the sum of both possible pairs. That will always cost two additions and two comparisons for each transition (three, because we need to deal with the case where the sums are equal); let's call that cost Q.
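
    As a minimal sketch of that lookahead step (the helper name `lookahead` is made up for illustration and is not part of the answer's code), the two candidate successors of a pair can be computed like this, using 0-based indices:

```python
def lookahead(a, b, ia, ib):
    """Return the candidate successors of pair (ia, ib) as (sum, ia, ib) tuples."""
    candidates = []
    if ib > 0:
        # Keep a[ia], step down to the next smaller element of b.
        candidates.append((a[ia] + b[ib - 1], ia, ib - 1))
    if ia > 0:
        # Keep b[ib], step down to the next smaller element of a.
        candidates.append((a[ia - 1] + b[ib], ia - 1, ib))
    return candidates


a = [1, 2, 53]
b = [66, 67, 68]
options = lookahead(a, b, len(a) - 1, len(b) - 1)
print(options)       # [(120, 2, 1), (70, 1, 2)]
print(max(options))  # (120, 2, 1), i.e. 53+67
```

    For the example arrays, the two candidates after 53+68 are 53+67=120 and 2+68=70, so the lookahead picks 53+67.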

    At first, I was tempted to repeat that K-1 times. BUT there's a hitch: the next largest pair might actually be the other pair we can validly make from {a[M],b[N]} and {a[M-1],b[N-1]}, that is, the candidate we did not take at the previous step. So, we also need to look behind.
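
    To see why looking behind matters, here is a small brute-force check on the same example arrays (an illustration only, not part of the algorithm). It ranks all pairs by sum and prints the top four:

```python
from itertools import product

a = [1, 2, 53]
b = [66, 67, 68]

# Rank every (sum, x, y) triple from largest sum to smallest.
ranked = sorted(((x + y, x, y) for x, y in product(a, b)), reverse=True)
top4 = ranked[:4]
for rank, (total, x, y) in enumerate(top4, start=1):
    print("%d: %s+%s=%s" % (rank, x, y, total))
```

    The fourth largest pair is 2+68=70, which is exactly the candidate rejected at the second step in favour of 53+67. A pure lookahead from 53+66 would only see 2+66=68, so the rejected pair has to be remembered.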

    So, let's code (python, should be 2/3 compatible):

    def kth(a,b,k):
        M = len(a)
        N = len(b)
        if k > M*N:
           raise ValueError("There are only %s possible pairs; you asked for the %sth largest, which is impossible" % (M*N, k))
        (ia,ib) = M-1,N-1 #0 based arrays
        # we need this for lookback
        nottakenindices = (0,0) # could be any value
        nottakensum = float('-inf')
        for i in range(k-1):
            optionone = a[ia]+b[ib-1]
            optiontwo = a[ia-1]+b[ib]
            biggest = max((optionone,optiontwo))
            #first deal with look behind
            if nottakensum > biggest:
               if optionone == biggest:
                   newnottakenindices = (ia,ib-1)
               else: newnottakenindices = (ia-1,ib)
               ia,ib = nottakenindices
               nottakensum = biggest
               nottakenindices = newnottakenindices
            #deal with case where indices hit 0
            elif ia <= 0 and ib <= 0:
                 ia = ib = 0
            elif ia <= 0:
                ib-=1
                ia = 0
                nottakensum = float('-inf')
            elif ib <= 0:
                ia-=1
                ib = 0
                nottakensum = float('-inf')
            #lookahead cases
            elif optionone > optiontwo: 
               #then choose the first option as our next pair
               nottakensum,nottakenindices = optiontwo,(ia-1,ib)
               ib-=1
            elif optionone < optiontwo: # choose the second
               nottakensum,nottakenindices = optionone,(ia,ib-1)
               ia-=1
            #next two cases apply if options are equal
            elif a[ia] > b[ib]:# drop the smallest
               nottakensum,nottakenindices = optiontwo,(ia-1,ib)
               ib-=1
            else: # might be equal or not - we can choose arbitrarily if equal
               nottakensum,nottakenindices = optionone,(ia,ib-1)
               ia-=1
            #+2 - one for zero-based, one for skipping the 1st largest 
            data = (i+2,a[ia],b[ib],a[ia]+b[ib],ia,ib)
            narrative = "%sth largest pair is %s+%s=%s, with indices (%s,%s)" % data
            print (narrative) #this will work in both versions of python
            if ia <= 0 and ib <= 0:
               raise ValueError("Both arrays exhausted before Kth (%sth) pair reached"%data[0])
        return data, narrative
    

    For those without python, here's an ideone: http://ideone.com/tfm2MA

    At worst, we have 5 comparisons in each iteration, and K-1 iterations, which means that this is an O(K) algorithm.

    Now, it might be possible to exploit information about differences between values to optimise this a little bit, but this accomplishes the goal.


    Here's a reference implementation (not O(K), since it sorts all M*N pairs, but it will always work, unless there's a corner case where pairs have equal sums):

    import itertools
    def refkth(a,b,k):
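        # each item is ((ia, ea), (ib, eb)); sort by ea+eb, largest sum first
        ranked = sorted(itertools.product(enumerate(a), enumerate(b)),
                        key=lambda pair: pair[0][1] + pair[1][1], reverse=True)
        (rightia,righta),(rightib,rightb) = ranked[k-1]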
        data = k,righta,rightb,righta+rightb,rightia,rightib
        narrative = "%sth largest pair is %s+%s=%s, with indices (%s,%s)" % data
        print (narrative) #this will work in both versions of python
        return data, narrative
    

    This calculates the cartesian product of the two arrays (i.e. all possible pairs), sorts them by sum, and takes the kth element. The enumerate function decorates each item with its index.
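
    Since the answer notes that the look-ahead/look-behind code is not quite correct, here is a sketch of a different, well-known technique that generalises "look behind": keep every not-yet-taken candidate in a max-heap instead of remembering only one. This is not the O(K) algorithm described above; it runs in roughly O(K log K). The function name `kth_largest_pair` is an assumption for illustration, and both arrays are assumed to be sorted ascending.

```python
import heapq


def kth_largest_pair(a, b, k):
    """Return (sum, ia, ib) for the kth largest pair sum of ascending a and b."""
    if not 1 <= k <= len(a) * len(b):
        raise ValueError("k must be between 1 and len(a) * len(b)")
    ia, ib = len(a) - 1, len(b) - 1
    # heapq is a min-heap, so sums are stored negated to pop the largest first.
    heap = [(-(a[ia] + b[ib]), ia, ib)]
    seen = {(ia, ib)}
    for _ in range(k - 1):
        _, ia, ib = heapq.heappop(heap)
        # The only possible direct successors of a pair are its two neighbours.
        for na, nb in ((ia - 1, ib), (ia, ib - 1)):
            if na >= 0 and nb >= 0 and (na, nb) not in seen:
                seen.add((na, nb))
                heapq.heappush(heap, (-(a[na] + b[nb]), na, nb))
    neg_sum, ia, ib = heap[0]
    return -neg_sum, ia, ib


print(kth_largest_pair([1, 2, 53], [66, 67, 68], 4))  # (70, 1, 2), i.e. 2+68
```

    When several pairs share the same sum, which of them is reported depends on the heap's tie-breaking, but the sum itself is still the kth largest.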
