Longest equally-spaced subsequence

遥遥无期 2020-12-22 19:12

I have a million integers in sorted order and I would like to find the longest subsequence where the difference between consecutive pairs is equal. For example



        
10 Answers
  •  死守一世寂寞
    2020-12-22 19:39

    This is my 2 cents.

    If you have a list called input:

    input = [1, 4, 5, 7, 8, 12]
    

    You can build a data structure that, for each of these points (excluding the first one), tells you how far that point is from each of its predecessors:

    [1, 4, 5, 7, 8, 12]
     x  3  4  6  7  11   # distance from point i to point 0
     x  x  1  3  4   8   # distance from point i to point 1
     x  x  x  2  3   7   # distance from point i to point 2
     x  x  x  x  1   5   # distance from point i to point 3
     x  x  x  x  x   4   # distance from point i to point 4
    

    Now that you have the columns, consider the i-th item of input (that is, input[i]) and each number n in its column.

    The numbers that form an equally spaced series with input[i] are input[k], where k is the index of n in the column of input[i], together with every later number whose column holds n * j at position i. Here j is the number of matches found so far (starting at 1) as you move through the columns from left to right.

    Example: take i = 1, so input[i] = 4, and n = 3. This identifies the sequence 1, 4, 7: 4 is input[i]; 7 is included because its column has a 3 at position 1; and 1 is included because k is 0, so we take input[0].
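
    The example above can be checked by hand with a short sketch. It uses the sample list from this answer; the name points and the inline dictionary construction are assumptions made only for this check.

```python
# Worked check of the example, using the sample list from this answer.
points = [1, 4, 5, 7, 8, 12]

# columns[x][k] is the distance from x back to points[k]
# (hypothetical inline construction, used only for this check).
columns = {x: [x - y for y in points[:i]] for i, x in enumerate(points) if i > 0}

i = 1                      # points[i] == 4
k = 0                      # index of n inside the column of points[i]
n = columns[points[i]][k]  # 4 - 1 == 3

seq = [points[k], points[i]]
j = 1                      # matches found so far
for successor in points[i + 1:]:
    # A successor belongs to the series if it sits n * j after points[i].
    if columns[successor][i] == n * j:
        seq.append(successor)
        j += 1

print(columns[7])  # [6, 3, 2]
print(seq)         # [1, 4, 7]
```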

    Possible implementation (sorry if the code is not using the same notation as the explanation):

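    def build_columns(l):
        # Map each number (except the first) to its distances from all predecessors.
        columns = {}
        for i, x in enumerate(l[1:], start=1):
            # columns[x][k] holds x - l[k] for each predecessor l[k]
            columns[x] = [x - y for y in l[:i]]
        return columns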
    
    def algo(input, columns):
        seqs = []
        for index1, number in enumerate(input[1:], start=1):
            # Try every predecessor of `number` as the start of a sequence.
            for index2, distance in enumerate(columns[number]):
                seq = [input[index2], number]  # input[k], then input[i]
                matches = 1
                for successor in input[index1 + 1:]:
                    # columns[successor][index1] == successor - number
                    if columns[successor][index1] == distance * matches:
                        matches += 1
                        seq.append(successor)
                # Keep only sequences of at least three numbers.
                if len(seq) > 2:
                    seqs.append(seq)
        return seqs
    

    The longest one:

    sequences = algo(input, build_columns(input))
    print(max(sequences, key=len))
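
    One caveat: storing every column takes about n²/2 differences, which for the million integers in the question is roughly 5·10¹¹ entries. Below is a minimal sketch of a variant that avoids the table by keeping the values in a set and walking forward from each pair by the same step. It is a swapped-in technique rather than the code above, the function name longest_equally_spaced is hypothetical, and it assumes duplicates can be ignored. It still visits every pair, so it saves memory, not time.

```python
def longest_equally_spaced(values):
    """Return one longest equally spaced subsequence of a list of integers."""
    # Assumption: duplicates are ignored, so every step is strictly positive.
    values = sorted(set(values))
    present = set(values)
    best = values[:2]  # any two values are trivially equally spaced
    for i, start in enumerate(values):
        for j in range(i + 1, len(values)):
            step = values[j] - start
            # Skip pairs that continue a run already counted from an earlier start.
            if start - step in present:
                continue
            # Walk forward by `step` while the next value is present.
            length = 2
            current = values[j] + step
            while current in present:
                length += 1
                current += step
            if length > len(best):
                best = [start + step * m for m in range(length)]
    return best


print(longest_equally_spaced([1, 4, 5, 7, 8, 12]))  # [1, 4, 7]
```

    On the sample list this returns the same sequence as the approach above, since ties are resolved in favour of the first sequence found.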
    
