Longest equally-spaced subsequence

遥遥无期 2020-12-22 19:12

I have a million integers in sorted order and I would like to find the longest subsequence where the difference between consecutive pairs is equal. For example


    醉话见心 2020-12-22 19:55

    Traverse the array, keeping a record of the best result(s) so far and a table with:

    (1) index - the common difference d of the sequence,
    (2) count - the number of elements in the sequence so far, and
    (3) last - the last element recorded in the sequence (stored by its array index).

    For each array element s[i], look at its difference d = s[i] - s[j] from each earlier element s[j]. If s[j] is the last element of a sequence stored in the table under d, extend that sequence to s[i] and update the best sequence if it is now longer. Otherwise start a new two-element sequence (s[j], s[i]) under d, unless that sequence could not grow longer than the current maximum.
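Stripped of all pruning, the extend-or-start step above is a plain O(n^2) dynamic program. The sketch below is my own illustration, not the posted code: the name longest_equal_step is hypothetical, and instead of the parallel hCount/hLast lists it keys a dict by d, mapping the index of each sequence's last element to its count. It assumes s is sorted and duplicate-free.

```python
from typing import Dict, List, Optional, Tuple


def longest_equal_step(s: List[int]) -> Tuple[int, int, Optional[int]]:
    """Return (length, d, last value) of a longest equally spaced subsequence.

    Assumes s is sorted and duplicate-free. table[d] maps the index of the
    last element of each open sequence with step d to that sequence's length.
    No pruning is done, so time and memory are O(n^2).
    """
    if not s:
        return (0, 0, None)
    # A single element is trivially equally spaced.
    best: Tuple[int, int, Optional[int]] = (1, 0, s[0])
    table: Dict[int, Dict[int, int]] = {}
    for i in range(1, len(s)):
        for j in range(i):
            d = s[i] - s[j]
            seqs = table.setdefault(d, {})
            # If a sequence with step d ends at j, extend it to i; otherwise
            # seqs.pop(j, 1) + 1 == 2 starts the new pair (s[j], s[i]).
            # Popping j is safe: no later element can also equal s[j] + d,
            # because s has no duplicates.
            count = seqs.pop(j, 1) + 1
            seqs[i] = count
            if count > best[0]:
                best = (count, d, s[i])
    return best
```

On the first commented-out test list in the script, [1, 4, 5, 7, 8, 12], this returns (3, 3, 7) for the run 1, 4, 7; the run 4, 8, 12 has the same length but is found later, so the strict comparison keeps the first one. The O(n^2) memory is what the pruning in the answer is meant to avoid for a million integers.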

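    Because j is scanned backwards, d grows at every step, so the scan for s[i] can stop as soon as d exceeds half the array's value range (a sequence of three or more terms with step d spans at least 2d), or, once d is larger than every difference already in the table, as soon as a new sequence could not grow longer than the current maximum. Sequences whose last element is smaller than s[j] are deleted: their next term would have to be smaller than s[i], so they can never be extended.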
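The two stopping rules and the deletion rule reduce to three small checks that mirror expressions in the posted code (maxPossible, largest, and the s[tmpLast] < s[j] test). This is a sketch; the helper names are hypothetical and not part of the original answer.

```python
from typing import List


def max_possible_length(max_s: int, s_i: int, d: int) -> int:
    # A sequence with step d > 0 whose newest term is s_i can only add the
    # terms s_i + d, s_i + 2*d, ... that do not exceed max_s. Adding the two
    # terms it already has gives an upper bound on its final length.
    return (max_s - s_i) // d + 2


def largest_useful_step(s: List[int]) -> int:
    # Three or more terms with step d span at least 2*d, so in a sorted list
    # any step larger than half the value range gives two-element sequences only.
    return (s[-1] - s[0]) // 2


def is_dead(last_value: int, s_j: int) -> bool:
    # While pairing s[j] with s[i] = s[j] + d, a stored sequence with step d
    # ending at last_value < s[j] needs last_value + d < s[i] as its next term.
    # All remaining elements are >= s[i], so it can never be extended.
    return last_value < s_j


s = [1, 4, 5, 7, 8, 12]
print(largest_useful_step(s))            # 5
print(max_possible_length(s[-1], 4, 3))  # 4: bound for the pair (1, 4)
```

The bound is only an upper limit: for the pair (1, 4) it allows 4 terms, but 10 is missing from the list, so the real run 1, 4, 7 stops at 3.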

    I converted my code from JavaScript to Python (my first Python code):

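    import random
    import sys
    import timeit
    
    # Small test inputs; to use one, also comment out the two random-data lines below.
    #s = [1,4,5,7,8,12]
    #s = [2, 6, 7, 10, 13, 14, 17, 18, 21, 22, 23, 25, 28, 32, 39, 40, 41, 44, 45, 46, 49, 50, 51, 52, 53, 63, 66, 67, 68, 69, 71, 72, 74, 75, 76, 79, 80, 82, 86, 95, 97, 101, 110, 111, 112, 114, 115, 120, 124, 125, 129, 131, 132, 136, 137, 138, 139, 140, 144, 145, 147, 151, 153, 157, 159, 161, 163, 165, 169, 172, 173, 175, 178, 179, 182, 185, 186, 188, 195]
    #s = [0, 6, 7, 10, 11, 12, 16, 18, 19]
    
    # Random test data: 20000 draws from 1..40000, deduplicated and sorted.
    m = [random.randint(1, 40000) for _ in range(20000)]
    s = sorted(set(m))
    
    lenS = len(s)
    halfRange = (s[lenS-1] - s[0]) // 2
    
    # Drop end points that can only belong to two-element sequences: if the gap
    # to the neighbouring element exceeds half the value range, a sequence of
    # three or more terms through that end point would span more than the range.
    while lenS > 2 and s[lenS-1] - s[lenS-2] > halfRange:
        s.pop()
        lenS -= 1
        halfRange = (s[lenS-1] - s[0]) // 2
    
    while lenS > 2 and s[1] - s[0] > halfRange:
        s.pop(0)
        lenS -= 1
        halfRange = (s[lenS-1] - s[0]) // 2
    
    n = lenS
    
    # Largest step worth searching: three or more terms with step d span at
    # least 2*d, so d cannot exceed half the value range.
    largest = (s[n-1] - s[0]) // 2
    #largest = 1000  # optionally cap the largest d searched
    
    maxS = s[n-1]  # largest value; bounds how far a sequence can still grow
    maxD = 0       # largest d that has an entry in the table so far
    maxSeq = 0     # length of the best sequence found so far
    # Table indexed by d: parallel lists holding the count and the index of the
    # last element of each open sequence with that step (None until first use).
    hCount = [None] * (largest + 1)
    hLast = [None] * (largest + 1)
    best = {}
    
    start = timeit.default_timer()
    
    for i in range(1, n):
    
        sys.stdout.write(repr(i) + "\r")  # progress indicator
    
        # Scan j backwards, so d = s[i] - s[j] increases as j decreases.
        for j in range(i-1, -1, -1):
            d = s[i] - s[j]
            numLeft = n - i
            # Upper bound on the length of a sequence containing (s[j], s[i]).
            if d != 0:
                maxPossible = (maxS - s[i]) // d + 2
            else:
                maxPossible = numLeft + 2
            # Starting a new sequence with step d only helps if it could beat maxSeq.
            ok = numLeft + 2 > maxSeq and maxPossible > maxSeq
    
            # Larger d only shrinks maxPossible, and no sequences with d > maxD
            # are stored, so nothing later in this scan can matter.
            if d > largest or (d > maxD and not ok):
                break
    
            if hLast[d] is not None:
                found = False
                # Walk backwards so deleting entry k does not shift unvisited entries.
                for k in range(len(hLast[d])-1, -1, -1):
                    tmpLast = hLast[d][k]
                    if tmpLast == j:
                        # s[j] ends a sequence with step d: extend it to s[i].
                        found = True
                        hLast[d][k] = i
                        hCount[d][k] += 1
                        tmpCount = hCount[d][k]
                        if tmpCount > maxSeq:
                            maxSeq = tmpCount
                            best = {'len': tmpCount, 'd': d, 'last': i}
                    elif s[tmpLast] < s[j]:
                        # Its next term s[tmpLast] + d is below s[i], so it can
                        # never be extended: delete it.
                        del hLast[d][k]
                        del hCount[d][k]
                if not found and ok:
                    # Start a new two-element sequence (s[j], s[i]).
                    hLast[d].append(i)
                    hCount[d].append(2)
            elif ok:
                # First sequence with step d.
                if d > maxD:
                    maxD = d
                hLast[d] = [i]
                hCount[d] = [2]
    
    end = timeit.default_timer()
    seconds = end - start
    
    #print(hCount)
    #print(hLast)
    print(best)  # 'last' is an index into the trimmed list s, not a value
    print(seconds)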
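Because the pruning makes the script hard to check by eye, it may help to compare best['len'] against a brute-force count on small inputs such as the commented-out test lists. A minimal sketch; brute_force_longest is a hypothetical name, and it assumes s is sorted and duplicate-free like the list the script builds.

```python
from itertools import combinations
from typing import List


def brute_force_longest(s: List[int]) -> int:
    # Reference answer for small inputs: try every pair (a, b) as the first
    # two terms and walk forward with step b - a using set lookups.
    if len(s) < 2:
        return len(s)
    values = set(s)
    best = 2
    # On a sorted, duplicate-free list, combinations yields pairs with a < b.
    for a, b in combinations(s, 2):
        d = b - a
        length = 2
        nxt = b + d
        while nxt in values:
            length += 1
            nxt += d
        best = max(best, length)
    return best


print(brute_force_longest([0, 6, 7, 10, 11, 12, 16, 18, 19]))  # 4: 0, 6, 12, 18
```

Only the lengths are comparable: the script's best['last'] is an index into the trimmed list, and trimming the end points never removes a term of a sequence with three or more elements.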
    
