I have a million integers in sorted order and I would like to find the longest subsequence where the difference between consecutive pairs is equal. For example
Traverse the array, keeping a record of the optimal result/s and a table with
(1) index - the element difference in the sequence,
(2) count - number of elements in the sequence so far, and
(3) the last recorded element.
For each array element look at the difference from each previous array element; if that element is last in a sequence indexed in the table, adjust that sequence in the table, and update the best sequence if applicable, otherwise start a new sequence, unless the current max is greater than the length of the possible sequence.
Scanning backwards we can stop our scan when d is greater than the middle of the array's range; or when the current max is greater than the length of the possible sequence, for d greater than the largest indexed difference. Sequences where s[j] is greater than the last element in the sequence are deleted.
I converted my code from JavaScript to Python (my first python code):
import random
import timeit
import sys
#s = [1,4,5,7,8,12]
#s = [2, 6, 7, 10, 13, 14, 17, 18, 21, 22, 23, 25, 28, 32, 39, 40, 41, 44, 45, 46, 49, 50, 51, 52, 53, 63, 66, 67, 68, 69, 71, 72, 74, 75, 76, 79, 80, 82, 86, 95, 97, 101, 110, 111, 112, 114, 115, 120, 124, 125, 129, 131, 132, 136, 137, 138, 139, 140, 144, 145, 147, 151, 153, 157, 159, 161, 163, 165, 169, 172, 173, 175, 178, 179, 182, 185, 186, 188, 195]
#s = [0, 6, 7, 10, 11, 12, 16, 18, 19]
m = [random.randint(1,40000) for r in xrange(20000)]
s = list(set(m))
s.sort()
lenS = len(s)
halfRange = (s[lenS-1] - s[0]) // 2
while s[lenS-1] - s[lenS-2] > halfRange:
s.pop()
lenS -= 1
halfRange = (s[lenS-1] - s[0]) // 2
while s[1] - s[0] > halfRange:
s.pop(0)
lenS -=1
halfRange = (s[lenS-1] - s[0]) // 2
n = lenS
largest = (s[n-1] - s[0]) // 2
#largest = 1000 #set the maximum size of d searched
maxS = s[n-1]
maxD = 0
maxSeq = 0
hCount = [None]*(largest + 1)
hLast = [None]*(largest + 1)
best = {}
start = timeit.default_timer()
for i in range(1,n):
sys.stdout.write(repr(i)+"\r")
for j in range(i-1,-1,-1):
d = s[i] - s[j]
numLeft = n - i
if d != 0:
maxPossible = (maxS - s[i]) // d + 2
else:
maxPossible = numLeft + 2
ok = numLeft + 2 > maxSeq and maxPossible > maxSeq
if d > largest or (d > maxD and not ok):
break
if hLast[d] != None:
found = False
for k in range (len(hLast[d])-1,-1,-1):
tmpLast = hLast[d][k]
if tmpLast == j:
found = True
hLast[d][k] = i
hCount[d][k] += 1
tmpCount = hCount[d][k]
if tmpCount > maxSeq:
maxSeq = tmpCount
best = {'len': tmpCount, 'd': d, 'last': i}
elif s[tmpLast] < s[j]:
del hLast[d][k]
del hCount[d][k]
if not found and ok:
hLast[d].append(i)
hCount[d].append(2)
elif ok:
if d > maxD:
maxD = d
hLast[d] = [i]
hCount[d] = [2]
end = timeit.default_timer()
seconds = (end - start)
#print (hCount)
#print (hLast)
print(best)
print(seconds)