I have a million integers in sorted order and I would like to find the longest subsequence where the difference between consecutive pairs is equal. For example
This is my 2 cents.
If you have a list called input:
```python
input = [1, 4, 5, 7, 8, 12]
```
For each of these points except the first, you can build a data structure (a column) that tells you how far that point is from each of its predecessors:
```
 1   4   5   7   8  12    # input
 x   3   4   6   7  11    # distance from point i to point 0
 x   x   1   3   4   8    # distance from point i to point 1
 x   x   x   2   3   7    # distance from point i to point 2
 x   x   x   x   1   5    # distance from point i to point 3
 x   x   x   x   x   4    # distance from point i to point 4
```
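To make the table concrete, here is a small sketch that builds the same columns and prints them in the layout above. It indexes the columns by position rather than by value, so repeated values would not collide; the helper name `distance_columns` is hypothetical and not part of the original code:

```python
def distance_columns(points):
    """Hypothetical helper: for every point except the first, list its
    distance to each predecessor, so columns[i][k] == points[i] - points[k]."""
    return {i: [points[i] - points[k] for k in range(i)] for i in range(1, len(points))}


points = [1, 4, 5, 7, 8, 12]
columns = distance_columns(points)

# Print the columns as the rows of the table: row k holds the distance to point k,
# and "x" marks points that have no predecessor at index k.
for k in range(len(points) - 1):
    cells = ["x" if i <= k else str(columns[i][k]) for i in range(len(points))]
    print(" ".join(cells), "# distance from point i to point", k)
```

It prints the same five rows as the table, one per predecessor.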
Now that you have the columns, you can consider the i-th item of input (which is input[i]) and each number n in its column.
The numbers that belong to an equally spaced series with step n that includes input[i] are: the k-th predecessor of input[i] (input[k], where k is the index of n in the column of input[i], so n = input[i] - input[k]), plus every later number whose column holds n * j at position i, where j is the number of matches found so far while moving through the columns from left to right (starting at j = 1).
Example: if we consider i = 1, input[i] = 4 and n = 3, we can identify a sequence containing 4 (input[i]), 7 (because its column has a 3 in position 1) and 1 (because k is 0, so we take the predecessor at index 0).
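The rule can be checked against this example with a short sketch. `series_through` is a hypothetical helper that takes the index i of input[i] and the index k of the chosen predecessor, then applies the n * j test to every later number:

```python
def series_through(points, i, k):
    """Hypothetical helper applying the rule: start from the k-th predecessor
    of points[i] (so the step is n = points[i] - points[k]) and collect every
    later point that lies n * j past points[i], where j counts matches so far."""
    n = points[i] - points[k]
    series = [points[k], points[i]]
    j = 1
    for later in points[i + 1:]:
        if later - points[i] == n * j:
            series.append(later)
            j += 1
    return series


print(series_through([1, 4, 5, 7, 8, 12], i=1, k=0))  # [1, 4, 7]
```

Calling it for every pair (k, i) and keeping the results with more than two numbers is what the implementation below does.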
A possible implementation (sorry if the code doesn't use the same notation as the explanation):
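```python
def build_columns(l):
    # columns[i][k] is the distance from point i to point k (k < i).
    # Indexing by position instead of by value avoids the O(n) l.index()
    # lookup and keeps duplicate values from overwriting each other.
    return [[x - y for y in l[:i]] for i, x in enumerate(l)]


def algo(input, columns):
    seqs = []
    for index1 in range(1, len(input)):  # the first item has no predecessors
        number = input[index1]
        for index2, distance in enumerate(columns[index1]):
            seq = [input[index2], number]  # k-th predecessor, then input[i]
            matches = 1
            for index3 in range(index1 + 1, len(input)):
                # columns[index3][index1] == input[index3] - input[index1]
                gap = columns[index3][index1]
                if gap == distance * matches:
                    matches += 1
                    seq.append(input[index3])
                elif gap > distance * matches:
                    break  # input is sorted, so later gaps can only grow
            if len(seq) > 2:
                seqs.append(seq)
    return seqs
```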
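The longest one:

```python
input = [1, 4, 5, 7, 8, 12]
sequences = algo(input, build_columns(input))
print(max(sequences, key=len, default=[]))  # [1, 4, 7]
```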
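For comparison, here is a different, well-known technique (not the approach described above): the standard dynamic-programming solution for the longest arithmetic subsequence, which runs in O(n^2) time instead of the O(n^3) of the triple loop. It still stores up to O(n^2) (index, step) entries, so a million integers would remain out of reach without further ideas. This is a minimal sketch; the function name `longest_arithmetic_subsequence` is hypothetical:

```python
from typing import Dict, List


def longest_arithmetic_subsequence(nums: List[int]) -> List[int]:
    """Hypothetical helper: best[j][d] is the length of the longest equally
    spaced subsequence that ends at nums[j] with step d."""
    n = len(nums)
    if n < 2:
        return list(nums)
    best: List[Dict[int, int]] = [{} for _ in range(n)]
    best_len, best_end, best_step = 1, 0, 0
    for j in range(n):
        for i in range(j):
            d = nums[j] - nums[i]
            # extend the best series ending at nums[i] with step d (a bare pair counts as 1 + 1)
            length = best[i].get(d, 1) + 1
            if length > best[j].get(d, 0):
                best[j][d] = length
            if length > best_len:
                best_len, best_end, best_step = length, j, d
    # The series is arithmetic, so it can be rebuilt from its last element and step.
    last = nums[best_end]
    return [last - best_step * (best_len - 1 - t) for t in range(best_len)]


print(longest_arithmetic_subsequence([1, 4, 5, 7, 8, 12]))  # [1, 4, 7]
```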