In python, how does one efficiently find the largest consecutive set of numbers in a list that are not necessarily adjacent?

问题

For instance, if I have a list

[1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]

This algorithm should return [1,2,3,4,5,6,7,8,9,10,11].

To clarify, the longest list should run forwards. I was wondering what is an algorithmically efficient way to do this (preferably not O(n^2))?

Also, I'm open to a solution not in python since the algorithm is what matters.

Thank you.

回答1:

Here is a simple one-pass O(n) solution:

s = [1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11,42]
maxrun = -1
rl = {}
for x in s:
    run = rl[x] = rl.get(x-1, 0) + 1
    print x-run+1, 'to', x
    if run > maxrun:
        maxend, maxrun = x, run
print range(maxend-maxrun+1, maxend+1)

The logic may be a little more self-evident if you think in terms of ranges instead of individual variables for the endpoint and run length:

rl = {}
best_range = xrange(0)
for x in s:
    run = rl[x] = rl.get(x-1, 0) + 1
    r = xrange(x-run+1, x+1)
    if len(r) > len(best_range):
        best_range = r
print list(best_range)

回答2:

Not that clever, not O(n), could use a bit of optimization. But it works.

def longest(seq):
  result = []
  for v in seq:
    for l in result:
      if v == l[-1] + 1:
        l.append(v)
    else:
      result.append([v])
  return max(result, key=len)

回答3:

You can use The Patience Sort implementation of the Largest Ascending Sub-sequence Algorithm

def LargAscSub(seq):
    deck = []
    for x in seq:
        newDeck = [x]
        i = bisect.bisect_left(deck, newDeck)
        deck[i].insert(0, x) if i != len(deck) else deck.append(newDeck)
    return [p[0] for p in deck]

And here is the Test results

>>> LargAscSub([1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11])
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>> LargAscSub([1, 2, 3, 11, 12, 13, 14])
[1, 2, 3, 11, 12, 13, 14]
>>> LargAscSub([11,12,13,14])
[11, 12, 13, 14]

The Order of Complexity is O(nlogn)

There was one note in the wiki link where they claimed that you can achieve O(n.loglogn) by relying on Van Emde Boas tree

回答4:

How about using a modified Radix Sort? As JanneKarila pointed out the solution is not O(n). It uses Radix sort, which wikipedia says Radix sort's efficiency is O(k·n) for n keys which have k or fewer digits.

This will only work if you know the range of numbers that we're dealing with so that will be the first step.

Look at each element in starting list to find lowest, l and highest, h number. In this case l is 1 and h is 11. Note, if you already know the range for some reason, you can skip this step.
Create a result list the size of our range and set each element to null.
Look at each element in list and add them to the result list at the appropriate place if needed. ie, the element is a 4, add a 4 to the result list at position 4. result[element] = starting_list[element]. You can throw out duplicates if you want, they'll just be overwritten.
Go through the result list to find the longest sequence without any null values. Keep a element_counter to know what element in the result list we're looking at. Keep a curr_start_element set to the beginning element of the current sequence and keep a curr_len of how long the current sequence is. Also keep a longest_start_element and a `longest_len' which will start out as zero and be updated as we move through the list.
Return the result list starting at longest_start_element and taking longest_len

EDIT: Code added. Tested and working

#note this doesn't work with negative numbers
#it's certainly possible to write this to work with negatives
# but the code is a bit hairier
import sys
def findLongestSequence(lst):
    #step 1
    high = -sys.maxint - 1

    for num in lst:
        if num > high:
            high = num

    #step 2
    result = [None]*(high+1)

    #step 3
    for num in lst:
        result[num] = num

    #step 4
    curr_start_element = 0
    curr_len = 0
    longest_start_element = -1
    longest_len = -1

    for element_counter in range(len(result)):
        if result[element_counter] == None:

            if curr_len > longest_len:
                longest_start_element = curr_start_element
                longest_len = curr_len

            curr_len = 0
            curr_start_element = -1

        elif curr_start_element == -1:
            curr_start_element = element_counter

        curr_len += 1

    #just in case the last element makes the longest
    if curr_len > longest_len:
        longest_start_element = curr_start_element
        longest_len = curr_len


    #step 5
    return result[longest_start_element:longest_start_element + longest_len-1]

回答5:

If the result really does have to be a sub-sequence of consecutive ascending integers, rather than merely ascending integers, then there's no need to remember each entire consecutive sub-sequence until you determine which is the longest, you need only remember the starting and ending values of each sub-sequence. So you could do something like this:

def longestConsecutiveSequence(sequence):
    # map starting values to largest ending value so far
    map = collections.OrderedDict()

    for i in sequence:
        found = False
        for k, v in map.iteritems():
            if i == v:
                map[k] += 1
                found = True

        if not found and i not in map:
            map[i] = i + 1

    return xrange(*max(map.iteritems(), key=lambda i: i[1] - i[0]))

If I run this on the original sample date (i.e. [1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]) I get:

>>> print list(longestConsecutiveSequence([1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

If I run it on one of Abhijit's samples [1,2,3,11,12,13,14], I get:

>>> print list(longestConsecutiveSequence([1,2,3,11,12,13,14]))
[11, 12, 13, 14]

Regrettably, this algorithm is O(n*n) in the worst case.

回答6:

Warning: This is the cheaty way to do it (aka I use python...)

import operator as op
import itertools as it

def longestSequence(data):

    longest = []

    for k, g in it.groupby(enumerate(set(data)), lambda(i, y):i-y):
        thisGroup = map(op.itemgetter(1), g)

        if len(thisGroup) > len(longest):
            longest = thisGroup

    return longest


longestSequence([1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11, 15,15,16,17,25])

回答7:

You need the Maximum contiguous sum(Optimal Substructure):

def msum2(a):
    bounds, s, t, j = (0,0), -float('infinity'), 0, 0

    for i in range(len(a)):
        t = t + a[i]
        if t > s: bounds, s = (j, i+1), t
        if t < 0: t, j = 0, i+1
    return (s, bounds)

This is an example of dynamic programming and is O(N)

回答8:

O(n) solution works even if the sequence does not start from the first element.

Warning does not work if len(A) = 0.

A = [1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]
def pre_process(A): 
    Last = {}
    Arrow = []
    Length = []
    ArgMax = 0
    Max = 0
    for i in xrange(len(A)): 
        Arrow.append(i)
        Length.append(0)  
        if A[i] - 1 in Last: 
            Aux = Last[A[i] - 1]
            Arrow[i] = Aux
            Length[i] = Length[Aux] + 1
        Last[A[i]] = i 
        if Length[i] > Max:
            ArgMax = i 
            Max = Length[i]
    return (Arrow,ArgMax)  

(Arr,Start) = pre_process(A) 
Old = Arr[Start] 
ToRev = []
while 1: 
    ToRev.append(A[Start]) 
    if Old == Start: 
        break
    Start = Old 
    New = Arr[Start]
    Old = New
ToRev.reverse()
print ToRev

Pythonizations are welcome!!

回答9:

Ok, here's yet another attempt in python:

def popper(l):
    listHolders = []
    pos = 0
    while l:
        appended = False
        item = l.pop()
        for holder in listHolders:
            if item == holder[-1][0]-1:
                appended = True
                holder.append((item, pos))
        if not appended:
            pos += 1
            listHolders.append([(item, pos)])
    longest = []
    for holder in listHolders:
        try:
            if (holder[0][0] < longest[-1][0]) and (holder[0][1] > longest[-1][1]):
                longest.extend(holder)
        except:
            pass
        if len(holder) > len(longest):
            longest = holder
    longest.reverse()
    return [x[0] for x in longest]

Sample inputs and outputs:

>>> demo = list(range(50))
>>> shuffle(demo)
>>> demo
[40, 19, 24, 5, 48, 36, 23, 43, 14, 35, 18, 21, 11, 7, 34, 16, 38, 25, 46, 27, 26, 29, 41, 8, 31, 1, 33, 2, 13, 6, 44, 22, 17,
 12, 39, 9, 49, 3, 42, 37, 30, 10, 47, 20, 4, 0, 28, 32, 45, 15]
>>> popper(demo)
[1, 2, 3, 4]
>>> demo = [1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]
>>> popper(demo)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>>

回答10:

This should do the trick (and is O(n)):

target = 1
result = []
for x in list:
    for y in result:
        if y[0] == target:
            y[0] += 1
            result.append(x)

For any starting number, this works:

result = []
for x in mylist:
    matched = False
    for y in result:
        if y[0] == x:
            matched = True
            y[0] += 1
            y.append(x)
    if not matched:
        result.append([x+1, x])
return max(result, key=len)[1:]

来源：https://stackoverflow.com/questions/8664708/in-python-how-does-one-efficiently-find-the-largest-consecutive-set-of-numbers

标签

python

arrays

algorithm

numpy

dynamic-programming