Identify groups of continuous numbers in a list

问题

I\'d like to identify groups of continuous numbers in a list, so that:

myfunc([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20])

Returns:

[(2,5), (12,17), 20]

And was wondering what the best way to do this was (particularly if there\'s something inbuilt into Python).

Edit: Note I originally forgot to mention that individual numbers should be returned as individual numbers, not ranges.

回答1:

more_itertools.consecutive_groups was added in version 4.0.

Demo

import more_itertools as mit


iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
[list(group) for group in mit.consecutive_groups(iterable)]
# [[2, 3, 4, 5], [12, 13, 14, 15, 16, 17], [20]]

Code

Applying this tool, we make a generator function that finds ranges of consecutive numbers.

def find_ranges(iterable):
    """Yield range of consecutive numbers."""
    for group in mit.consecutive_groups(iterable):
        group = list(group)
        if len(group) == 1:
            yield group[0]
        else:
            yield group[0], group[-1]


iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
list(find_ranges(iterable))
# [(2, 5), (12, 17), 20]

The source implementation emulates a classic recipe (as demonstrated by @Nadia Alramli).

Note: more_itertools is a third-party package installable via pip install more_itertools.

回答2:

EDIT 2: To answer the OP new requirement

ranges = []
for key, group in groupby(enumerate(data), lambda (index, item): index - item):
    group = map(itemgetter(1), group)
    if len(group) > 1:
        ranges.append(xrange(group[0], group[-1]))
    else:
        ranges.append(group[0])

Output:

[xrange(2, 5), xrange(12, 17), 20]

You can replace xrange with range or any other custom class.

Python docs have a very neat recipe for this:

from operator import itemgetter
from itertools import groupby
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
for k, g in groupby(enumerate(data), lambda (i,x):i-x):
    print map(itemgetter(1), g)

Output:

[2, 3, 4, 5]
[12, 13, 14, 15, 16, 17]

If you want to get the exact same output, you can do this:

ranges = []
for k, g in groupby(enumerate(data), lambda (i,x):i-x):
    group = map(itemgetter(1), g)
    ranges.append((group[0], group[-1]))

output:

[(2, 5), (12, 17)]

EDIT: The example is already explained in the documentation but maybe I should explain it more:

The key to the solution is differencing with a range so that consecutive numbers all appear in same group.

If the data was: [2, 3, 4, 5, 12, 13, 14, 15, 16, 17] Then groupby(enumerate(data), lambda (i,x):i-x) is equivalent of the following:

groupby(
    [(0, 2), (1, 3), (2, 4), (3, 5), (4, 12),
    (5, 13), (6, 14), (7, 15), (8, 16), (9, 17)],
    lambda (i,x):i-x
)

The lambda function subtracts the element index from the element value. So when you apply the lambda on each item. You'll get the following keys for groupby:

[-2, -2, -2, -2, -8, -8, -8, -8, -8, -8]

groupby groups elements by equal key value, so the first 4 elements will be grouped together and so forth.

I hope this makes it more readable.

python 3 version may be helpful for beginners

import the libraries required first

from itertools import groupby
from operator import itemgetter

ranges =[]

for k,g in groupby(enumerate(data),lambda x:x[0]-x[1]):
    group = (map(itemgetter(1),g))
    group = list(map(int,group))
    ranges.append((group[0],group[-1]))

回答3:

The "naive" solution which I find somewhat readable atleast.

x = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 22, 25, 26, 28, 51, 52, 57]

def group(L):
    first = last = L[0]
    for n in L[1:]:
        if n - 1 == last: # Part of the group, bump the end
            last = n
        else: # Not part of the group, yield current group and start a new
            yield first, last
            first = last = n
    yield first, last # Yield the last group


>>>print list(group(x))
[(2, 5), (12, 17), (22, 22), (25, 26), (28, 28), (51, 52), (57, 57)]

回答4:

Assuming your list is sorted:

>>> from itertools import groupby
>>> def ranges(lst):
    pos = (j - i for i, j in enumerate(lst))
    t = 0
    for i, els in groupby(pos):
        l = len(list(els))
        el = lst[t]
        t += l
        yield range(el, el+l)


>>> lst = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
>>> list(ranges(lst))
[range(2, 6), range(12, 18)]

回答5:

Here it is something that should work, without any import needed:

def myfunc(lst):
    ret = []
    a = b = lst[0]                           # a and b are range's bounds

    for el in lst[1:]:
        if el == b+1: 
            b = el                           # range grows
        else:                                # range ended
            ret.append(a if a==b else (a,b)) # is a single or a range?
            a = b = el                       # let's start again with a single
    ret.append(a if a==b else (a,b))         # corner case for last single/range
    return ret

回答6:

Please note that the code using groupby doesn't work as given in Python 3 so use this.

for k, g in groupby(enumerate(data), lambda x:x[0]-x[1]):
    group = list(map(itemgetter(1), g))
    ranges.append((group[0], group[-1]))

回答7:

This doesn't use a standard function - it just iiterates over the input, but it should work:

def myfunc(l):
    r = []
    p = q = None
    for x in l + [-1]:
        if x - 1 == q:
            q += 1
        else:
            if p:
               if q > p:
                   r.append('%s-%s' % (p, q))
               else:
                   r.append(str(p))
            p = q = x
    return '(%s)' % ', '.join(r)

Note that it requires that the input contains only positive numbers in ascending order. You should validate the input, but this code is omitted for clarity.

回答8:

Here's the answer I came up with. I'm writing the code for other people to understand, so I'm fairly verbose with variable names and comments.

First a quick helper function:

def getpreviousitem(mylist,myitem):
    '''Given a list and an item, return previous item in list'''
    for position, item in enumerate(mylist):
        if item == myitem:
            # First item has no previous item
            if position == 0:
                return None
            # Return previous item    
            return mylist[position-1]

And then the actual code:

def getranges(cpulist):
    '''Given a sorted list of numbers, return a list of ranges'''
    rangelist = []
    inrange = False
    for item in cpulist:
        previousitem = getpreviousitem(cpulist,item)
        if previousitem == item - 1:
            # We're in a range
            if inrange == True:
                # It's an existing range - change the end to the current item
                newrange[1] = item
            else:    
                # We've found a new range.
                newrange = [item-1,item]
            # Update to show we are now in a range    
            inrange = True    
        else:   
            # We were in a range but now it just ended
            if inrange == True:
                # Save the old range
                rangelist.append(newrange)
            # Update to show we're no longer in a range    
            inrange = False 
    # Add the final range found to our list
    if inrange == True:
        rangelist.append(newrange)
    return rangelist

Example run:

getranges([2, 3, 4, 5, 12, 13, 14, 15, 16, 17])

returns:

[[2, 5], [12, 17]]

回答9:

import numpy as np

myarray = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
sequences = np.split(myarray, np.array(np.where(np.diff(myarray) > 1)[0]) + 1)
l = []
for s in sequences:
    if len(s) > 1:
        l.append((np.min(s), np.max(s)))
    else:
        l.append(s[0])
print(l)

Output:

[(2, 5), (12, 17), 20]

回答10:

Using groupby and count from itertools gives us a short solution. The idea is that, in an increasing sequence, the difference between the index and the value will remain the same.

In order to keep track of the index, we can use an itertools.count, which makes the code cleaner as using enumerate:

from itertools import groupby, count

def intervals(data):
    out = []
    counter = count()

    for key, group in groupby(data, key = lambda x: x-next(counter)):
        block = list(group)
        out.append([block[0], block[-1]])
    return out

Some sample output:

print(intervals([0, 1, 3, 4, 6]))
# [[0, 1], [3, 4], [6, 6]]

print(intervals([2, 3, 4, 5]))
# [[2, 5]]

回答11:

Using numpy + comprehension lists:
With numpy diff function, consequent input vector entries that their difference is not equal to one can be identified. The start and end of the input vector need to be considered.

import numpy as np
data = np.array([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20])

d = [i for i, df in enumerate(np.diff(data)) if df!= 1] 
d = np.hstack([-1, d, len(data)-1])  # add first and last elements 
d = np.vstack([d[:-1]+1, d[1:]]).T

print(data[d])

Output:

 [[ 2  5]   
  [12 17]   
  [20 20]]

Note: The request that individual numbers should be treated differently, (returned as individual, not ranges) was omitted. This can be reached by further post-processing the results. Usually this will make things more complex without gaining any benefit.

回答12:

A short solution that works without additional imports. It accepts any iterable, sorts unsorted inputs, and removes duplicate items:

def ranges(nums):
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s+1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return list(zip(edges, edges))

Example:

>>> ranges([2, 3, 4, 7, 8, 9, 15])
[(2, 4), (7, 9), (15, 15)]

>>> ranges([-1, 0, 1, 2, 3, 12, 13, 15, 100])
[(-1, 3), (12, 13), (15, 15), (100, 100)]

>>> ranges(range(100))
[(0, 99)]

>>> ranges([0])
[(0, 0)]

>>> ranges([])
[]

This is the same as @dansalmo's solution which I found amazing, albeit a bit hard to read and apply (as it's not given as a function).

Note that it could easily be modified to spit out "traditional" open ranges [start, end), by e.g. altering the return statement:

    return [(s, e+1) for s, e in zip(edges, edges)]

I copied this answer over from another question that was marked as a duplicate of this one with the intention to make it easier findable (after I just now searched again for this topic, finding only the question here at first and not being satisfied with the answers given).

来源：https://stackoverflow.com/questions/2154249/identify-groups-of-continuous-numbers-in-a-list

标签

python

list

range

continuous