Identify groups of varying continuous numbers in a list

囚心锁ツ 2020-12-17 20:56

In this other SO post, a Python user asked how to group continuous numbers such that any sequences could just be represented by its start/end and any stragglers would be dis

4 answers
  • 2020-12-17 21:43

    Here is a quickly written (and extremely ugly) answer:

    def test(inArr):
        arr=inArr[:] #copy, unnecessary if we use index in a smart way
        result = []
        while len(arr)>1: #as long as there can be an arithmetic progression
            x=[arr[0],arr[1]] #take first two
            arr=arr[2:] #remove from array
            step=x[1]-x[0]
            while len(arr)>0 and x[1]+step==arr[0]: #check if the next value in array is part of progression too
                x[1]+=step #add it
                arr=arr[1:]
            result.append((x[0],x[1],step)) #append progression to result
        if len(arr)==1:
            result.append(arr[0])
        return result
    
    print(test([2, 4, 6, 8, 12, 13, 14, 15, 16, 17, 20]))
    

    This returns [(2, 8, 2), (12, 17, 1), 20]

    It is slow, because it copies the list and keeps slicing elements off the front.

    It only finds complete progressions, and only in sorted arrays.

    In short, it is crude, but it should work ;)
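The comment in the function above notes that the copy is unnecessary if indices are used. A minimal sketch of that idea, written for Python 3, might look like the following. The name `progressions` and the index variables are illustrative and not part of the original answer; the greedy behaviour (always pairing the next two elements) is kept unchanged.

```python
def progressions(values):
    """Greedy, index-based variant of the function above (no copying or slicing)."""
    result = []
    i = 0
    n = len(values)
    while n - i > 1:  # as long as at least two elements remain
        start = values[i]
        step = values[i + 1] - start
        j = i + 1  # index of the last element known to belong to the run
        while j + 1 < n and values[j + 1] - values[j] == step:
            j += 1
        result.append((start, values[j], step))
        i = j + 1
    if i < n:  # exactly one element is left over
        result.append(values[i])
    return result


print(progressions([2, 4, 6, 8, 12, 13, 14, 15, 16, 17, 20]))
# [(2, 8, 2), (12, 17, 1), 20]
```

Because the pairing is still greedy, an input such as [1, 5, 6, 7] comes out as [(1, 5, 4), (6, 7, 1)] rather than 1 followed by (5, 7, 1).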

    There are other (cooler, more pythonic) ways to do this, for example you could convert your list to a set, keep removing two elements, calculate their arithmetic progression and intersect with the set.

    You could also reuse the answer you provided to check for certain step sizes, for example:

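    from itertools import groupby
    from operator import itemgetter

    # data is the input list, e.g. [2, 4, 6, 8, 12, 13, 14, 15, 16, 17, 20]
    ranges = []
    step_size = 2
    # step_size*index - item stays constant along a run with that step
    for key, group in groupby(enumerate(data), lambda (index, item): step_size*index - item):
        group = map(itemgetter(1), group)
        if len(group) > 1:
            ranges.append(xrange(group[0], group[-1]))
        else:
            ranges.append(group[0])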
    

    This finds every group with a step size of 2, but only those.
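The snippet above relies on Python 2 syntax (tuple parameters in the lambda, list-returning `map`, `xrange`). A rough Python 3 sketch of the same fixed-step trick is shown below. The function name `runs_with_step` is hypothetical, and runs are stored as plain (first, last) tuples instead of `xrange` objects, which is an assumption made here for readability.

```python
from itertools import groupby


def runs_with_step(data, step_size):
    """Group data into runs whose consecutive difference equals step_size."""
    found = []
    # step_size * index - item is constant along a run with that step,
    # so groupby puts the members of such a run under one key.
    for _, group in groupby(enumerate(data), key=lambda pair: step_size * pair[0] - pair[1]):
        items = [item for _, item in group]
        if len(items) > 1:
            found.append((items[0], items[-1]))
        else:
            found.append(items[0])
    return found


data = [2, 4, 6, 8, 12, 13, 14, 15, 16, 17, 20]
print(runs_with_step(data, 2))  # [(2, 8), 12, 13, 14, 15, 16, 17, 20]
print(runs_with_step(data, 1))  # [2, 4, 6, 8, (12, 17), 20]
```

Each call only recognises one step size, so finding runs with different steps means calling it once per candidate step.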

  • 2020-12-17 21:57

    You can create an iterator to help with the grouping, and pull the next element from the following group, which will be the end of the previous group:

    from itertools import groupby

    def ranges(lst):
        it = iter(lst)
        next(it)  # move to second element for comparison
        grps = groupby(lst, key=lambda x: (x - next(it, -float("inf"))))
        for k, v in grps:
            i = next(v)
            try:
                step = next(v) - i  # catches single element v or gives us a step
                nxt = list(next(grps)[1])
                yield xrange(i, nxt.pop(0), step)
                # outliers or another group
                if nxt:
                    yield nxt[0] if len(nxt) == 1 else xrange(nxt[0], next(next(grps)[1]), nxt[1] - nxt[0])
            except StopIteration:
                yield i  # no seq
    

    which gives you:

    In [2]: l1 = [2, 3, 4, 5, 8, 10, 12, 14, 13, 14, 15, 16, 17, 20, 21]
    
    In [3]: l2 = [2, 4, 6, 8, 12, 13, 14, 15, 16, 17, 20]
    
    In [4]: l3 = [13, 14, 15, 16, 17, 18]
    
    In [5]: s1 = [i + 10 for i in xrange(0, 11, 2)]
    
    In [6]: s2 = [30]
    
    In [7]: s3 = [i + 40 for i in xrange(45)]
    
    In [8]: l4 = s1 + s2 + s3
    
    In [9]: l5 = [1, 2, 5, 6, 9, 10]
    
    In [10]: l6 = {1, 2, 3, 5, 6, 9, 10, 13, 19, 21, 22, 23, 24}
    
    In [11]: for l in (l1, l2, l3, l4, l5, l6):
       ....:         print(list(ranges(l)))
       ....:     
    [xrange(2, 5), xrange(8, 14, 2), xrange(13, 17), 20, 21]
    [xrange(2, 8, 2), xrange(12, 17), 20]
    [xrange(13, 18)]
    [xrange(10, 20, 2), 30, xrange(40, 84)]
    [1, 2, 5, 6, 9, 10]
    [xrange(1, 3), 5, 6, 9, 10, 13, 19, xrange(21, 24)]
    

    When the step is 1, it is omitted from the xrange output.
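One detail worth noting in the output above: the last member of each run is stored as the stop value (the run 2, 3, 4, 5 appears as xrange(2, 5)), so iterating over the object would skip that last member. A small Python 3 sketch, using `range` in place of `xrange`, shows the repr behaviour and one way to read the start, end, and step back out. The helper name `describe` is hypothetical.

```python
def describe(entry):
    """Turn a range from the output above into a (start, end, step) tuple.

    Assumes the convention used in the output above: the stop value holds
    the last member of the run.
    """
    if isinstance(entry, range):
        return (entry.start, entry.stop, entry.step)
    return entry


print(repr(range(2, 5, 1)))   # range(2, 5)
print(repr(range(8, 14, 2)))  # range(8, 14, 2)
print([describe(e) for e in [range(2, 5), range(8, 14, 2), 20]])
# [(2, 5, 1), (8, 14, 2), 20]
```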

  • 2020-12-17 22:00

    I came across such a case once. Here it goes.

    import more_itertools as mit
    iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]  # input
    x = [list(group) for group in mit.consecutive_groups(iterable)]
    output = [(i[0],i[-1]) if len(i)>1 else i[0] for i in x]
    print(output)
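If installing more_itertools is not an option, the same grouping of step-1 runs can be sketched with the standard library alone. The function name `consecutive_runs` is illustrative; this is a stand-in for the idea, not a claim about how the library implements it.

```python
from itertools import groupby


def consecutive_runs(iterable):
    """Yield lists of consecutive integers (step 1), in input order."""
    # item - index is constant while the values increase by exactly 1.
    for _, group in groupby(enumerate(iterable), key=lambda pair: pair[1] - pair[0]):
        yield [item for _, item in group]


iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
output = [(run[0], run[-1]) if len(run) > 1 else run[0]
          for run in consecutive_runs(iterable)]
print(output)  # [(2, 5), (12, 17), 20]
```

Like the snippet above it, this only recognises runs with a step of 1, so it does not cover the varying step sizes asked about in the question.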
    
  • 2020-12-17 22:02

    The itertools pairwise recipe is one way to solve the problem. Combined with itertools.groupby, it lets you group pairs whose differences are equal. For multi-item groups, the first and last items of each group are then selected; for singleton groups, the last item is selected:

    from itertools import groupby, tee, izip
    
    
    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = tee(iterable)
        next(b, None)
        return izip(a, b)
    
    def grouper(lst):
        result = []
        for k, g in groupby(pairwise(lst), key=lambda x: x[1] - x[0]):
            g  = list(g)
            if len(g) > 1:
                try:
                    if g[0][0] == result[-1]:
                        del result[-1]
                    elif g[0][0] == result[-1][1]:
                        g = g[1:] # patch for duplicate start and/or end
                except (IndexError, TypeError):
                    pass
                result.append((g[0][0], g[-1][-1], k))
            else:
                result.append(g[0][-1]) if result else result.append(g[0])
        return result
    

    Trial: input -> grouper(lst) -> output

    Input: [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
    Output: [(2, 5, 1), (12, 17, 1), 20]
    
    Input: [2, 4, 6, 8, 12, 13, 14, 15, 16, 17, 20]
    Output: [(2, 8, 2), (12, 17, 1), 20]
    
    Input: [2, 4, 6, 8, 12, 12.4, 12.9, 13, 14, 15, 16, 17, 20]
    Output: [(2, 8, 2), 12, 12.4, 12.9, (13, 17, 1), 20] # 12 does not appear in the second group
    

    Update: (patch for duplicate start and/or end values)

    s1 = [i + 10 for i in xrange(0, 11, 2)]; s2 = [30]; s3 = [i + 40 for i in xrange(45)]
    
    Input: s1+s2+s3
    Output: [(10, 20, 2), (30, 40, 10), (41, 84, 1)]
    
    # to make 30 appear as an entry instead of a group, change the main if condition to len(g) > 2
    Input: s1+s2+s3
    Output: [(10, 20, 2), 30, (41, 84, 1)]
    
    Input: [2, 4, 6, 8, 10, 12, 13, 14, 15, 16, 17, 20]
    Output: [(2, 12, 2), (13, 17, 1), 20]
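As an alternative to patching shared start and end values after the fact, a single left-to-right pass with a minimum run length can avoid the overlap in the first place. The sketch below is written for Python 3; the function name `group_progressions` and the `min_len` parameter are hypothetical and not part of the answer above. With `min_len=3`, a pair such as 30, 40 is not treated as a run.

```python
def group_progressions(values, min_len=3):
    """Collapse arithmetic runs of at least min_len items into (start, end, step)."""
    result = []
    i, n = 0, len(values)
    while i < n:
        j = i          # index of the last element of the candidate run
        step = None    # stays None when values[i] is the final element
        if i + 1 < n:
            step = values[i + 1] - values[i]
            j = i + 1
            while j + 1 < n and values[j + 1] - values[j] == step:
                j += 1
        if step is not None and j - i + 1 >= min_len:
            result.append((values[i], values[j], step))
            i = j + 1  # continue after the run, so no element is reused
        else:
            result.append(values[i])
            i += 1     # emit a single value and retry from the next element
    return result


s1 = [i + 10 for i in range(0, 11, 2)]
s2 = [30]
s3 = [i + 40 for i in range(45)]
print(group_progressions(s1 + s2 + s3))
# [(10, 20, 2), 30, (40, 84, 1)]
print(group_progressions([2, 4, 6, 8, 10, 12, 13, 14, 15, 16, 17, 20]))
# [(2, 12, 2), (13, 17, 1), 20]
```

Note that this sketch reports the last run as starting at 40, since 40 is not consumed by an earlier group.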
    