Find indexes of repeated elements in an array (Python, NumPy)

甜味超标 2020-12-20 14:52

Suppose I have a NumPy array of integers, such as:

[34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]

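I want to find the start and end indices of every run in which a value repeats at least a given number of times in a row. In the array above, that would be the run of 22 at indices 3 to 8 and the run of 6 at indices 15 to 22.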

5 answers
  • 2020-12-20 15:04

    Using np.diff together with the method @WarrenWeckesser gave in a linked answer for finding runs of zeros in an array (consecutive equal elements of a become zeros in np.diff(a)):

    import numpy as np
    
    def zero_runs(a):  # from the linked answer
        # 1 where a == 0, 0 elsewhere, padded with a 0 at both ends
        iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
        # |diff| is 1 exactly at the edges of each run of zeros
        absdiff = np.abs(np.diff(iszero))
        # Edges come in pairs: [start, stop) of each run
        ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
        return ranges
    
    a = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
    
    print(zero_runs(np.diff(a)))
    # [[ 3  8]
    #  [15 22]]
    

    The runs can then be filtered on the difference between their start and end indices:

    runs = zero_runs(np.diff(a))
    
    # end - start > 5 keeps runs of 7 or more equal elements, to illustrate the filter
    print(runs[runs[:, 1] - runs[:, 0] > 5])
    # [[15 22]]
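    A hedged follow-up: a zero at np.diff(a)[i] means a[i] == a[i+1], so each [start, stop) pair that zero_runs returns for np.diff(a) can be read directly as inclusive first and last indices into a. The sketch below (run_summary and its min_count parameter are hypothetical names, not part of the answer) uses that to also recover each run's value and length:

```python
import numpy as np

def zero_runs(a):
    # Helper from the answer: [start, stop) pairs for each run of zeros in a
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    return np.where(absdiff == 1)[0].reshape(-1, 2)

def run_summary(a, min_count=2):
    """Hypothetical helper: (values, firsts, lasts, counts) for every run of
    at least min_count equal consecutive elements of a."""
    a = np.asarray(a)
    runs = zero_runs(np.diff(a))
    firsts, lasts = runs[:, 0], runs[:, 1]  # inclusive indices into a
    counts = lasts - firsts + 1             # number of equal elements per run
    keep = counts >= min_count
    return a[firsts[keep]], firsts[keep], lasts[keep], counts[keep]

a = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
values, firsts, lasts, counts = run_summary(a, min_count=5)
print(values.tolist(), firsts.tolist(), lasts.tolist(), counts.tolist())
# [22, 6] [3, 15] [8, 22] [6, 8]
```

    Returning parallel arrays keeps everything vectorized; zip them together only if you need per-run tuples.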
    
  • 2020-12-20 15:09

    Here is a relatively quick, error-free solution that also tells you how many copies were in the run (the difference between the two returned indices). Some of this code was borrowed from KAL's solution.

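    # Return the start and (1-past-the-end) indices of the first run of
    # at least min_count consecutive copies of value in container l,
    # or None if there is no such run.
    def find_repeat(value, min_count, l):
      i = 0
      while i < len(l):
        count = 0
        # The bounds check prevents an IndexError when a run reaches the end of l
        while i + count < len(l) and l[i + count] == value:
          count += 1
        if count >= min_count:
          return i, i + count  # run length is (end - start)
        i += max(count, 1)  # skip past the run just examined
      return None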
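    find_repeat stops at the first qualifying run, while the question asks for both the 22-run and the 6-run. A minimal variation, assuming you want every run of value rather than just the first (find_all_repeats is a hypothetical name), could turn the same scan into a generator:

```python
def find_all_repeats(value, min_count, l):
    """Hypothetical generator: yield (start, end), end being 1-past-the-end,
    for every run of at least min_count consecutive copies of value in l."""
    i = 0
    while i < len(l):
        count = 0
        # Count consecutive copies of value starting at i, staying in bounds
        while i + count < len(l) and l[i + count] == value:
            count += 1
        if count >= min_count:
            yield i, i + count
        i += max(count, 1)  # jump past the run, or one element if no match

a = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
print(list(find_all_repeats(22, 5, a)))  # [(3, 9)]
print(list(find_all_repeats(6, 5, a)))   # [(15, 23)]
```

    The lone 22 at index 14 is skipped because its run is shorter than min_count.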
    
  • 2020-12-20 15:18

    If you're looking for value repeated n times in list L, you could do something like this:

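    def find_repeat(value, n, L):
        look_for = [value] * n
        # Only windows that fit entirely inside L can match
        for i in range(len(L) - n + 1):
            # list(...) makes the slice comparison work for NumPy arrays as well as lists
            if L[i] == value and list(L[i:i+n]) == look_for:
                return i, i+n
        return None  # no run of n copies found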
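    For a NumPy array, the same window comparison can be vectorized. This sketch assumes NumPy 1.20 or later, where numpy.lib.stride_tricks.sliding_window_view is available; find_repeat_np is a hypothetical name. Like the loop above, it returns the first position where n consecutive copies of value begin, or None if there is none:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def find_repeat_np(value, n, arr):
    """Hypothetical vectorized variant: (start, start + n) of the first window
    made of n copies of value, or None."""
    arr = np.asarray(arr)
    if arr.size < n:
        return None  # no window of length n fits
    # Shape (len(arr) - n + 1, n); a read-only view, no data is copied
    windows = sliding_window_view(arr, n)
    # Start indices of windows whose elements all equal value
    hits = np.flatnonzero((windows == value).all(axis=1))
    if hits.size == 0:
        return None
    i = int(hits[0])
    return i, i + n

a = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
print(find_repeat_np(22, 5, a))  # (3, 8)
```

    The comparison still builds an n-wide boolean array per window, so for very large n the loop version may use less memory.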
    
  • 2020-12-20 15:23

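    There really isn't a great shortcut for this. You can do something like:

    mult = 5
    for i, elem in enumerate(val_list):
        target = [elem] * mult
        # list.index() looks for a single element equal to target, not a
        # sub-list, so compare the slice starting at i against target instead
        if val_list[i:i + mult] == target:
            found_at = i
            break
    

    I leave handling the not-found case and detecting runs longer than mult to you.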
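    One way to fill in the parts left to the reader, sketched under the assumption that you want the first run of any value with at least mult copies, extended to its full length, and None instead of an exception when nothing qualifies (first_long_run is a hypothetical name):

```python
def first_long_run(val_list, mult=5):
    """Hypothetical helper: (value, start, stop), stop being 1-past-the-end,
    for the first run of at least mult equal elements, or None."""
    i = 0
    while i < len(val_list):
        stop = i + 1
        # Extend the run for as long as elements keep matching val_list[i]
        while stop < len(val_list) and val_list[stop] == val_list[i]:
            stop += 1
        if stop - i >= mult:
            return val_list[i], i, stop
        i = stop  # resume at the first element of the next run
    return None

a = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
print(first_long_run(a, 5))  # (22, 3, 9)
print(first_long_run(a, 7))  # (6, 15, 23)
```

    Each element is visited once, because the scan resumes at the end of the run it just measured.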

  • 2020-12-20 15:25

    Here is a solution using Python's standard-library itertools module.

    Code

    import itertools as it
    
    
    def find_ranges(lst, n=2):
        """Return ranges for `n` or more repeated values."""
        groups = ((k, tuple(g)) for k, g in it.groupby(enumerate(lst), lambda x: x[-1]))
        repeated = (idx_g for k, idx_g in groups if len(idx_g) >=n)
        return ((sub[0][0], sub[-1][0]) for sub in repeated)
    
    lst = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]    
    list(find_ranges(lst, 5))
    # [(3, 8), (15, 22)]
    

    Tests

    def test_ranges(f):
        """Verify list results identifying ranges (plain asserts, no test framework needed)."""
        assert list(f([])) == []
        assert list(f([0, 1,1,1,1,1,1, 2], 5)) == [(1, 6)]
        assert list(f([1,1,1,1,1,1, 2,2, 1, 3, 1,1,1,1,1,1], 5)) == [(0, 5), (10, 15)]
        assert list(f([1,1, 2, 1,1,1,1, 2, 1,1,1], 3)) == [(3, 6), (8, 10)]
        assert list(f([1,1,1,1, 2, 1,1,1, 2, 1,1,1,1], 3)) == [(0, 3), (5, 7), (9, 12)]
    
    test_ranges(find_ranges)
    

    This example captures (index, element) pairs in lst and then groups them by element. Only groups with at least n pairs are retained. Finally, the indices of the first and last pair in each group are taken, yielding inclusive (start, end) indices for each run of repeated values.

    See also this post for finding ranges of indices using itertools.groupby.
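    A lighter-weight variation on the same itertools.groupby idea, assuming you also want each run's value and length: grouping the values directly and counting each group avoids building a tuple of (index, element) pairs per group, since the start index can be tracked as a running total. The name find_runs is hypothetical, and like find_ranges it reports inclusive end indices:

```python
import itertools as it

def find_runs(lst, n=2):
    """Hypothetical variant: yield (value, start, end, count) for each run of
    n or more equal values; end is inclusive."""
    start = 0
    for value, group in it.groupby(lst):
        count = sum(1 for _ in group)  # run length without storing the group
        if count >= n:
            yield value, start, start + count - 1, count
        start += count  # the next group begins right after this one

lst = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
print(list(find_runs(lst, 5)))
# [(22, 3, 8, 6), (6, 15, 22, 8)]
```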
