efficient loop over numpy array

轮回少年 2021-01-06 02:05

Versions of this question have already been asked but I have not found a satisfactory answer.

Problem: given a large numpy vector, find the indices of the elements that are duplicated (i.e. appear more than once).

8 Answers
  •  遥遥无期
    2021-01-06 02:22

    As an alternative to Ami Tavory's answer, you can use a Counter from the collections module to detect duplicates. On my computer it seems to be even faster. See the function below, which also reports which values are duplicated and where they occur.

    import collections
    import numpy as np
    
    def find_duplicates_original(x):
        # Naive O(n^2) double loop: compare every pair of elements.
        d = []
        for i in range(len(x)):
            for j in range(i + 1, len(x)):
                if x[i] == x[j]:
                    d.append(j)
        return d
    
    def find_duplicates_outer(x):
        # Vectorized but O(n^2) in memory: build the full pairwise
        # equality matrix, zero the diagonal (self-comparisons),
        # then find columns with at least one match.
        a = np.equal.outer(x, x)
        np.fill_diagonal(a, 0)
        return np.flatnonzero(np.any(a, axis=0))
    
    def find_duplicates_counter(x):
        # O(n) counting pass, then locate each value seen more than once.
        counter = collections.Counter(x)
        values = (v for v, c in counter.items() if c > 1)
        return {v: np.flatnonzero(x == v) for v in values}
    
    
    n = 10000
    x = np.arange(float(n))
    x[n // 2] = 1
    x[n // 4] = 1
    
    >>> find_duplicates_counter(x)
    {1.0: array([   1, 2500, 5000], dtype=int64)}
    
    >>> %timeit find_duplicates_original(x)
    1 loop, best of 3: 12 s per loop
    
    >>> %timeit find_duplicates_outer(x)
    10 loops, best of 3: 84.3 ms per loop
    
    >>> %timeit find_duplicates_counter(x)
    1000 loops, best of 3: 1.63 ms per loop
    
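    As a cross-check (not part of the answer above), the same result can also be obtained with pure NumPy via `np.unique(..., return_counts=True)`, avoiding the Python-level `Counter` pass. The function name `find_duplicates_unique` is my own; a minimal sketch:

    ```python
    import numpy as np

    def find_duplicates_unique(x):
        # One sorted pass over the array: unique values and their counts.
        vals, counts = np.unique(x, return_counts=True)
        # Keep only values that occur more than once, then locate them.
        dup_vals = vals[counts > 1]
        return {v: np.flatnonzero(x == v) for v in dup_vals}

    n = 10000
    x = np.arange(float(n))
    x[n // 2] = 1
    x[n // 4] = 1
    print(find_duplicates_unique(x))  # same duplicates as the Counter version
    ```

    `np.unique` sorts internally, so this is O(n log n) rather than the Counter's O(n), but it stays entirely in NumPy and handles float arrays the same way.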
