Efficient loop over a NumPy array

轮回少年 2021-01-06 02:05

Versions of this question have already been asked but I have not found a satisfactory answer.

Problem: given a large numpy vector, find indices of t

8 answers
  •  青春惊慌失措
    2021-01-06 02:15

    Python itself is a highly dynamic, slow language. The idea in NumPy is to use vectorization and avoid explicit loops. In this case, you can use np.equal.outer, which compares every element of the vector with every other element in a single call. You can start with

    a = np.equal.outer(vect, vect)
    

    Now, for example, to count the matching entries (the total includes the diagonal, where each element is compared with itself):

     >>> np.sum(a)
     10006
    

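    To find the indices i of the elements that are equal to some other element, first clear the diagonal so that no element matches itself, then look for columns that still contain a match: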

    np.fill_diagonal(a, 0)
    
    >>> np.nonzero(np.any(a, axis=0))[0]
    array([   1, 2500, 5000])
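
The same steps can be tried on a vector small enough to check by hand. This is only a minimal sketch: the toy data is hypothetical (it is not the vect from the question), and the names total, n_pairs and dup_idx are introduced here for illustration.

```python
import numpy as np

# Hypothetical toy data: the value 7 appears at indices 1, 3 and 5.
vect = np.array([4, 7, 9, 7, 2, 7, 5])

# a[i, j] is True exactly when vect[i] == vect[j].
a = np.equal.outer(vect, vect)

# The diagonal is always True, so the total is N plus twice the number of equal pairs.
total = int(a.sum())
n_pairs = (total - len(vect)) // 2

# Drop the self-matches, then keep every column that still contains a match.
np.fill_diagonal(a, False)
dup_idx = np.nonzero(a.any(axis=0))[0]

print(total, n_pairs, dup_idx)
```

Here the total is 13: 7 diagonal entries plus 6 off-diagonal ones, since each of the 3 equal pairs is counted twice. The indices found are 1, 3 and 5.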
    

    Timing

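    def find_vec():
        # a[i, j] is True where vect[i] == vect[j]
        a = np.equal.outer(vect, vect)
        # total number of matches, diagonal included (not used below)
        s = np.sum(a)
        # clear the diagonal so an element does not match itself
        np.fill_diagonal(a, 0)
        # off-diagonal matches (each pair counted twice) and the indices that have a duplicate
        return np.sum(a), np.nonzero(np.any(a, axis=0))[0]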
    
    >>> %timeit find_vec()
    1 loops, best of 3: 214 ms per loop
    
    def find_loop():
        dupl = []
        counter = 0
        for i in range(N):
            for j in range(i+1, N):
                 if vect[i] == vect[j]:
                     dupl.append(j)
                     counter += 1
        return dupl
    
    >>> %timeit find_loop()
    1 loops, best of 3: 8.51 s per loop
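
One caveat with the outer comparison is that it builds an N x N boolean array, so memory grows quadratically with the length of the vector. A different technique, not the one used above, is to group equal values with np.unique and keep the elements whose value occurs more than once. The following is a rough sketch under that assumption; find_duplicate_indices is a hypothetical helper name, and the test vector is made up so that its duplicates sit at the same indices as in the output shown earlier.

```python
import numpy as np

def find_duplicate_indices(vect):
    """Return the sorted indices of every element whose value occurs more than once."""
    vect = np.asarray(vect)
    # inverse[i] is the position of vect[i] among the sorted unique values;
    # counts[k] is how often the k-th unique value occurs.
    _, inverse, counts = np.unique(vect, return_inverse=True, return_counts=True)
    # Keep the positions whose value has a count above one.
    return np.nonzero(counts[inverse.ravel()] > 1)[0]

# Hypothetical data: all values distinct except that indices 2500 and 5000 copy index 1.
vect = np.arange(10_000)
vect[[2500, 5000]] = vect[1]

print(find_duplicate_indices(vect))
```

Because np.unique sorts the data, this needs memory proportional to N rather than N squared. No timing is claimed for it here; it would have to be measured with %timeit on the actual data.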
    
