efficient loop over numpy array

Asked by 轮回少年 on 2021-01-06 02:05

Versions of this question have already been asked, but I have not found a satisfactory answer.

Problem: given a large numpy vector, find the indices of the elements that are duplicates of some other element in the vector.
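
For reference, the setup the answers below work with, reconstructed from the code they share (so the exact values are illustrative):

    import numpy as np

    N = 10000
    vect = np.arange(float(N))  # 0.0, 1.0, ..., 9999.0
    vect[N // 2] = 1            # plant a duplicate of vect[1]
    vect[N // 4] = 1            # and another one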

8 Answers
  • 2021-01-06 02:15

    Python itself is a highly dynamic, slow language. The idea in numpy is to use vectorization and avoid explicit loops. In this case, you can use np.equal.outer (note that this builds an N×N boolean array, so it trades quadratic memory for speed). You can start with

    a = np.equal.outer(vect, vect)
    

    Now, for example, to find the sum (the 10000 diagonal entries plus the 6 off-diagonal matches among indices 1, 2500, and 5000):

     >>> np.sum(a)
     10006
    

    To find the indices of the elements that are equal to some other element, zero out the diagonal and look for any match in each column:

    np.fill_diagonal(a, 0)
    
    >>> np.nonzero(np.any(a, axis=0))[0]
    array([   1, 2500, 5000])
    

    Timing

    def find_vec():
        a = np.equal.outer(vect, vect)
        s = np.sum(a)  # full sum including the diagonal (10006)
        np.fill_diagonal(a, 0)
        return np.sum(a), np.nonzero(np.any(a, axis=0))[0]
    
    >>> %timeit find_vec()
    1 loops, best of 3: 214 ms per loop
    
    def find_loop():
        # the original double loop from the question, for comparison
        dupl = []
        counter = 0
        for i in range(N):
            for j in range(i + 1, N):
                if vect[i] == vect[j]:
                    dupl.append(j)
                    counter += 1
        return dupl
    
    >>> %timeit find_loop()
    1 loops, best of 3: 8.51 s per loop
    
  • 2021-01-06 02:16

    This runs in 8 ms, compared to 18 s for your code, and doesn't use any exotic libraries. It's similar to the approach by @vs0, but I like defaultdict more. It should be approximately O(N), since each element is processed once with constant-time dictionary operations.

    from collections import defaultdict

    dupl = []
    counter = 0
    indexes = defaultdict(list)  # value -> positions seen so far
    for i, e in enumerate(vect):
        indexes[e].append(i)
        if len(indexes[e]) > 1:  # every occurrence after the first is a duplicate
            dupl.append(i)
            counter += 1
    
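    With the setup from the question, this should leave dupl == [2500, 5000]; the indexes map also lets you pull out which values are duplicated (the dict comprehension below is my own illustration, not part of the original answer):

    >>> dupl
    [2500, 5000]
    >>> {v: idx for v, idx in indexes.items() if len(idx) > 1}
    {1.0: [1, 2500, 5000]}
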
  • 2021-01-06 02:21

    The obvious question is why you want to do this in this way. NumPy arrays are intended to be opaque data structures; by this I mean that NumPy arrays are intended to be created inside the NumPy system, with operations then sent into the NumPy subsystem to deliver a result. In other words, NumPy should be a black box into which you throw requests and out of which come results.

    So given the code above, I am not at all surprised that NumPy performance is worse than dreadful.

    The following should be effectively what you want, I believe, but done the NumPy way:

    import numpy as np

    N = 10000
    vect = np.arange(float(N))
    vect[N // 2] = 1   # integer division so this works in Python 3
    vect[N // 4] = 1

    # for each element, ask numpy for the indices of all equal elements
    print([np.where(a == vect)[0] for a in vect][1])

    # Delivers [   1 2500 5000]
    
  • 2021-01-06 02:22

    As an alternative to Ami Tavory's answer, you can use a Counter from the collections package to detect duplicates. On my computer it seems to be even faster. See the functions below; the Counter-based one also reports which values are duplicated.

    import collections
    import numpy as np
    
    def find_duplicates_original(x):
        # the question's double loop, for comparison
        d = []
        for i in range(len(x)):
            for j in range(i + 1, len(x)):
                if x[i] == x[j]:
                    d.append(j)
        return d

    def find_duplicates_outer(x):
        # Ami Tavory's vectorized N x N comparison
        a = np.equal.outer(x, x)
        np.fill_diagonal(a, 0)
        return np.flatnonzero(np.any(a, axis=0))

    def find_duplicates_counter(x):
        counter = collections.Counter(x)
        values = (v for v, c in counter.items() if c > 1)  # values occurring more than once
        return {v: np.flatnonzero(x == v) for v in values}
    
    
    n = 10000
    x = np.arange(float(n))
    x[n // 2] = 1
    x[n // 4] = 1
    
    >>> find_duplicates_counter(x)
    {1.0: array([   1, 2500, 5000], dtype=int64)}

    >>> %timeit find_duplicates_original(x)
    1 loop, best of 3: 12 s per loop

    >>> %timeit find_duplicates_outer(x)
    10 loops, best of 3: 84.3 ms per loop

    >>> %timeit find_duplicates_counter(x)
    1000 loops, best of 3: 1.63 ms per loop
    
  • 2021-01-06 02:23

    Since the answers have stopped coming and none was totally satisfactory, for the record I post my own solution.

    It is my understanding that it's the assignment which makes Python slow in this case, not the nested loops as I initially thought. Using a library or compiled code eliminates the need for those assignments, and performance improves dramatically.

    import numpy as np
    from numba import jit

    N = 10000
    vect = np.arange(N, dtype=np.float32)

    vect[N // 2] = 1
    vect[N // 4] = 1
    dupl = np.zeros(N, dtype=np.int32)  # preallocated output buffer

    print("init done")

    # uncomment to enable the compiled function
    #@jit
    def duplicates(i, counter, dupl, vect):
        eps = 0.01
        ns = len(vect)
        for j in range(i + 1, ns):
            # swap in the commented line to use approximate comparison
            #if abs(vect[i] - vect[j]) < eps:
            if vect[i] == vect[j]:
                dupl[counter] = j
                counter += 1
        return counter

    counter = 0
    for i in range(N):
        counter = duplicates(i, counter, dupl, vect)

    print("counter =", counter)
    print(dupl[0:counter])
    

    Tests

    # no jit
    $ time python array-test-numba.py
    init done
    counter = 3
    [2500 5000 5000]
    
    elapsed 10.135 s
    
    # with jit
    $ time python array-test-numba.py
    init done
    counter = 3
    [2500 5000 5000]
    
    elapsed 0.480 s
    

    The performance of the compiled version (with @jit uncommented) is close to C code performance, ~0.1-0.2 s. Perhaps eliminating the remaining Python-level loop could improve the performance even further. The difference is even starker when using the approximate (eps) comparison: the pure-Python version slows down dramatically, while the compiled version barely changes.

    # no jit
    $ time python array-test-numba.py
    init done
    counter = 3
    [2500 5000 5000]
    
    elapsed 109.218 s
    
    # with jit
    $ time python array-test-numba.py
    init done
    counter = 3
    [2500 5000 5000]
    
    elapsed 0.506 s
    

    This is a ~200x difference. In the real code, I had to put both loops in the function, as well as use a function template with variable types, so it was a bit more complex, but not by much.
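
    A minimal sketch of that both-loops-inside structure, assuming the same setup (the function name and the nopython flag are my own illustration, not from the original post):

    import numpy as np
    from numba import jit

    @jit(nopython=True)
    def find_duplicates_jit(vect, dupl):
        # both loops live inside the compiled function, so there is
        # no Python-level call overhead per outer iteration
        counter = 0
        n = len(vect)
        for i in range(n):
            for j in range(i + 1, n):
                if vect[i] == vect[j]:
                    dupl[counter] = j
                    counter += 1
        return counter

    counter = find_duplicates_jit(vect, dupl)
    print(dupl[:counter])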

  • 2021-01-06 02:29

    This solution using the numpy_indexed package has O(n log n) complexity and is fully vectorized, so in all likelihood it is not terribly far from C performance.

    import numpy as np
    import numpy_indexed as npi

    # multiplicity gives, for each element, how many times its value occurs
    dpl = np.flatnonzero(npi.multiplicity(vect) > 1)
    
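    If you'd rather avoid the extra dependency, a similar O(n log n) result can be had with plain numpy's np.unique; this sketch is my own addition, not part of the original answer:

    import numpy as np

    # counts[inv] maps each element to the multiplicity of its value
    _, inv, counts = np.unique(vect, return_inverse=True, return_counts=True)
    dpl = np.flatnonzero(counts[inv] > 1)
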