efficient loop over numpy array

轮回少年 2021-01-06 02:05

Versions of this question have already been asked but I have not found a satisfactory answer.

Problem: given a large numpy vector, find the indices of elements that are duplicates of one another (equal exactly, or within some tolerance eps).

8 answers
  • 2021-01-06 02:33

    Approach #1

    You can simulate that iterator-dependency criterion for a vectorized solution using a triangular matrix. This is based on this post that dealt with multiplication involving iterator dependency. To perform the elementwise equality check of each element in vect against all of its elements, we can use NumPy broadcasting. Finally, we can use np.count_nonzero to get the count, as it is very efficient at summing over boolean arrays.

    So, we would have a solution like so -

    mask = np.triu(vect[:,None] == vect,1)   # pairwise equality, keep only pairs with i < j
    counter = np.count_nonzero(mask)         # number of duplicate pairs
    dupl = np.where(mask)[1]                 # indices of the duplicated elements
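
    A quick sanity check, as a minimal sketch on a small made-up integer vect (not from the original post) -

    import numpy as np
    vect = np.array([1, 2, 1, 3, 2, 2])       # hypothetical sample input
    mask = np.triu(vect[:,None] == vect, 1)
    print(np.count_nonzero(mask))             # 4 duplicate pairs: (0,2), (1,4), (1,5), (4,5)
    print(np.where(mask)[1])                  # -> [2 4 5 5]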
    

    If you only care about the count (counter), there are two more approaches, listed next.

    Approach #2

    We can avoid the triangular matrix entirely: get the full pairwise count, subtract the contribution from the diagonal elements, and keep just one of the lower or upper triangular regions by halving the remaining count, since the contributions from either one would be identical.

    So, we would have a modified solution like so -

    counter = (np.count_nonzero(vect[:,None] == vect) - vect.size)//2
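
    As a rough cross-check (a minimal sketch on the same made-up vect as above), this should agree with the counter from Approach #1, since each unordered duplicate pair is counted exactly once either way -

    vect = np.array([1, 2, 1, 3, 2, 2])                  # hypothetical sample input
    full = np.count_nonzero(vect[:,None] == vect)        # all equal pairs, including i == j
    print((full - vect.size)//2)                         # remove the diagonal, halve -> 4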
    

    Approach #3

    Here's an entirely different approach that uses the fact that the count of each unique element contributes, through a cumulative sum, to the final total.

    So, with that idea in mind, we would have a third approach like so -

    count = np.bincount(vect) # OR np.unique(vect,return_counts=True)[1]
    idx = count[count>1]                      # counts of the values that actually repeat
    id_arr = np.ones(idx.sum(),dtype=int)     # one slot per repeated-value occurrence
    id_arr[0] = 0
    id_arr[idx[:-1].cumsum()] = -idx[:-1]+1   # reset the running sum at each group boundary
    counter = np.sum(id_arr.cumsum())         # per-group cumsum is 0,1,..,c-1, summing to c*(c-1)/2
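
    The cumsum trick is one way of summing, for each value that occurs c times, the c*(c-1)/2 pairs it contributes (note that np.bincount assumes non-negative integer values; the np.unique variant does not). A minimal sketch of that equivalence on a made-up small vect -

    vect = np.array([1, 2, 1, 3, 2, 2])        # hypothetical sample input
    counts = np.bincount(vect)                 # occurrences of each value
    print((counts*(counts - 1)//2).sum())      # sum of c*(c-1)/2 over values -> 4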
    
  • 2021-01-06 02:41

    I wonder why whatever I tried Python is 100x or more slower than an equivalent C code.

    Because Python programs are usually 100x slower than C programs.

    You can either implement critical code paths in C and provide Python-C bindings, or change the algorithm. You can write an O(N) version by using a dict that maps each value back to the list of indices where it occurs.

    import numpy as np
    N = 10000
    vect = np.arange(float(N))
    vect[N//2] = 1                       # plant two duplicates of the value 1
    vect[N//4] = 1
    dupl = {}                            # value -> list of indices where it occurs
    print("init done")
    counter = 0
    for i in range(N):
        e = dupl.get(vect[i], None)
        if e is None:
            dupl[vect[i]] = [i]
        else:
            e.append(i)                  # repeated value: record the index and count it
            counter += 1

    print("counter =", counter)
    print([(k, v) for k, v in dupl.items() if len(v) > 1])
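
    If you only need the count, a hedged cross-check of the loop above (reusing its vect, and assuming exact equality rather than an eps tolerance): each group of c equal values adds c - 1 to counter, so -

    _, counts = np.unique(vect, return_counts=True)
    print(counts[counts > 1].sum() - np.count_nonzero(counts > 1))   # should equal counter, i.e. 2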
    

    Edit:

    If you need to test within a tolerance eps, i.e. abs(vect[i] - vect[j]) < eps, you can normalize the values by eps first:

    abs(vect[i] - vect[j]) < eps ->
    abs(vect[i] - vect[j]) / eps < eps / eps ->
    abs(vect[i]/eps - vect[j]/eps) < 1 ->
    int(abs(vect[i]/eps - vect[j]/eps)) == 0
    

    Like this:

    import numpy as np
    N = 10000
    vect = np.arange(float(N))
    vect[N//2] = 1                       # plant two duplicates of the value 1
    vect[N//4] = 1
    dupl = {}                            # eps-bucket -> list of indices
    print("init done")
    counter = 0
    eps = 0.01
    for i in range(N):
        k = int(vect[i] / eps)           # bucket the value with resolution eps
        e = dupl.get(k, None)
        if e is None:
            dupl[k] = [i]
        else:
            e.append(i)
            counter += 1

    print("counter =", counter)
    print([(k, v) for k, v in dupl.items() if len(v) > 1])
    