Versions of this question have already been asked but I have not found a satisfactory answer.
Problem: given a large numpy vector, find indices of t
Approach #1
You can simulate the iterator dependency with a triangular matrix, which gives a vectorized solution. This is based on this post that dealt with multiplication involving an iterator dependency. To test each element of vect for equality against all of its elements, we can use NumPy broadcasting. Finally, we can use np.count_nonzero to get the count, as it is very efficient at summing boolean arrays.
So, we would have a solution like so -
mask = np.triu(vect[:,None] == vect,1)
counter = np.count_nonzero(mask)
dupl = np.where(mask)[1]
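As a quick sanity check, the lines above can be run on a toy array. The input values below are made up for illustration, and unpacking both outputs of np.where (named rows and cols here) is an addition that shows both indices of each equal pair.

```python
import numpy as np

# Toy input (hypothetical): 1.0 appears three times, 4.0 twice.
vect = np.array([1.0, 4.0, 1.0, 7.0, 4.0, 1.0])

# Pairwise equality via broadcasting, keeping only pairs (i, j) with i < j.
mask = np.triu(vect[:, None] == vect, 1)
counter = np.count_nonzero(mask)   # number of equal pairs
rows, cols = np.where(mask)        # i and j of each equal pair

print(counter)
print(list(zip(rows.tolist(), cols.tolist())))
```

Keep in mind that the mask is an N x N boolean array, so for N = 10000 it takes about 100 MB.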
If you only care about the count counter, there are two more approaches, listed next.
Approach #2
We can avoid the triangular matrix altogether: count all the matches in the full comparison matrix, subtract the contribution of the diagonal elements (each element always equals itself), and halve what remains, since the lower and upper triangular regions contribute identical counts.
So, we would have a modified solution like so -
counter = (np.count_nonzero(vect[:,None] == vect) - vect.size)//2
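To convince yourself that the halving trick agrees with the triangular version, a small comparison on random data can help. This is only a sketch: the random input, the seed, and the names full, counter_half and counter_triu are assumptions made for the check.

```python
import numpy as np

# Hypothetical test data: 200 integers drawn from 0..49, so duplicates are guaranteed.
rng = np.random.default_rng(0)
vect = rng.integers(0, 50, size=200)

# Approach #2: full count, minus the diagonal, halved.
full = np.count_nonzero(vect[:, None] == vect)
counter_half = (full - vect.size) // 2

# Approach #1: strictly upper triangle only.
counter_triu = np.count_nonzero(np.triu(vect[:, None] == vect, 1))

print(counter_half, counter_triu)
```

Both versions still build the full N x N comparison matrix, so Approach #2 only saves the triu step, not the memory.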
Approach #3
Here's an entirely different approach. It uses the fact that a unique element occurring c times contributes 0 + 1 + ... + (c-1) to the final total, which can be built with a cumulative sum.
So, with that idea in mind, we would have a third approach like so -
count = np.bincount(vect) # non-negative ints only; otherwise use np.unique(vect,return_counts=True)[1]
idx = count[count>1]
id_arr = np.ones(idx.sum(),dtype=int)
id_arr[0] = 0
id_arr[idx[:-1].cumsum()] = -idx[:-1]+1
counter = np.sum(id_arr.cumsum())
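It may help to see what the id_arr trick builds. For each duplicated value with count c, the cumulative sum produces a ramp 0, 1, ..., c-1, and the ramps add up to c*(c-1)/2 pairs per value. The sketch below runs the code above on a made-up integer array and compares it with that closed form; the closed-form line (counter_closed) is a simplification added here, not part of the original answer.

```python
import numpy as np

# Toy input (hypothetical): 1 appears three times, 4 twice, 7 once.
vect = np.array([1, 4, 1, 7, 4, 1])
count = np.bincount(vect)
idx = count[count > 1]                     # counts of duplicated values

# Ramp construction from the approach above.
id_arr = np.ones(idx.sum(), dtype=int)
id_arr[0] = 0
id_arr[idx[:-1].cumsum()] = -idx[:-1] + 1  # reset the ramp at each group boundary
ramps = id_arr.cumsum()                    # one 0..c-1 ramp per duplicated value
counter = np.sum(ramps)

# Closed form: c*(c-1)/2 pairs for each duplicated value.
counter_closed = np.sum(idx * (idx - 1) // 2)

print(ramps, counter, counter_closed)
```

One difference worth noting: the closed form returns 0 when there are no duplicates, whereas id_arr[0] = 0 raises an IndexError on an empty array.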
I wonder why whatever I tried Python is 100x or more slower than an equivalent C code.
Because element-by-element loops in pure Python are commonly around 100x slower than the equivalent C loops.
You can either implement critical code paths in C and provide Python-C bindings, or change the algorithm. You can write an O(N) version by using a dict that maps each value to the list of indices where it occurs.
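import numpy as np
N = 10000
vect = np.arange(float(N))
vect[N // 2] = 1  # integer division: N / 2 is a float in Python 3 and cannot be used as an index
vect[N // 4] = 1
dupl = {}  # maps each value to the list of indices where it occurs
print("init done")
counter = 0  # counts occurrences beyond the first one of each value
for i in range(N):
    e = dupl.get(vect[i], None)
    if e is None:
        dupl[vect[i]] = [i]
    else:
        e.append(i)
        counter += 1
print("counter =", counter)
print([(k, v) for k, v in dupl.items() if len(v) > 1])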
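Note that counter in the loop above counts the extra occurrences of each value, not the number of equal pairs that the matrix-based approaches count. A minimal sketch of getting both numbers from the same dict idea follows; the helper name group_indices and the use of collections.defaultdict are choices made for this example.

```python
from collections import defaultdict

import numpy as np


def group_indices(vect):
    """Map each duplicated value to the list of indices where it occurs."""
    groups = defaultdict(list)
    # tolist() converts to plain Python floats, which are cheaper to hash in a loop.
    for i, v in enumerate(vect.tolist()):
        groups[v].append(i)
    return {k: idx for k, idx in groups.items() if len(idx) > 1}


N = 10000
vect = np.arange(float(N))
vect[N // 2] = 1
vect[N // 4] = 1

dups = group_indices(vect)
extra = sum(len(idx) - 1 for idx in dups.values())                    # same meaning as counter above
pairs = sum(len(idx) * (len(idx) - 1) // 2 for idx in dups.values())  # number of equal pairs

print(dups, extra, pairs)
```

Here the value 1.0 occurs at indices 1, 2500 and 5000, which gives 2 extra occurrences but 3 equal pairs.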
Edit:
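If you need to test against an eps with abs(vect[i] - vect[j]) < eps, you can normalize the values by eps:
abs(vect[i] - vect[j]) < eps ->
abs(vect[i] - vect[j]) / eps < (eps / eps) ->
abs(vect[i]/eps - vect[j]/eps) < 1 ->
int(abs(vect[i]/eps - vect[j]/eps)) == 0
Note that the code below buckets each value by int(vect[i] / eps), which only approximates this test: two values closer than eps can still land in adjacent buckets.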
Like this:
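import numpy as np
N = 10000
vect = np.arange(float(N))
vect[N // 2] = 1  # integer division: N / 2 is a float in Python 3 and cannot be used as an index
vect[N // 4] = 1
dupl = {}  # maps each bucket key to the list of indices that fall into it
print("init done")
counter = 0
eps = 0.01
for i in range(N):
    k = int(vect[i] / eps)  # bucket key: values in the same eps-wide bucket share k
    e = dupl.get(k, None)
    if e is None:
        dupl[k] = [i]
    else:
        e.append(i)
        counter += 1
print("counter =", counter)
print([(k, v) for k, v in dupl.items() if len(v) > 1])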
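One caveat with bucketing by int(v / eps): two values that differ by less than eps can sit on opposite sides of a bucket boundary and get different keys. A possible workaround, sketched below, is to also look in the two neighbouring buckets and confirm each candidate with the real abs(...) < eps test. The function name close_pairs, the sample values, and the switch to math.floor (so that negative values are bucketed consistently) are assumptions of this sketch, not part of the code above.

```python
import math


def close_pairs(values, eps):
    """Return index pairs (i, j), i < j, with abs(values[i] - values[j]) < eps."""
    buckets = {}  # bucket key -> indices seen so far
    pairs = []
    for j, v in enumerate(values):
        k = math.floor(v / eps)
        # Values within eps of v can only be in bucket k or its two neighbours.
        for nk in (k - 1, k, k + 1):
            for i in buckets.get(nk, ()):
                if abs(values[i] - v) < eps:
                    pairs.append((i, j))
        buckets.setdefault(k, []).append(j)
    return pairs


# Hypothetical data: the first two values differ by 0.0002 but straddle a bucket boundary.
values = [0.0199, 0.0201, 0.5, 0.0350]
print(int(values[0] / 0.01), int(values[1] / 0.01))  # different bucket keys
print(close_pairs(values, 0.01))
```

This stays close to O(N) as long as each bucket holds only a few values. If many values crowd into the same buckets, the inner loop dominates.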