I've got an ndarray of floating point values in numpy and I want to find the unique values of this array. Of course, this has problems because of floating point
Don't floor and round both fail the OP's requirement in some cases?
np.floor([5.99999999, 6.0])     # array([5., 6.])  values 1e-8 apart end up on different integers
np.round([6.50000001, 6.5], 0)  # array([7., 6.])  6.5 rounds half to even (6), 6.50000001 rounds up
The way I would do it (this may not be optimal, and it is surely slower than the other answers) is something like this:
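import numpy as np
TOL = 1.0e-3  # values closer together than this are treated as duplicates
a = np.random.random((10,10))
i = np.argsort(a.flat)  # indices that sort the flattened array
# gap between each sorted value and its predecessor; the leading True (1.0 > TOL) always keeps the first value
d = np.append(True, np.diff(a.flat[i]))
result = a.flat[i[d>TOL]]  # keep values whose gap to the previous one exceeds TOL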
Of course, this method keeps only the smallest member of any run of values in which each one lies within the tolerance of its neighbour. This means an array can collapse to a single value if consecutive sorted values are all close together, even though max - min is larger than the tolerance.
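To make that caveat concrete, here is a small sketch of the same sort-and-diff idea applied to a chain of closely spaced values. The helper name unique_sorted_tol and the input values are made up for illustration; they are not part of numpy.

```python
import numpy as np

TOL = 1.0e-3

def unique_sorted_tol(values, tol):
    # Hypothetical helper: sort, then keep values whose gap to the
    # previous sorted value exceeds tol.
    b = np.sort(np.asarray(values, dtype=float).ravel())
    # The first value has no predecessor, so give it an infinite gap
    # to make sure it is always kept.
    d = np.concatenate(([np.inf], np.diff(b)))
    return b[d > tol]

# Ten values spaced 0.0005 apart: every neighbouring gap is below TOL,
# yet max - min = 0.0045 is well above TOL.
chain = np.arange(10) * 0.0005
kept = unique_sorted_tol(chain, TOL)
print(kept)  # only one value survives

# A well-separated input behaves as expected.
mixed = unique_sorted_tol([2.0, 1.0004, 1.0], TOL)
print(mixed)
```

Whether collapsing such a chain to one value is acceptable depends on the data; if it is not, a binning or clustering approach may be a better fit.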
Here is essentially the same algorithm, but it is easier to understand and should be faster, as it avoids an indexing step:
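a = np.random.random((10,))
b = a.copy()
b.sort()  # sort the copy in place so a is left untouched
d = np.append(True, np.diff(b))  # gaps between neighbours; the leading True keeps the first value
result = b[d>TOL]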
The OP may also want to look into scipy.cluster (for a fancy version of this method) or numpy.digitize (for a fancy version of the other two methods).
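As a rough illustration of the numpy.digitize route, the sketch below bins values into intervals of width TOL and keeps one representative per occupied bin. The sample values and the bin layout (edges starting at the minimum) are assumptions chosen for the example, not something the OP specified.

```python
import numpy as np

TOL = 1.0e-3

# Made-up sample data: two pairs of near-duplicates and one isolated value.
a = np.array([0.1001, 0.1004, 0.5002, 0.5007, 0.9003])

# Bin edges spaced TOL apart, starting at the minimum and covering the maximum.
edges = np.arange(a.min(), a.max() + TOL, TOL)

# Index of the bin each value falls into.
bins = np.digitize(a, edges)

# One representative per occupied bin: the first value seen in that bin.
_, first = np.unique(bins, return_index=True)
result = a[first]
print(result)
```

Like floor and round, this is a fixed-grid method: two values that are very close but sit on opposite sides of a bin edge still end up in different bins.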