I have a point set which I have stored its coordinates in three different arrays (xa, ya, za). Now, I want to calculate the euclidean distance between each point of this point set (xa[0], ya[0], za[0] and so on) with all the points of an another point set (xb, yb, zb) and every time store the minimum distance in a new array.
Let's say that xa.shape = (11,), ya.shape = (11,), za.shape= (11,). Respectively, xb.shape = (13,), yb.shape = (13,), zb.shape = (13,). What I want to do is to take each time one xa[],ya[],za[], and calculate its distance with all the elements of xb, yb, zb, and at the end store the minimum value into an xfinal.shape = (11,) array.
Do you think that this would be possible with numpy?
A different solution would be to use the spatial module from scipy, the KDTree in particular.
This class learn from a set of data and can be interrogated given a new dataset:
from scipy.spatial import KDTree # create some fake data x = arange(20) y = rand(20) z = x**2 # put them togheter, should have a form [n_points, n_dimension] data = np.vstack([x, y, z]).T # create the KDTree kd = KDTree(data)
now if you have a point you can ask the distance and the index of the closet point (or the N closest points) simply by doing:
kd.query([1, 2, 3]) # (1.8650720813822905, 2) # your may differs
or, given an array of positions:
#bogus position x2 = rand(20)*20 y2 = rand(20)*20 z2 = rand(20)*20 # join them togheter as the input data2 = np.vstack([x2, y2, z2]).T #query them kd.query(data2) #(array([ 14.96118553, 9.15924813, 16.08269197, 21.50037074, # 18.14665096, 13.81840533, 17.464429 , 13.29368755, # 20.22427196, 9.95286671, 5.326888 , 17.00112683, # 3.66931946, 20.370496 , 13.4808055 , 11.92078034, # 5.58668204, 20.20004206, 5.41354322, 4.25145521]), #array([4, 3, 2, 4, 2, 2, 4, 2, 3, 3, 2, 3, 4, 4, 3, 3, 3, 4, 4, 4]))
You can calculate the difference from each xa to each xb with np.subtract.outer(xa, xb)
. The distance to the nearest xb is given by
np.min(np.abs(np.subtract.outer(xa, xb)), axis=1)
To extend this to 3D,
distances = np.sqrt(np.subtract.outer(xa, xb)**2 + \ np.subtract.outer(ya, yb)**2 + np.subtract.outer(za, zb)**2) distance_to_nearest = np.min(distances, axis=1)
If you actually want to know which of the b points is the nearest, you use argmin
in place of min
.
index_of_nearest = np.argmin(distances, axis=1)
There is more than one way of doing this. Most importantly, there's a trade-off between memory-usage and speed. Here's the wasteful method:
s = (1, -1) d = min((xa.reshape(s)-xb.reshape(s).T)**2 + (ya.reshape(s)-yb.reshape(s).T)**2 + (za.reshape(s)-zb.reshape(s).T)**2), axis=0)
The other method would be to iterate over the point set in b
to avoid the expansion to the full blown matrix.