Calculate nearest distance to certain points in Python

微笑、不失礼 submitted on 2021-02-11 14:16:19


I have a dataset as shown below; each sample has x and y values and a corresponding result.

Sr. X  Y  Result
1   2  12 Positive
2   4  3  Positive


Grid size is 12 * 8

How can I calculate, for each sample, the distance to the nearest red (positive) point?

Red = Positive, Blue = Negative

Sr. X  Y  Result    Nearest-distance-red
1   2  23 Positive  ?
2   4  3  Negative  ?



It's a lot easier to answer when there is sample data; make sure to include it next time.

I generated random data:

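import numpy as np
import pandas as pd

# linspace defaults to 50 points, so these are the values 1, 2, ..., 50
x = np.linspace(1, 50)
y = np.linspace(1, 50)

# 50 x 50 grid; roughly 20% of the cells are marked 1 (red), the rest 0 (blue)
GRID = np.meshgrid(x, y)
grid_colors = 1 * (np.random.random(GRID[0].size) > .8)
sample_data = pd.DataFrame({'X': GRID[0].flatten(), 'Y': GRID[1].flatten(), 'grid_color': grid_colors})

# plotting needs matplotlib installed; the 'bwr' colormap draws 0 as blue and 1 as red
sample_data.plot.scatter(x='X', y='Y', c='grid_color', colormap='bwr', figsize=(10, 10))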


A BallTree (or KDTree) built from the red points gives you a tree to query against:

from sklearn.neighbors import BallTree 

red_points = sample_data[sample_data.grid_color == 1]
blue_points = sample_data[sample_data.grid_color != 1]

tree = BallTree(red_points[['X','Y']], leaf_size=15, metric='minkowski')

and query it with

distance, index = tree.query(sample_data[['X','Y']], k=1)

Now add the results to the DataFrame (query returns arrays of shape (n_samples, k), so take column 0):

sample_data['nearest_point_distance'] = distance[:, 0]
sample_data['nearest_point_X'] = red_points.X.values[index[:, 0]]
sample_data['nearest_point_Y'] = red_points.Y.values[index[:, 0]]

which gives

     X    Y  grid_color  nearest_point_distance  nearest_point_X  nearest_point_Y
0  1.0  1.0           0                     2.0              3.0              1.0
1  2.0  1.0           0                     1.0              3.0              1.0
2  3.0  1.0           1                     0.0              3.0              1.0
3  4.0  1.0           0                     1.0              3.0              1.0
4  5.0  1.0           1                     0.0              5.0              1.0
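
As a sanity check on the tree query, the same nearest-red distance can be computed by brute force with NumPy broadcasting. This is only a minimal sketch: the five points below are hand-made to mirror the first rows of the output, and the names `points`, `is_red`, and `nearest_red_distance` are illustrative assumptions rather than anything from the question. It builds the full distance matrix, so it only suits small inputs.

```python
import numpy as np

# Hypothetical sample: (X, Y) coordinates and a flag marking the red (positive) points.
points = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0], [5.0, 1.0]])
is_red = np.array([False, False, True, False, True])

red = points[is_red]

# Pairwise coordinate differences, shape (n_points, n_red, 2).
diff = points[:, None, :] - red[None, :, :]

# Euclidean distance from every point to every red point, shape (n_points, n_red).
all_distances = np.sqrt((diff ** 2).sum(axis=2))

# Smallest distance per row, plus the position of that red point within `red`.
nearest_red_distance = all_distances.min(axis=1)
nearest_red_index = all_distances.argmin(axis=1)

print(nearest_red_distance)  # [2. 1. 0. 1. 0.]
```

For large grids the tree-based query is the better choice, since the distance matrix here grows with the number of samples times the number of red points.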

Modification so that red points do not find themselves:

Query for the nearest k=2 points instead of k=1:

distance, index = tree.query(sample_data[['X','Y']], k=2)

Then, with the help of NumPy integer-array indexing, make red points use the second match instead of the first:

sample_size = GRID[0].size

sample_data['nearest_point_distance'] = distance[np.arange(sample_size),sample_data.grid_color]
sample_data['nearest_point_X'] = red_points.X.values[index[np.arange(sample_size),sample_data.grid_color]]
sample_data['nearest_point_Y'] = red_points.Y.values[index[np.arange(sample_size),sample_data.grid_color]]

The output has the same format, but because the data is random it won't match the earlier picture.
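
To see what the row/column indexing does, here is a small sketch on a hand-made array. The `distance` values and `grid_color` flags are invented for illustration; they stand in for a k=2 query result and the color column.

```python
import numpy as np

# Hypothetical k=2 query result for four samples: column 0 is the nearest
# red point, column 1 the second nearest.
distance = np.array([[2.0, 3.0],
                     [0.0, 1.0],
                     [1.0, 1.5],
                     [0.0, 2.0]])

# 1 = red, 0 = blue; a red sample finds itself at distance 0 in column 0.
grid_color = np.array([0, 1, 0, 1])

rows = np.arange(len(grid_color))

# Pair each row with its own column: blue rows take column 0, red rows column 1.
picked = distance[rows, grid_color]
print(picked)  # [2. 1. 1. 2.]
```

This relies on each red point's first match being itself at distance 0, which holds as long as no two red points share the same coordinates.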


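cKDTree from scipy can calculate that distance for you. Something along these lines should work (query expects an array of shape (n, 2) and returns a (distances, indices) tuple, so keep the first element):

from scipy.spatial import cKDTree

df['Distance_To_Red'] = cKDTree(coordinates_of_red_points).query(df[['x', 'y']].to_numpy(), k=1)[0]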
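
A self-contained sketch of the cKDTree approach, assuming a DataFrame with columns `x`, `y`, and `result`. The four rows and the names `red_xy` and `tree` are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

# Hypothetical data in the shape of the question's table.
df = pd.DataFrame({
    "x": [2, 4, 7, 10],
    "y": [12, 3, 3, 8],
    "result": ["Positive", "Negative", "Positive", "Negative"],
})

# Coordinates of the red (positive) points as an (n_red, 2) array.
red_xy = df.loc[df["result"] == "Positive", ["x", "y"]].to_numpy()
tree = cKDTree(red_xy)

# With k=1, query returns two 1-D arrays: distances and indices into red_xy.
distances, indices = tree.query(df[["x", "y"]].to_numpy(), k=1)
df["Distance_To_Red"] = distances

print(df)
```

Positive rows get a distance of 0 here because each one finds itself; querying with k=2 and taking the second column for those rows avoids that, as in the BallTree answer.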

