Pandas DataFrame: Find the column with the closest coordinate point to another column's coordinate point

Posted by 纵饮孤独 on 2021-02-11 15:23:46

Question


I am working with soccer ball and soccer player tracking data. For each row of coordinate points, I am trying to find the player that is closest to the ball and add a new column naming that player.

example data

| ball_point  | home_player1_point | home_player2_point | away_player1_point |
| ----------- | ------------------ | ------------------ | ------------------ |
| (7.00,3.00) | (-15.37,8.22)      | (25.3,-.2)         | (12.0,12.9)        |

desired output

| ball_point  | home_player1_point | home_player2_point | away_player1_point | closest      |
| ----------- | ------------------ | ------------------ | ------------------ | ------------ |
| (7.00,3.00) | (-15.37,8.22)      | (25.3,-.2)         | (7.1,3.2)          | away_player1 |

Here is a link to my working notebook: https://github.com/piercepatrick/Articles_EDA/blob/main/nashSCProject.ipynb. The work pertaining to this problem is at the bottom, although it is messy right now. I have also used this question to help me: Find closest point in Pandas DataFrames

Any help is appreciated. I need this done by tonight!


Answer 1:


I'm assuming your dataframe has more rows than the one shown. First you need two helper functions: one that computes the distance between two points (I'll use Euclidean distance), and one that computes the row-wise distance between the points in two pandas.Series (or dataframe columns):

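import pandas as pd

def euc_dist(x, y):
    # Euclidean distance between two (x, y) points
    return ((x[0] - y[0])**2 + (x[1] - y[1])**2)**(1/2)

def dist(s1, s2):
    # Pair the points row by row; reuse s1's index so the result lines up with df
    distances = [euc_dist(p, q) for p, q in zip(s1, s2)]
    return pd.Series(distances, index=s1.index)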
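
As a quick sanity check, the hand-written formula can be compared against math.dist from the standard library (available in Python 3.8 and later). This is only a sketch: the two points are taken from the example row in the question, and euc_dist is repeated so the snippet runs on its own.

```python
import math

def euc_dist(x, y):
    # Straight-line distance between two (x, y) points.
    return ((x[0] - y[0])**2 + (x[1] - y[1])**2)**(1/2)

ball = (7.00, 3.00)
away_player1 = (12.0, 12.9)

d = euc_dist(ball, away_player1)
# Both approaches should agree up to floating-point rounding.
agrees = math.isclose(d, math.dist(ball, away_player1))
print(d, agrees)
```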

dist has to return a pandas.Series so that DataFrame.apply can assemble the results into a new dataframe with one distance column per player (I'm assuming your dataframe is called df):

distances_df = df.iloc[:,1:].apply(dist, args = (df["ball_point"],))
df["closest"] = distances_df.idxmin(axis = 1).apply(lambda x: str(x)[:-6])

The function dist is applied to every column from the second onward, which is why I use df.iloc[:,1:]. Each of those columns is compared with the "ball_point" column, which is passed through the args parameter; args has to be a tuple, hence the trailing comma.

Then you can find the column with the minimum distance in each row using DataFrame.idxmin with axis = 1. The lambda function only strips the trailing "_point" (six characters), so that the result reads "away_player1" instead of "away_player1_point" in the example.
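
The [:-6] slice assumes that every column name ends in "_point"; a name without that suffix would lose its last six characters anyway. A minimal sketch of a more defensive alternative, assuming the same naming scheme, uses an anchored regular expression. The series cols and the name "referee" are made up for illustration.

```python
import pandas as pd

# Hypothetical column labels, as idxmin(axis=1) might return them.
cols = pd.Series(["away_player1_point", "home_player2_point", "referee"])

# Remove "_point" only when it appears at the very end of the label.
names = cols.str.replace(r"_point$", "", regex=True)
print(names.tolist())
```

Labels that do not end in "_point" pass through unchanged.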

Printing distances_df and df gives:

#distances_df
   home_player1_point  home_player2_point  away_player1_point
0           22.970966           18.577675           11.090987

#df
  ball_point home_player1_point home_player2_point away_player1_point       closest
0     (7, 3)     (-15.37, 8.22)       (25.3, -0.2)       (12.0, 12.9)  away_player1
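
For larger tracking datasets, the same result can be computed without a Python-level loop over rows. The following is a sketch of a NumPy-based alternative, not the method used in the answer above. It assumes every cell holds a 2-tuple of numbers with no missing values, and that every column other than "ball_point" is a player column; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Example row from the question; the frame name `df` follows the answer's assumption.
df = pd.DataFrame({
    "ball_point": [(7.00, 3.00)],
    "home_player1_point": [(-15.37, 8.22)],
    "home_player2_point": [(25.3, -0.2)],
    "away_player1_point": [(12.0, 12.9)],
})

player_cols = [c for c in df.columns if c != "ball_point"]

# Ball coordinates, shape (n_rows, 2).
ball = np.array(df["ball_point"].tolist(), dtype=float)
# Player coordinates, shape (n_rows, n_players, 2).
players = np.stack(
    [np.array(df[c].tolist(), dtype=float) for c in player_cols], axis=1
)

# Euclidean distance from the ball to every player in every row.
distances = np.linalg.norm(players - ball[:, None, :], axis=2)

# Position of the nearest player per row, mapped back to a column name.
nearest = distances.argmin(axis=1)
closest = pd.Series([player_cols[i] for i in nearest], index=df.index)
df["closest"] = closest.str.replace(r"_point$", "", regex=True)
print(df["closest"].tolist())
```

Selecting the player columns by name rather than by position avoids picking up a previously added "closest" column if the code is run twice on a frame where the columns are filtered by suffix instead.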


Source: https://stackoverflow.com/questions/65272337/pandas-dataframe-find-the-column-with-the-closest-coordinate-point-to-another-c
