Question
I am working with soccer ball and player tracking data. For each row of coordinate points, I am trying to find the player closest to the ball and add a new column naming that player.
Example data:
| ball_point | home_player1_point | home_player2_point | away_player1_point |
| --- | --- | --- | --- |
| (7.00,3.00) | (-15.37,8.22) | (25.3,-.2) | (12.0,12.9) |
Desired output:
| ball_point | home_player1_point | home_player2_point | away_player1_point | closest |
| --- | --- | --- | --- | --- |
| (7.00,3.00) | (-15.37,8.22) | (25.3,-.2) | (7.1,3.2) | away_player1 |
Here is a link to my working notebook: https://github.com/piercepatrick/Articles_EDA/blob/main/nashSCProject.ipynb The work pertaining to this problem is at the bottom, although it is messy right now. I have also used this question to help me: "Find closest point in Pandas DataFrames".
Any help appreciated, I need this done by tonight!
Answer 1:
I'm assuming your dataframe has more rows. First you need to define two functions: one that computes the distance between two points (I'll use Euclidean distance), and one that computes the row-by-row distances between the points in two pandas.Series (or dataframe columns):
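    import pandas as pd

    def euc_dist(x, y):
        # Euclidean distance between two (x, y) points
        return ((x[0] - y[0])**2 + (x[1] - y[1])**2)**(1/2)

    def dist(s1, s2):
        # zip pairs the points by position, so this also works with a non-default index
        distances = [euc_dist(a, b) for a, b in zip(s1, s2)]
        return pd.Series(distances, index=s1.index)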
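As a quick sanity check, the distance helper can be tried on the ball and the away player from the example row. This is only a sketch: the variable names `ball` and `away` are mine, and the comparison against `math.dist` assumes Python 3.8 or newer.

```python
import math

def euc_dist(x, y):
    # Euclidean distance between two (x, y) points
    return ((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2) ** 0.5

# Points taken from the example row in the question
ball = (7.0, 3.0)
away = (12.0, 12.9)

d = euc_dist(ball, away)
print(round(d, 6))  # 11.090987

# The standard library computes the same quantity (Python 3.8+)
assert math.isclose(d, math.dist(ball, away))
```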
The return value of dist has to be a pandas.Series because each result becomes a column of a new dataframe (I'm assuming your dataframe is called df):
    distances_df = df.iloc[:, 1:].apply(dist, args=(df["ball_point"],))
    df["closest"] = distances_df.idxmin(axis=1).apply(lambda x: str(x)[:-6])
The function dist is applied from the second column onward, which is why I use df.iloc[:,1:]. Every one of those columns is compared with the "ball_point" column, which is why it is passed in the args parameter (this has to be a tuple). Then you can find the column with the minimum distance in each row using DataFrame.idxmin with axis=1. The lambda function only strips the trailing "_point" (6 characters), so you get "away_player1" instead of "away_player1_point" in the example.
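The steps above can be sketched end to end as follows. The first row is the example from the question; the second row is made up purely so that a different player wins. I also swap the `[:-6]` slice for `str.replace`, which is my own variation and does not depend on the suffix length.

```python
import pandas as pd

def euc_dist(x, y):
    # Euclidean distance between two (x, y) points
    return ((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2) ** 0.5

def dist(s1, s2):
    # Pair the points by position and keep s1's index so the result aligns with df
    return pd.Series([euc_dist(a, b) for a, b in zip(s1, s2)], index=s1.index)

df = pd.DataFrame({
    "ball_point": [(7.0, 3.0), (0.0, 0.0)],            # second row is hypothetical
    "home_player1_point": [(-15.37, 8.22), (1.0, 1.0)],
    "home_player2_point": [(25.3, -0.2), (20.0, 5.0)],
    "away_player1_point": [(12.0, 12.9), (-9.0, 4.0)],
})

# One distance column per player column, each measured against the ball
distances_df = df.iloc[:, 1:].apply(dist, args=(df["ball_point"],))

# Name of the column holding the smallest distance in each row, minus "_point"
df["closest"] = distances_df.idxmin(axis=1).str.replace("_point", "", regex=False)

print(df["closest"].tolist())  # ['away_player1', 'home_player1']
```

Note that `df.iloc[:, 1:]` assumes `ball_point` is the first column and that every other column holds a player position.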
Printing distances_df and df gives:
    # distances_df
       home_player1_point  home_player2_point  away_player1_point
    0           22.970966           18.577675           11.090987

    # df
      ball_point home_player1_point home_player2_point away_player1_point       closest
    0     (7, 3)     (-15.37, 8.22)       (25.3, -0.2)       (12.0, 12.9)  away_player1
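If the dataframe is large, the Python-level loop inside dist may become slow. One possible alternative, which is not part of the approach above, is to move the coordinates into NumPy arrays and compute all distances at once. This is a sketch that assumes every player column ends in "_point" and every cell is an (x, y) tuple; the second row of data is again made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ball_point": [(7.0, 3.0), (0.0, 0.0)],            # second row is hypothetical
    "home_player1_point": [(-15.37, 8.22), (1.0, 1.0)],
    "home_player2_point": [(25.3, -0.2), (20.0, 5.0)],
    "away_player1_point": [(12.0, 12.9), (-9.0, 4.0)],
})

# Every "*_point" column except the ball is treated as a player
player_cols = [c for c in df.columns if c.endswith("_point") and c != "ball_point"]

# Ball positions: shape (n_rows, 2)
ball = np.array(df["ball_point"].tolist(), dtype=float)

# Player positions: shape (n_rows, n_players, 2)
players = np.stack(
    [np.array(df[c].tolist(), dtype=float) for c in player_cols], axis=1
)

# Euclidean distance of every player to the ball: shape (n_rows, n_players)
distances = np.linalg.norm(players - ball[:, None, :], axis=2)

# Map the index of the smallest distance in each row back to a player name
names = np.array([c[: -len("_point")] for c in player_cols])
df["closest"] = names[distances.argmin(axis=1)]

print(df["closest"].tolist())  # ['away_player1', 'home_player1']
```

Broadcasting `ball[:, None, :]` against the player array subtracts the ball position from every player in the same row, so no explicit loop over rows is needed.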
Source: https://stackoverflow.com/questions/65272337/pandas-dataframe-find-the-column-with-the-closest-coordinate-point-to-another-c