The sample CSV looks like this:
user_id lat lon
1 19.111841 72.910729
1 19.111342 72.908387
2 19.111542 72.907387
2 19.1
Assuming you want to compute haversine() between the first entry in each user_id group and every other entry in that group, this approach will work:
# copying example data from OP
import pandas as pd
df = pd.read_clipboard() # alternately, df = pd.read_csv(filename)
# haversine() is assumed to be defined as in the question
def haversine_wrapper(row):
    # return None when both lon/lat pairs are the same
    if row['first_lon'] == row['lon'] and row['first_lat'] == row['lat']:
        return None
    return haversine(row['first_lon'], row['first_lat'], row['lon'], row['lat'])
df['result'] = (df.merge(df.groupby('user_id', as_index=False)
                           .agg({'lat':'first','lon':'first'})
                           .rename(columns={'lat':'first_lat','lon':'first_lon'}),
                         on='user_id')
                  .apply(haversine_wrapper, axis='columns'))
print(df)
Output:
    user_id        lat        lon     result
0         1  19.111841  72.910729        NaN
1         1  19.111342  72.908387   0.252243
2         2  19.111542  72.907387        NaN
3         2  19.137815  72.914085   3.004976
4         2  19.119677  72.905081   0.936454
5         2  19.129677  72.905081   2.031021
6         3  19.319677  72.905081        NaN
7         3  19.120217  72.907121  22.179974
8         4  19.420217  72.807121        NaN
9         4  19.520217  73.307121  53.584504
10        5  19.319677  72.905081        NaN
11        5  19.419677  72.805081  15.286775
12        5  19.629677  72.705081  40.346128
13        5  19.111860  72.911347  23.117560
14        5  19.111860  72.931346  23.272178
15        5  19.219677  72.605081  33.395165
16        6  19.319677  72.805082        NaN
17        6  19.419677  72.905086  15.287063
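
The code above relies on the haversine() function from the question, which isn't reproduced here. For completeness, here is a minimal sketch of one. It assumes the (lon1, lat1, lon2, lat2) argument order used in the call above, inputs in degrees, a result in kilometres, and a mean Earth radius of 6371 km. The OP's version may differ in any of these details.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius in km

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    # convert all four coordinates from degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # haversine formula
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))
```

With this definition, haversine(72.910729, 19.111841, 72.908387, 19.111342) comes out at roughly 0.25 km, which is in line with the second row of the output above.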