Pandas: calculate haversine distance within each group of rows

自闭症患者 2020-12-18 14:26

The sample CSV is like this:

 user_id  lat         lon
    1   19.111841   72.910729
    1   19.111342   72.908387
    2   19.111542   72.907387
    2   19.1         


        
4 Answers
  •  生来不讨喜
    2020-12-18 14:45

    Assuming you want to compute haversine() between the first row of each user_id group and every other row in that group, this approach will work:

    # copying example data from OP
    import pandas as pd
    df = pd.read_clipboard() # alternately, df = pd.read_csv(filename)
    
    # haversine(lon1, lat1, lon2, lat2) is assumed to be defined already, as in the question
    def haversine_wrapper(row):
        # return None when both lon/lat pairs are the same
        if row['first_lon'] == row['lon'] and row['first_lat'] == row['lat']:
            return None
        return haversine(row['first_lon'], row['first_lat'], row['lon'], row['lat'])
    
    # attach each group's first lat/lon to every row, then compute the distance row by row
    df['result'] = (df.merge(df.groupby('user_id', as_index=False)
                               .agg({'lat':'first','lon':'first'})
                               .rename(columns={'lat':'first_lat','lon':'first_lon'}),
                             on='user_id')
                      .apply(haversine_wrapper, axis='columns'))
    
    print(df)
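
The snippet above calls haversine() without defining it, and the question's own definition is not visible here. A minimal sketch of what such a function might look like, assuming the usual great-circle formula, a mean Earth radius of 6371 km, and the (lon1, lat1, lon2, lat2) argument order implied by the wrapper call. The body and the `radius_km` parameter are assumptions, not the asker's code:

```python
from math import radians, sin, cos, asin, sqrt

def haversine(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    # convert degrees to radians before applying the formula
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * asin(sqrt(a))

# first two rows of user_id 1 from the sample data
d = haversine(72.910729, 19.111841, 72.908387, 19.111342)
print(d)
```

With these assumptions the value printed should be close to the 0.252243 km reported for the second row of user 1.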
    

    Output:

        user_id        lat        lon     result
    0         1  19.111841  72.910729        NaN
    1         1  19.111342  72.908387   0.252243
    2         2  19.111542  72.907387        NaN
    3         2  19.137815  72.914085   3.004976
    4         2  19.119677  72.905081   0.936454
    5         2  19.129677  72.905081   2.031021
    6         3  19.319677  72.905081        NaN
    7         3  19.120217  72.907121  22.179974
    8         4  19.420217  72.807121        NaN
    9         4  19.520217  73.307121  53.584504
    10        5  19.319677  72.905081        NaN
    11        5  19.419677  72.805081  15.286775
    12        5  19.629677  72.705081  40.346128
    13        5  19.111860  72.911347  23.117560
    14        5  19.111860  72.931346  23.272178
    15        5  19.219677  72.605081  33.395165
    16        6  19.319677  72.805082        NaN
    17        6  19.419677  72.905086  15.287063
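
Row-wise apply can get slow on large frames. As an alternative to the merge-and-apply approach (not the method used in this answer), the same "distance to the first row of each group" idea can be sketched with `groupby(...).transform('first')` and NumPy. The helper name `haversine_np` and the 6371 km radius are assumptions, and the sample frame reuses the first four rows of the output above:

```python
import numpy as np
import pandas as pd

def haversine_np(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Vectorized great-circle distance in km; inputs in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

df = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "lat": [19.111841, 19.111342, 19.111542, 19.137815],
    "lon": [72.910729, 72.908387, 72.907387, 72.914085],
})

# broadcast each group's first lat/lon to every row of that group
first = df.groupby("user_id")[["lat", "lon"]].transform("first")

dist = haversine_np(first["lon"], first["lat"], df["lon"], df["lat"])

# blank out rows whose coordinates equal the group's first point
same_point = (first["lat"] == df["lat"]) & (first["lon"] == df["lon"])
df["result"] = dist.mask(same_point)

print(df)
```

This avoids both the merge and the Python-level loop, since `transform` returns a frame aligned with the original index.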
    
