Pandas: calculate haversine distance within each group of rows


自闭症患者 2020-12-18 14:26

The sample CSV looks like this:

 user_id  lat         lon
    1   19.111841   72.910729
    1   19.111342   72.908387
    2   19.111542   72.907387
    2   19.1


      
      
        
4 answers

生来不讨喜 (OP) 2020-12-18 14:45
                        
Assuming you want to compute haversine() between the first point in each user_id group and every other point in that group, this approach works:

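# copying example data from OP
from math import asin, cos, radians, sin, sqrt

import pandas as pd

df = pd.read_clipboard()  # alternatively, df = pd.read_csv(filename)

# haversine() is not defined in this answer; this stand-in is an assumption:
# (lon1, lat1, lon2, lat2) in degrees -> great-circle distance in km
def haversine(lon1, lat1, lon2, lat2):
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2)**2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)**2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km: assumed mean Earth radius

def haversine_wrapper(row):
    # return None (shown as NaN) when both lon/lat pairs are the same,
    # i.e. for the group's first row and any exact duplicate of it
    if row['first_lon'] == row['lon'] and row['first_lat'] == row['lat']:
        return None
    return haversine(row['first_lon'], row['first_lat'], row['lon'], row['lat'])

# first lat/lon per user_id, renamed so the merge doesn't clash with lat/lon
firsts = (df.groupby('user_id', as_index=False)
            .agg({'lat': 'first', 'lon': 'first'})
            .rename(columns={'lat': 'first_lat', 'lon': 'first_lon'}))

# how='left' keeps df's row order; .to_numpy() assigns by position, not index
df['result'] = (df.merge(firsts, on='user_id', how='left')
                  .apply(haversine_wrapper, axis='columns')
                  .to_numpy())

print(df)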


Output:

    user_id        lat        lon     result
0         1  19.111841  72.910729        NaN
1         1  19.111342  72.908387   0.252243
2         2  19.111542  72.907387        NaN
3         2  19.137815  72.914085   3.004976
4         2  19.119677  72.905081   0.936454
5         2  19.129677  72.905081   2.031021
6         3  19.319677  72.905081        NaN
7         3  19.120217  72.907121  22.179974
8         4  19.420217  72.807121        NaN
9         4  19.520217  73.307121  53.584504
10        5  19.319677  72.905081        NaN
11        5  19.419677  72.805081  15.286775
12        5  19.629677  72.705081  40.346128
13        5  19.111860  72.911347  23.117560
14        5  19.111860  72.931346  23.272178
15        5  19.219677  72.605081  33.395165
16        6  19.319677  72.805082        NaN
17        6  19.419677  72.905086  15.287063
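As an alternative sketch (not the answer's method), the same per-group result can be computed without a merge or a row-wise apply. `groupby(...).transform('first')` broadcasts each group's first coordinates back onto every row, and a NumPy haversine then works on whole columns at once. The names `haversine_np` and `distance_from_first`, and the 6371 km Earth radius, are assumptions for illustration; the sample rows are the first four rows of the output above.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_np(lon1, lat1, lon2, lat2):
    """Hypothetical vectorized haversine: degrees in, kilometres out.

    Works on scalars, NumPy arrays, or aligned pandas Series.
    """
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def distance_from_first(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: distance from each row to its group's first row.

    Rows whose coordinates equal the group's first point become NaN,
    mirroring haversine_wrapper() in the answer.
    """
    g = df.groupby('user_id')
    # transform('first') returns a Series aligned to df's index
    first_lat = g['lat'].transform('first')
    first_lon = g['lon'].transform('first')
    dist = pd.Series(haversine_np(first_lon, first_lat, df['lon'], df['lat']),
                     index=df.index)
    same = (first_lat == df['lat']) & (first_lon == df['lon'])
    return dist.mask(same)  # NaN where the point is the group's first point


df = pd.DataFrame({
    'user_id': [1, 1, 2, 2],
    'lat': [19.111841, 19.111342, 19.111542, 19.137815],
    'lon': [72.910729, 72.908387, 72.907387, 72.914085],
})
df['result'] = distance_from_first(df)
print(df)
```

The first row of each user comes out as NaN, as in the answer's output; drop the `mask` step if 0.0 is preferred there. Because everything stays column-wise, this avoids calling a Python function once per row, which matters on large frames.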

    
             
                                                        
            
            
              
                