Geographical distance by group - Applying a function on each pair of rows

清歌不尽 2020-12-21 05:04

I want to calculate the average geographical distance between the houses in each province.

Suppose I have the following data.

df1 <- data.fram

Answer by 再見小時候 (OP), 2020-12-21 05:37

Following this thread, a vectorized solution to your problem would look like this:

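library(geosphere)  # provides distm() and distHaversine()

# For each province, list every pair of row names (i.e. every pair of houses)
toCheck <- sapply(split(df1, df1$province), function(x) {
  combn(rownames(x), 2, simplify = FALSE)
})

# Label each pair, e.g. "1 - 2"
names(toCheck) <- sapply(toCheck, paste, collapse = " - ")

# Haversine distance (in metres) for each pair of houses
sapply(toCheck, function(x) {
  distm(df1[x[1], c("lon", "lat")], df1[x[2], c("lon", "lat")],
        fun = distHaversine)
})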


  #    1 - 2      1 - 3      2 - 3      4 - 5      4 - 6      5 - 6 
  # 11429.10   22415.04   12293.48  634549.20 1188925.65  557361.28 


This works only if every province has the same number of records. If not, sapply() cannot simplify the per-province lists of pairs into one flat structure, so toCheck becomes a nested list and both the naming step and the final sapply() call have to be adapted. The result does not depend on the row order of the dataset, though.
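To see why the group sizes matter, here is a toy base-R illustration in which plain integer vectors stand in for the row names (the data are made up purely for demonstration). When every group yields the same number of pairs, sapply() simplifies the result into a matrix of mode "list", which is what lets the naming step above work; with unequal groups it falls back to a nested list.

```r
# Two groups of 3 items: each yields 3 pairs, so sapply() can simplify
# the per-group lists into a 3 x 2 matrix of mode "list"
eq <- sapply(split(1:6, c(1, 1, 1, 2, 2, 2)),
             function(x) combn(x, 2, simplify = FALSE))

# Groups of 3 and 2 items: 3 pairs vs 1 pair, so sapply() cannot
# simplify and returns a nested list with one element per group
uneq <- sapply(split(1:5, c(1, 1, 1, 2, 2)),
               function(x) combn(x, 2, simplify = FALSE))

dim(eq)        # [1] 3 2
lengths(uneq)  # 3 pairs in group "1", 1 pair in group "2"
```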



For your actual dataset, toCheck becomes a nested list, so the function needs a tweak, as follows. I have not cleaned up the names in toCheck for this solution. (df2 is defined at the end of this answer.)

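library(geosphere)

# Sort by province so that unique(df2$province) below follows split()'s group order
df2 <- df2[order(df2$province), ]

# One list of row-name pairs per province; group sizes differ, so this is a nested list
toCheck <- sapply(split(df2, df2$province), function(x) {
  combn(rownames(x), 2, simplify = FALSE)
})

# Name the provinces; this must come after toCheck is created, or the names are overwritten
names(toCheck) <- paste("province", unique(df2$province))

# Outer sapply() loops over provinces, inner sapply() over the pairs within one province
sapply(toCheck, function(x) { sapply(x, function(y) {
  distm(df2[y[1], c("lon", "lat")], df2[y[2], c("lon", "lat")], fun = distHaversine)
})})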

# $`province 1`
# [1]   11429.10   22415.04 1001964.84   12293.48 1013117.36 1024209.46
# 
# $`province 2`
# [1]  634549.2 1188925.7  557361.3
# 
# $`province 3`
# [1] 590083.2
# 
# $`province 4`
# [1] 557361.28 547589.19  11163.92


You can then apply mean() to each element to get the average distance per province. If needed, it should not be hard to name the elements of the nested list so that you can tell which pair of houses each distance belongs to.
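A self-contained base-R sketch of both steps: it computes the pairwise distances per province, labels each distance with its pair of house ids, and takes the mean. It assumes a hand-written helper, `haversine_m()` (hypothetical, not from any package), in place of geosphere's distm()/distHaversine() so that it runs without extra packages. The helper uses 6378137 m, the default radius of distHaversine(), so its distances should agree with the output above. It also iterates over row positions with combn(nrow(x), 2) rather than over row names.

```r
# Hypothetical helper: haversine distance in metres between points given in
# decimal degrees; r = 6378137 matches the default radius of distHaversine()
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(pmin(a, 1)))  # pmin() guards against rounding above 1
}

# Same data as df2 in this answer
df2 <- data.frame(province = c(1, 1, 1, 2, 2, 2, 1, 3, 3, 4, 4, 4),
                  house = c(1, 2, 3, 4, 5, 6, 7, 10, 9, 8, 11, 12),
                  lat = c(-76.6, -76.5, -76.4, -75.4, -80.9, -85.7, -85.6, -76.4, -75.4, -80.9, -85.7, -85.6),
                  lon = c(39.2, 39.1, 39.3, 60.8, 53.3, 40.2, 40.1, 39.3, 60.8, 53.3, 40.2, 40.1))

# For each province, compute the distance for every pair of houses and
# label each distance with the two house ids, e.g. "1 - 2"
pair_dists <- lapply(split(df2, df2$province), function(x) {
  idx <- combn(nrow(x), 2)  # 2 x k matrix of row positions within x
  d <- haversine_m(x$lon[idx[1, ]], x$lat[idx[1, ]],
                   x$lon[idx[2, ]], x$lat[idx[2, ]])
  setNames(d, paste(x$house[idx[1, ]], x$house[idx[2, ]], sep = " - "))
})

# Average pairwise distance per province, in metres
avg_dist <- sapply(pair_dists, mean)
avg_dist
```

Because combn() runs separately inside each province, this handles provinces of different sizes without any extra restructuring. A province with only one house would make combn() fail, so such groups would need to be filtered out first.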

df2 <- data.frame(province = c(1, 1, 1, 2, 2, 2, 1, 3, 3, 4, 4, 4),
                  house = c(1, 2, 3, 4, 5, 6, 7, 10, 9, 8, 11, 12),
                  lat = c(-76.6, -76.5, -76.4, -75.4, -80.9, -85.7, -85.6, -76.4, -75.4, -80.9, -85.7, -85.6),
                  lon = c(39.2, 39.1, 39.3, 60.8, 53.3, 40.2, 40.1, 39.3, 60.8, 53.3, 40.2, 40.1))