How to assign several names to lat-lon observations

≡放荡痞女 提交于 2019-12-02 14:44:46

问题


I have two dataframes: df1 contains observations with lat-lon coordinates; df2 has names with lat-lon coordinates. I want to create a new variable df1$names which has for each observation the names of df2 that are within a specified distance to that observation.

Some sample data for df1:

df1 <- structure(list(lat = c(52.768, 53.155, 53.238, 53.253, 53.312, 53.21, 53.21, 53.109, 53.376, 53.317, 52.972, 53.337, 53.208, 53.278, 53.316, 53.288, 53.341, 52.945, 53.317, 53.249), lon = c(6.873, 6.82, 6.81, 6.82, 6.84, 6.748, 6.743, 6.855, 6.742, 6.808, 6.588, 6.743, 6.752, 6.845, 6.638, 6.872, 6.713, 6.57, 6.735, 6.917), cat = c(2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 3L, 2L, 2L, 2L, 2L, 2L), diff = c(6.97305555555555, 3.39815972222222, 14.2874305555556, -0.759791666666667, 34.448275462963, 4.38783564814815, 0.142430555555556, 0.698599537037037, 1.22914351851852, 7.0008912037037, 1.3349537037037, 8.67978009259259, 1.6090162037037,    25.9466782407407, 9.45068287037037, 4.76284722222222, 1.79163194444444, 16.8280787037037, 1.01336805555556, 3.51240740740741)), .Names = c("lat", "lon", "cat", "diff"), row.names = c(125L, 705L, 435L, 682L, 186L, 783L, 250L, 517L, 547L, 369L, 618L, 280L, 839L, 614L, 371L, 786L, 542L, 100L, 667L, 785L), class = "data.frame")

Some sample data for df2:

df2 <- structure(list(latlonloc = structure(c(6L, 3L, 4L, 2L, 5L, 1L), .Label = c("Boelenslaan", "Borgercompagnie", "Froombosch", "Garrelsweer", "Stitswerd", "Tinallinge"), class = "factor"), lat = c(53.356789, 53.193886, 53.311237, 53.111339, 53.360848, 53.162031), lon = c(6.53493, 6.780792, 6.768608, 6.82354, 6.599604, 6.143804)), .Names = c("latlonloc", "lat", "lon"), class = "data.frame", row.names = c(NA, -6L))

Creating a distance matrix with the geosphere package:

library(geosphere)
mat <- distm(df1[,c('lon','lat')], df2[,c('lon','lat')], fun=distHaversine)

The resulting distances are in meters (at least I think they are, else something is wrong with the distance matrix).

The specified distance is calculated with (df1$cat)^2)*1000. I tried df1$names <- df2$latlonloc[apply(distmat, 1, which(distmat < ((df1$cat)^2)*1000 ))], but get an error message:

Error in match.fun(FUN) : 
  'which(distmat < ((df1$cat)^2) * 1000)' is not a function, character or symbol

This is probably not the correct appraoch, but what I need is this:

df1$names <- #code or function which gives me a string of names which are within a specified distance of the observation

How can I create a string with the names that are within a specified distance of the observations?


回答1:


You need to operate on each row of df1 (or mat) in order to figure out, for each row how far away each object in df2 is. From that, you can pick the ones that meet your distance criterion.

I think you're getting a little confused about the use of apply and about the use of which. To really have which work for you, you need to apply it to each row of mat whereas your current code applies it to the entire mat matrix. Also note that it is hard to use apply here because you're comparing each row of mat against a corresponding element of a vector defined by ((df1$cat)^2)*1000). So, I will instead show you examples using sapply and lapply. You could also use mapply here, but I think the sapply/mapply syntax is clearer.

To address your desired output, I show two examples. One returns a list containing, for each row in df1, the names of items in df2 that are within the distance threshold. This won't easily go back into your original df1 as a variable because each element in the list can contain multiple names. The second example pastes those names together as a single comma-separated character string in order to create the new variable you're looking for.

Example 1:

out1 <- lapply(1:nrow(df1), function(x) {
    df2[which(mat[x,] < (((df1$cat)^2)*1000)[x]),'latlonloc']
})

Result:

> str(out1)
List of 20
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 2
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 4
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 6 4 5
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 4
 $ : Factor w/ 6 levels "Boelenslaan",..: 

Example 2:

out2 <- sapply(1:nrow(df1), function(x) {
    paste(df2[which(mat[x,] < (((df1$cat)^2)*1000)[x]),'latlonloc'], collapse=',')
})

Result:

> out2
 [1] ""                                 ""                                
 [3] ""                                 ""                                
 [5] ""                                 ""                                
 [7] ""                                 "Borgercompagnie"                 
 [9] ""                                 "Garrelsweer"                     
[11] ""                                 ""                                
[13] ""                                 ""                                
[15] "Tinallinge,Garrelsweer,Stitswerd" ""                                
[17] ""                                 ""                                
[19] "Garrelsweer"                      ""

I think the second of these is probably closest to what you're going for.



来源:https://stackoverflow.com/questions/21971814/how-to-assign-several-names-to-lat-lon-observations

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!