Merge two data frames considering a range match between key columns

浪子不回头ぞ submitted on 2021-02-19 04:11:49

Question


I am a beginner at programming in R. I am currently trying to look up site names in one data frame, which holds the X and Y coordinate ranges and the name of each site, and copy them into a second data frame that holds individual points.

FD <- matrix(data = c(rep(1, 500), rep(0, 500),
                      rnorm(1000, mean = 550000, sd = 4000),
                      rnorm(1000, mean = 6350000, sd = 20000), rep(NA, 1000)),
             ncol = 4, nrow = 1000, byrow = FALSE)
colnames(FD) <- c('Survival', 'X', 'Y', 'Site') 
FD <- as.data.frame(FD)

shpxt <- matrix(c(526654.7,526810.5 ,6309098,6309187,530405.4,530692,
                  6337699, 6338056,580432.7, 580541.9, 6380246,6380391,
                  585761.3, 585847.6, 6379665, 6379759, 584192.1, 584279.4,
                  6382358, 6382710, 583421.2, 583492.4, 6379356, 6379425,
                  532395.5, 532515.3 , 6336421, 6336587, 534694.6, 534791.2,
                  6335620, 6335740, 536749.8, 536957.5, 6337584, 6338130, 590049.6,
                  590419.4, 6372232, 6372432, 580443, 580756.5, 6386342, 6386473,
                  575263.9, 575413.7, 6380416, 6380530, 584625.1, 584753.9, 6381009,
                  6381335), ncol = 4, nrow = 13, byrow = TRUE)
sites <- c("Brandbaeltet", "Brusaa", "Granly", "Jerup Strand", "Knasborgvej",
           "Milrimvej", "Overklitten", "Oversigtsareal", "Sandmosen",
           "Strandby", "Troldkaer", "Vaagholt", "Videsletengen")
colnames(shpxt) <- c("Xmin", "Xmax", "Ymin", "Ymax")
shpxt <- as.data.frame(shpxt)
shpxt["Sites"] <- sites

My approach uses a nested for loop like this:

tester <- function(FD, shpxt) {                            # Open function
  for (i in 1:nrow(FD)) for (j in 1:nrow(shpxt)) {         # Open loop
    if (FD[i,2] >= shpxt[j,1] | FD[i,2] <= shpxt[j,2] &
        FD[i,3] >= shpxt[j,3] | FD[i,3] <= shpxt[j,4]) {   # Open consequent
      FD[i,4] = shpxt[j,5]
      break
    } else {                                               # Close consequent, open alternative
      FD[i,4] <- NA
    }                                                      # Close alternative
  }                                                        # Close loop
}                                                          # Close function

tester(FD, shpxt)

In essence, for each row i of FD I want to find the site whose X and Y ranges contain the point and copy that site name into FD$Site. When I run the loop on my real data, I get the following error message:

tester(FD, shpxt)
Error in if (FD[i, 2] >= shpxt[j, 1] | FD[i, 2] <= shpxt[j, 2] & FD[i,  : 
  missing value where TRUE/FALSE needed

How do I change the loop so that it copies the matching site name into FD?

Kind regards, Thøger


Answer 1:


You want to merge two data frames considering a range match between key columns. Here are two solutions.

Using sqldf

library(sqldf)

output <- sqldf("select * from FD left join shpxt 
                on (FD.X >= shpxt.Xmin and FD.X <= shpxt.Xmax and
                    FD.Y >= shpxt.Ymin and FD.Y <= shpxt.Ymax ) ")

Using data.table

library(data.table)

# convert both datasets to data.table (by reference)
setDT(FD)
setDT(shpxt)

output <- FD[shpxt, on = .(X >= Xmin , X <= Xmax,                # indicate x range
                           Y >= Ymin , Y <= Ymax), nomatch = NA, # indicate y range
             .(Survival, X, Y, Xmin, Xmax, Ymin, Ymax, Sites )]  # indicate columns in the output
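If the goal is to fill the Site column in place and keep exactly one row per point, a non-equi update join is one option. The sketch below uses small hand-made tables: `pts` and `boxes` are illustrative stand-ins for FD and shpxt, with coordinates picked so that two points fall inside a range and one does not. It assumes the ranges do not overlap; if a point fell into several ranges, only one of the matching site names would be kept.

```r
library(data.table)

# Illustrative stand-in for shpxt: one bounding box per site
boxes <- data.table(
  Xmin  = c(526654.7, 530405.4),
  Xmax  = c(526810.5, 530692),
  Ymin  = c(6309098, 6337699),
  Ymax  = c(6309187, 6338056),
  Sites = c("Brandbaeltet", "Brusaa")
)

# Illustrative stand-in for FD; Site starts as a character NA column
pts <- data.table(
  Survival = c(1, 0, 1),
  X        = c(526700, 530500, 550000),
  Y        = c(6309100, 6337800, 6350000),
  Site     = NA_character_
)

# For every box, find the points inside it and write the site name
# into those rows of pts; rows without a match keep NA
pts[boxes,
    on = .(X >= Xmin, X <= Xmax, Y >= Ymin, Y <= Ymax),
    Site := i.Sites]

print(pts)
```

Because `:=` modifies `pts` by reference, no reassignment is needed, and the number and order of rows stay the same.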

There are other ways to solve this problem, as discussed in other Stack Overflow questions.

P.S. Keep in mind that a for loop is not necessarily the best solution.
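If a loop is preferred anyway, the version in the question needs a few changes. In R, `&` binds more tightly than `|`, so the original condition is not a range test; all four comparisons have to be combined with "and". `if` stops with "missing value where TRUE/FALSE needed" as soon as its condition is NA, which happens when a coordinate is missing (presumably the case in the real data). Finally, a function works on a copy of its arguments, so the modified data frame has to be returned and assigned. A minimal sketch along those lines, where `assign_sites`, `pts` and `boxes` are made-up names and the data are invented for illustration:

```r
# Hypothetical helper: returns a copy of `points` with Site filled in
assign_sites <- function(points, boxes) {
  points$Site <- NA_character_
  for (i in seq_len(nrow(points))) {
    x <- points$X[i]
    y <- points$Y[i]
    if (is.na(x) || is.na(y)) next          # missing coordinate: leave Site as NA
    for (j in seq_len(nrow(boxes))) {
      inside <- x >= boxes$Xmin[j] && x <= boxes$Xmax[j] &&
                y >= boxes$Ymin[j] && y <= boxes$Ymax[j]
      if (isTRUE(inside)) {                 # isTRUE() also guards against NA bounds
        points$Site[i] <- boxes$Sites[j]
        break                               # first matching site wins
      }
    }
  }
  points                                    # return the modified copy
}

# Invented example data (stand-ins for shpxt and FD)
boxes <- data.frame(
  Xmin  = c(526654.7, 530405.4),
  Xmax  = c(526810.5, 530692),
  Ymin  = c(6309098, 6337699),
  Ymax  = c(6309187, 6338056),
  Sites = c("Brandbaeltet", "Brusaa"),
  stringsAsFactors = FALSE
)
pts <- data.frame(
  Survival = c(1, 0, 1),
  X        = c(526700, NA, 550000),
  Y        = c(6309100, 6337800, 6350000)
)

pts <- assign_sites(pts, boxes)
print(pts)
```

With the data from the question, the call would be `FD <- assign_sites(FD, shpxt)`.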




Answer 2:


Here's a failed attempt in base R; perhaps someone can help correct it.

getSite <- function(x, y) {
  return(shpxt[x >= shpxt['Xmin'] & x <= shpxt['Xmax'] &
               y >= shpxt['Ymin'] & y <= shpxt['Ymax'], "Sites"])
}

Testing it:

p <- c(Survival=0, X=shpxt[2,1], Y=shpxt[2,3])
getSite(p[['X']], p[['Y']])

returns correctly with

[1] "Brusaa"

However

FD$Site<-apply(FD, 1, function(point) {getSite(point[['X']], point[['Y']])})

fails with

Error in `$<-.data.frame`(`*tmp*`, "Site", value = character(0)) :
  replacement has 0 rows, data has 1000
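The assignment appears to fail because getSite returns character(0) for a point that lies outside every range. When no point matches, apply collapses its result to a zero-length vector, which cannot be assigned to a 1000-row column. One possible fix is to return exactly one value per point, NA when nothing matches, and to iterate over the X and Y columns with mapply. This is only a sketch: `get_site`, `pts` and `boxes` are illustrative names with invented data, and it assumes the first matching range should win.

```r
# Hypothetical variant of getSite that always returns a single value
get_site <- function(x, y, boxes) {
  hit <- which(x >= boxes$Xmin & x <= boxes$Xmax &
               y >= boxes$Ymin & y <= boxes$Ymax)
  if (length(hit) == 0L) NA_character_ else boxes$Sites[hit[1L]]
}

# Invented example data (stand-ins for shpxt and FD)
boxes <- data.frame(
  Xmin  = c(526654.7, 530405.4),
  Xmax  = c(526810.5, 530692),
  Ymin  = c(6309098, 6337699),
  Ymax  = c(6309187, 6338056),
  Sites = c("Brandbaeltet", "Brusaa"),
  stringsAsFactors = FALSE
)
pts <- data.frame(
  Survival = c(1, 0, 1),
  X        = c(530500, 526700, 550000),
  Y        = c(6337800, 6309100, 6350000)
)

# One call per point; the result is a character vector of length nrow(pts)
pts$Site <- mapply(get_site, pts$X, pts$Y, MoreArgs = list(boxes = boxes))
print(pts)
```

Passing the lookup table as an argument instead of reading the global `shpxt` keeps the helper testable on small inputs like the one above.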



Source: https://stackoverflow.com/questions/37158839/merge-two-data-frames-considering-a-range-match-between-key-columns
