Merge data frame based on vector key

爱⌒轻易说出口 submitted on 2021-02-19 06:06:09

Question


I'm an absolute beginner and am hoping someone can help me with a merge problem I've been stuck on for most of this evening. So far I haven't been able to adapt solutions for similar problems to this particular example.

I've made a dummy data frame and vector to help illustrate my problem:

dumdata <- data.frame(id=c(1:5), pcode=c(1234,9876,4477,2734,3999), vlo=c(100,450,1000,1325,1500), vhi=c(300,950,1100,1450,1700))

id pcode  vlo  vhi
 1  1234  100  300
 2  9876  450  950
 3  4477 1000 1100
 4  2734 1325 1450
 5  3999 1500 1700


vkey <- c(105,290,513,1399,1572,1683)

I would like to output a new data frame that contains, for each value of vkey, the row of dumdata whose vlo-vhi range contains that value, together with the vkey value itself. In practice, every vkey value will always fall within some vlo-vhi range, and the ranges are always discrete (they never overlap).
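The two stated assumptions (every vkey value lands in some vlo-vhi range, and the ranges never overlap) can be checked before doing any merging. Below is a minimal base-R sketch, assuming "between" means strictly between, as in the answers; the names `hits`, `covered_once` and `no_overlap` are illustrative only.

```r
dumdata <- data.frame(id = 1:5,
                      pcode = c(1234, 9876, 4477, 2734, 3999),
                      vlo = c(100, 450, 1000, 1325, 1500),
                      vhi = c(300, 950, 1100, 1450, 1700))
vkey <- c(105, 290, 513, 1399, 1572, 1683)

# hits[i, j] is TRUE when vkey[i] lies strictly between vlo and vhi of row j
hits <- outer(vkey, seq_len(nrow(dumdata)),
              function(x, j) dumdata$vlo[j] < x & x < dumdata$vhi[j])

# Assumption 1: every key falls inside exactly one range
covered_once <- all(rowSums(hits) == 1)

# Assumption 2: once sorted by vlo, each range ends before the next one starts
o <- order(dumdata$vlo)
no_overlap <- all(head(dumdata$vhi[o], -1) < tail(dumdata$vlo[o], -1))

covered_once
no_overlap
```

If either check is FALSE, a key either matches no row or matches several, and the one-row-per-key approaches below need adjusting.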

The desired output would look like the following:

id   pcode   vlo   vhi  vkey
 1    1234   100   300   105
 1    1234   100   300   290
 2    9876   450   950   513
 4    2734  1325  1450  1399
 5    3999  1500  1700  1572
 5    3999  1500  1700  1683

Answer 1:


Rather than using for loops, you can construct the whole index vector in one go with sapply.

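# For each key, find the row whose vlo-vhi range strictly contains it
ind <- sapply(vkey, function(x) which(dumdata$vlo < x & x < dumdata$vhi))
# Pull those rows (repeating a row when several keys hit it) and attach the keys
data.frame(dumdata[ind,], vkey)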

    id pcode  vlo  vhi vkey
1    1  1234  100  300  105
1.1  1  1234  100  300  290
2    2  9876  450  950  513
4    4  2734 1325 1450 1399
5    5  3999 1500 1700 1572
5.1  5  3999 1500 1700 1683

If any value in vkey matches multiple rows in dumdata, it gets uglier, though: you'll need to use lapply instead of sapply and then do

data.frame(dumdata[unlist(ind),], vkey = rep(vkey, sapply(ind, length)))

to return all matches, but I take it from the example that this won't happen.
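Spelled out as a runnable sketch, the lapply version looks like this. Because the question's ranges never overlap, the example adds a hypothetical extra row (id 6, pcode 5555, vlo 200, vhi 600, not part of the question) so that some keys genuinely match two rows.

```r
dumdata <- data.frame(id = 1:5,
                      pcode = c(1234, 9876, 4477, 2734, 3999),
                      vlo = c(100, 450, 1000, 1325, 1500),
                      vhi = c(300, 950, 1100, 1450, 1700))
vkey <- c(105, 290, 513, 1399, 1572, 1683)

# Hypothetical extra row whose range overlaps rows 1 and 2
overlap <- rbind(dumdata, data.frame(id = 6L, pcode = 5555, vlo = 200, vhi = 600))

# lapply keeps one list element per key, each holding all matching row indices
ind <- lapply(vkey, function(x) which(overlap$vlo < x & x < overlap$vhi))

# Repeat each key once per match so it lines up with the stacked rows
result <- data.frame(overlap[unlist(ind), ],
                     vkey = rep(vkey, sapply(ind, length)))
result
```

Here 290 and 513 each appear twice in the result, once for their original row and once for the hypothetical row 6.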

Edit:

For completeness, I'll add that you can use mapply too, but it is mainly intended for cases where you need to compare against more than one variable (for example, if you had vkey1 and vkey2 that must satisfy a condition together).

ind <- mapply(function(x, y) which(dumdata$vlo < x & y < dumdata$vhi),
              vkey1, vkey2)
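To make the mapply variant concrete, here is a sketch with made-up vkey1/vkey2 values (hypothetical; the question only has a single key vector). Each pair (vkey1[i], vkey2[i]) is read as a sub-range that has to sit inside a row's vlo-vhi range.

```r
dumdata <- data.frame(id = 1:5,
                      pcode = c(1234, 9876, 4477, 2734, 3999),
                      vlo = c(100, 450, 1000, 1325, 1500),
                      vhi = c(300, 950, 1100, 1450, 1700))
vkey1 <- c(105, 513)  # hypothetical lower ends of each sub-range
vkey2 <- c(290, 900)  # hypothetical upper ends of each sub-range

# For each (x, y) pair, find the row with vlo < x and y < vhi.
# mapply simplifies to an integer vector only if every pair matches exactly one row;
# otherwise it returns a list, as with lapply.
ind <- mapply(function(x, y) which(dumdata$vlo < x & y < dumdata$vhi),
              vkey1, vkey2)
res <- data.frame(dumdata[ind, ], vkey1, vkey2)
res
```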



Answer 2:


Using the data.table package.

library(data.table)

# added a blank vkeyvalue column
dumdata <- data.table(
   id=c(1:5), 
   pcode=c(1234,9876,4477,2734,3999), 
   vlo=c(100,450,1000,1325,1500), 
   vhi=c(300,950,1100,1450,1700),
   vkeyvalue = as.integer(NA)
)

# initialise the result with the same (empty) structure as dumdata
finalfiltereddata <- dumdata[0]
vkey <- c(105,290,513,1399,1572,1683)

# loop through each key
for (i in vkey) {
  # subset dumdata to the rows where vlo < i & vhi > i
  filtereddata <- dumdata[vlo < i & vhi > i]

  # assign the matching key to the vkeyvalue column
  filtereddata[, vkeyvalue := as.integer(i)]

  # append to the master data set
  finalfiltereddata <- rbind(finalfiltereddata, filtereddata)
}

finalfiltereddata

#    id pcode  vlo  vhi vkeyvalue
# 1:  1  1234  100  300       105
# 2:  1  1234  100  300       290
# 3:  2  9876  450  950       513
# 4:  4  2734 1325 1450      1399
# 5:  5  3999 1500 1700      1572
# 6:  5  3999 1500 1700      1683
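The loop above calls rbind on every iteration, which copies the growing result each time. The sketch below is a swapped-in base-R variant (no data.table) of the same loop, not the answer's code: it collects each iteration's matches in a pre-allocated list and binds them once at the end.

```r
dumdata <- data.frame(id = 1:5,
                      pcode = c(1234, 9876, 4477, 2734, 3999),
                      vlo = c(100, 450, 1000, 1325, 1500),
                      vhi = c(300, 950, 1100, 1450, 1700))
vkey <- c(105, 290, 513, 1399, 1572, 1683)

# Pre-allocate one slot per key instead of growing a data frame inside the loop
pieces <- vector("list", length(vkey))
for (k in seq_along(vkey)) {
  i <- vkey[k]
  # Rows whose range strictly contains the key (same condition as the loop above)
  filtered <- dumdata[dumdata$vlo < i & dumdata$vhi > i, ]
  # Tag every matching row with the key; rep() also copes with zero matches
  filtered$vkeyvalue <- rep(i, nrow(filtered))
  pieces[[k]] <- filtered
}

# Bind all pieces in a single call and reset the row names
finalfiltereddata <- do.call(rbind, pieces)
rownames(finalfiltereddata) <- NULL
finalfiltereddata
```

Unlike the data.table version, vkeyvalue is kept as a double here rather than converted with as.integer.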



Answer 3:


One option might be to use cut to create a matching "id" column for your "vkey" variable as follows:

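# Sorted boundaries: vlo1, vhi1, vlo2, vhi2, ... (valid because the ranges don't overlap)
cutBreaks <- sort(unlist(dumdata[c("vlo", "vhi")], use.names = FALSE))
# Label each interval: positive id inside a vlo-vhi range, negative id for the gap after it
cutLabels <- rep(1:nrow(dumdata), each = 2) * c(1, -1)

# There is one fewer interval than breaks, so the last label is dropped
new <- data.frame(vkey = vkey, id = cut(vkey, breaks = cutBreaks, 
                                        labels = cutLabels[-length(cutLabels)]))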
new
#   vkey id
# 1  105  1
# 2  290  1
# 3  513  2
# 4 1399  4
# 5 1572  5
# 6 1683  5

Once you have that, merge should work without a problem:

merge(new, dumdata)
#   id vkey pcode  vlo  vhi
# 1  1  105  1234  100  300
# 2  1  290  1234  100  300
# 3  2  513  9876  450  950
# 4  4 1399  2734 1325 1450
# 5  5 1572  3999 1500 1700
# 6  5 1683  3999 1500 1700
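A related trick, swapped in here for comparison rather than taken from the answer, is base R's findInterval() on the interleaved vlo/vhi bounds. This sketch assumes the ranges do not overlap (as the question states) and treats each range as half-open, vlo <= vkey < vhi, so it can differ from cut()'s default right-closed intervals (vlo, vhi] only when a key sits exactly on a boundary.

```r
dumdata <- data.frame(id = 1:5,
                      pcode = c(1234, 9876, 4477, 2734, 3999),
                      vlo = c(100, 450, 1000, 1325, 1500),
                      vhi = c(300, 950, 1100, 1450, 1700))
vkey <- c(105, 290, 513, 1399, 1572, 1683)

# Order the ranges by their lower bound, then interleave the bounds:
# vlo1, vhi1, vlo2, vhi2, ... (non-decreasing because the ranges don't overlap)
o <- order(dumdata$vlo)
bounds <- as.vector(rbind(dumdata$vlo[o], dumdata$vhi[o]))

# findInterval() returns j with bounds[j] <= x < bounds[j + 1];
# an odd j means x lies inside a range, an even j (or 0) means it falls in a gap
j <- findInterval(vkey, bounds)
keep <- j %% 2 == 1

# Map odd j back to its range: j = 1, 3, 5, ... -> 1st, 2nd, 3rd, ... range in vlo order
new <- data.frame(vkey = vkey[keep], id = dumdata$id[o[(j[keep] + 1) / 2]])
result <- merge(new, dumdata)
result
```

Keys that fall in a gap are simply dropped by `keep`, which plays the same role as the negative labels in the cut() version.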


Source: https://stackoverflow.com/questions/19119022/merge-data-frame-based-on-vector-key
