How to perform complex multicolumn match in R /

。_饼干妹妹 提交于 2019-12-05 16:50:13

I would go with a simple sapply and a "anded" condition in the df1$x selection like this:

df2$x <- sapply( df2$age, function(x) { df1$x[ x >= df1$lower & x <= df1$upper ] })

which gives:

> df2
  age  x
1  12 67
2  14 67
3   5 45
4   2 12
5   9 45
6  19 89
7  22 10
8  18 89
9  23 10

For age 12 for example the selection inside the brackets gives:

> 12 >= df1$lower & 12 <= df1$upper
[1] FALSE FALSE  TRUE FALSE FALSE

So getting df1$x by this logical vector is easy as your ranges don't overlap

Using foverlaps from data.table is what you are looking for:

library(data.table)
setDT(df1)
setDT(df2)[,age2:=age]
setkey(df1,lower,upper)
foverlaps(df2, df1, by.x = names(df2),by.y=c("lower","upper"))[,list(age,x)]

#    age  x
# 1:  12 67
# 2:  14 67
# 3:   5 45
# 4:   2 12
# 5:   9 45
# 6:  19 89
# 7:  22 10
# 8:  18 89
# 9:  23 10

Here's another vectorized approach using findInterval on a melted data set

library(data.table) 
df2$x <- melt(setDT(df1), "x")[order(value), x[findInterval(df2$age, value)]]
#   age  x
# 1  12 67
# 2  14 67
# 3   5 45
# 4   2 12
# 5   9 45
# 6  19 89
# 7  22 10
# 8  18 89
# 9  23 10

The idea here is to

  • First, tidy up you data so lower and upper will be in the same column and x will have corresponding values to that new column,
  • Then, sort the data according to these ranges (necessary for findInterval).
  • Finally, run findInterval within the x column in order to find the correct incidences

And here's a possible dplyr/tidyr version

library(tidyr)
library(dplyr)
df1 %>%
  gather(variable, value, -x) %>%
  arrange(value) %>%
  do(data.frame(x = .$x[findInterval(df2$age, .$value)])) %>%
  cbind(df2, .)
#   age  x
# 1  12 67
# 2  14 67
# 3   5 45
# 4   2 12
# 5   9 45
# 6  19 89
# 7  22 10
# 8  18 89
# 9  23 10
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!