Question
I have the following two data frames:
d1 <- data.frame(chr = c("chr1","chr2","chr2"), pos = c(11, 15,21), type = c("type1","type2","type1"))
> d1
chr pos type
1 chr1 11 type1
2 chr2 15 type2
3 chr2 21 type1
d2 <- data.frame(chr = c("chr1","chr2","chr4"), start = c(10, 15,30), stop = c(13,20,40))
> d2
chr start stop
1 chr1 10 13
2 chr2 15 20
3 chr4 30 40
I want to subset d1 on two conditions:
- keep all lines where 'type' == "type1" (I know how to do this)
- keep all lines where 'chr' matches any of the lines in d2 and 'pos' falls between the 'start' and 'stop' values from that line in d2
The resulting d3 would in this case then only contain line 1 of d1:
> d3
chr pos type
1 chr1 11 type1
I would start like this:
d3 <- subset(d1, d1$type == "type1" & ...)
Answer 1:
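We can combine all the conditions into a single logical vector and use it to subset d1. Note that the comparisons against d2$start and d2$stop are element-wise: row i of d1 is checked against row i of d2. This gives the expected result here only because d1 and d2 have the same number of rows and happen to line up; it does not test each row of d1 against every range in d2.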
d1[d1$type=="type1" & d1$chr %in% d2$chr & d1$pos >= d2$start & d1$pos <= d2$stop, ]
# chr pos type
#1 chr1 11 type1
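If the rows of d1 and d2 are not aligned, or the two data frames differ in length, a per-row lookup is safer. The following is a minimal sketch rather than part of the original answer: it assumes the range bounds are inclusive, and the helper name in_range is made up for illustration. For each row of d1 it checks whether any row of d2 has the same 'chr' and a range containing 'pos'.

```r
d1 <- data.frame(chr = c("chr1", "chr2", "chr2"),
                 pos = c(11, 15, 21),
                 type = c("type1", "type2", "type1"))
d2 <- data.frame(chr = c("chr1", "chr2", "chr4"),
                 start = c(10, 15, 30),
                 stop = c(13, 20, 40))

# in_range (hypothetical name): one logical per row of d1, TRUE if ANY row
# of d2 has the same chr and start <= pos <= stop (inclusive bounds assumed).
in_range <- mapply(function(chr, pos) {
  any(d2$chr == chr & d2$start <= pos & pos <= d2$stop)
}, as.character(d1$chr), d1$pos, USE.NAMES = FALSE)

d3 <- d1[d1$type == "type1" & in_range, ]
d3
# chr pos type
#1 chr1 11 type1
```

This does one pass over d2 per row of d1, which is fine for small inputs; for large tables a dedicated interval join would likely scale better.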
Source: https://stackoverflow.com/questions/57266240/r-subset-data-frame-check-if-value-lies-in-range