Question
I have two data frames: table A is the pattern table, and table B is the name table. I want to subset table B to the rows whose name matches any of the patterns in table A.
A <- data.frame(pattern = c("aa", "bb", "cc", "dd"))
B <- data.frame(name = "aa1", "bb1", "abc", "def" ,"ddd")
I tried a for loop that looks like this:
for (i in 1:nrow(A)) {
  for (j in 1:nrow(B)) {
    DT <- data.frame(grep(A$pattern[i], B$name[j], ignore.case = T, value = T))
  }
}
I want the resulting table DT to contain only aa1, bb1, and ddd.
But it's super slow. I'm just wondering if there's a more efficient way to do it. Many thanks!
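As written, the loop reassigns DT on every pass through the inner loop, so when it finishes DT holds only the result of the final grep() call rather than all matches. A minimal sketch of a loop that keeps every match, assuming the corrected sample data (names wrapped in c()), keeps one logical flag per row of B and, because grepl() already works on a whole vector, loops over the patterns only:

```r
# Corrected sample data: the names are wrapped in c()
A <- data.frame(pattern = c("aa", "bb", "cc", "dd"), stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "abc", "def", "ddd"), stringsAsFactors = FALSE)

# One flag per row of B, switched to TRUE once any pattern matches that name
keep <- logical(nrow(B))
for (i in seq_len(nrow(A))) {
  # grepl() tests all of B$name at once, so no inner loop over j is needed
  keep <- keep | grepl(A$pattern[i], B$name, ignore.case = TRUE)
}

# drop = FALSE keeps DT a data frame even though B has a single column
DT <- B[keep, , drop = FALSE]
DT
```

With this data, DT contains rows 1, 2, and 5 of B (aa1, bb1, ddd).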
Answer 1:
There is no need for a double loop; the following uses only a single sapply loop.
inx <- unlist(sapply(A$pattern, grep, B$name))
B[inx, , drop = FALSE]
# name
#1 aa1
#2 bb1
#5 ddd
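Here each grep() call returns the row numbers of B that match one pattern (integer(0) for "cc", which matches nothing), and unlist() flattens them into one index vector. If a single name matches several patterns, its row number appears more than once and the row is repeated in the result. A small variant, assuming you want each row at most once and in its original order, wraps the indices in unique() and sort(). It also swaps sapply() for lapply(), because sapply() can simplify equal-length results into a matrix, whereas lapply() always returns a list. The extra name "aabb" below is hypothetical, added only to show the duplicate case:

```r
A <- data.frame(pattern = c("aa", "bb", "cc", "dd"), stringsAsFactors = FALSE)
# Hypothetical extra name "aabb", which is matched by both "aa" and "bb"
B <- data.frame(name = c("aa1", "bb1", "abc", "def", "ddd", "aabb"),
                stringsAsFactors = FALSE)

# lapply() always returns a list: one integer vector of matching rows per pattern
hits <- lapply(A$pattern, grep, B$name)
inx <- unlist(hits)       # 1 6 2 6 5: row 6 appears twice
inx <- sort(unique(inx))  # 1 2 5 6: each row once, in B's original order
B[inx, , drop = FALSE]
```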
Answer 2:
It appears there's a slight error in your sample input data: B$name is not properly declared (the values need to be wrapped in c()), and you need to include stringsAsFactors = F in both data.frame calls (in R 4.0.0 and later, FALSE is already the default):
> A <- data.frame(pattern = c("aa", "bb", "cc", "dd"), stringsAsFactors = F)
> B <- data.frame(name = c("aa1", "bb1", "abc", "def" ,"ddd"), stringsAsFactors = F)
# using sapply with grepl: test every name against each pattern,
# then keep a name if at least one pattern matched it
> indices <- rowSums(sapply(A$pattern, grepl, B$name)) > 0
> indices
[1]  TRUE  TRUE FALSE FALSE  TRUE
> B[indices, ]
[1] "aa1" "bb1" "ddd"
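Another option, not taken from either answer, is to collapse all the patterns into one regular expression joined with `|`, so that grepl() scans B$name once instead of once per pattern. This sketch assumes the entries of A$pattern are meant as regular expressions; if they are literal strings containing metacharacters such as `.` or `(`, they would need to be escaped first.

```r
A <- data.frame(pattern = c("aa", "bb", "cc", "dd"), stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "abc", "def", "ddd"), stringsAsFactors = FALSE)

# Build one alternation regex: "aa|bb|cc|dd"
combined <- paste(A$pattern, collapse = "|")

# TRUE for every name that matches at least one of the patterns
keep <- grepl(combined, B$name, ignore.case = TRUE)

# drop = FALSE keeps the result a data frame
B[keep, , drop = FALSE]
```

Because `|` has the lowest precedence in a regular expression, joining the patterns this way matches exactly the names that at least one pattern matches on its own.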
Source: https://stackoverflow.com/questions/52372881/apply-regexp-in-one-data-frame-based-on-the-column-in-another-data-frame