apply regexp in one data frame based on the column in another data frame

China☆狼群 submitted on 2019-12-11 10:17:36

Question


I have two data frames: table A is the pattern table, and table B is the name table. I want to subset table B to the rows whose name matches a pattern in table A.

A <- data.frame(pattern = c("aa", "bb", "cc", "dd"))
B <- data.frame(name = "aa1", "bb1", "abc", "def" ,"ddd")

I'm trying a for loop that looks like this:

for (i in 1:nrow(A)){
for (j in 1:nrow(B)){
DT <- data.frame(grep(A$pattern[i], B$name[j], ignore.case = T, value = T))
}}

I want my resulting table DT to contain only aa1, bb1, and ddd.

But it's super slow. Is there a more efficient way to do it? Many thanks!


Answer 1:


There is no need for a double loop. The following uses a single sapply call that runs grep once per pattern against the whole name column.

inx <- unlist(sapply(A$pattern, grep, B$name))
B[inx, , drop = FALSE]
#  name
#1  aa1
#2  bb1
#5  ddd
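One thing to watch with the index approach above: a name that matches more than one pattern would show up more than once in inx, so its row would be repeated. A possible alternative, sketched here under the assumption that the patterns are plain strings that can safely be joined with "|" (the names combined and keep are my own, not from the question), is to build a single regular expression and call grepl once:

```r
A <- data.frame(pattern = c("aa", "bb", "cc", "dd"), stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "abc", "def", "ddd"), stringsAsFactors = FALSE)

# Join all patterns into one regular expression: "aa|bb|cc|dd"
combined <- paste(A$pattern, collapse = "|")

# One logical value per row of B: TRUE if any pattern matches the name
keep <- grepl(combined, B$name, ignore.case = TRUE)

# drop = FALSE keeps the result as a data frame
DT <- B[keep, , drop = FALSE]
DT$name
```

Because grepl returns exactly one value per row, no row can be selected twice. If the patterns may contain regex metacharacters that should be taken literally, they would need escaping first.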



Answer 2:


It appears there's a slight error in your sample input data: B$name is not properly declared (the values are not wrapped in c()), and stringsAsFactors = F needs to be included for both data.frame objects:

> A <- data.frame(pattern = c("aa", "bb", "cc", "dd"), stringsAsFactors = F)
> B <- data.frame(name = c("aa1", "bb1", "abc", "def" ,"ddd"), stringsAsFactors = F)

CODE

# using sapply with grepl: for each name, check whether any pattern matches
> indices <- sapply(B$name, function(nm) any(sapply(A$pattern, grepl, x = nm)), USE.NAMES = FALSE)
> indices
[1]  TRUE  TRUE FALSE FALSE  TRUE

> B[indices, ]
[1] "aa1" "bb1" "ddd"
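Note that B[indices, ] on a one-column data frame returns a plain character vector, as the output above shows. If DT should stay a data frame and matching should ignore case, as in the question's grep call, something along these lines might work. This is a sketch only; the variable name keep and the use of vapply are my own choices:

```r
A <- data.frame(pattern = c("aa", "bb", "cc", "dd"), stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "abc", "def", "ddd"), stringsAsFactors = FALSE)

# For each name, test every pattern and record whether at least one matches
keep <- vapply(
  B$name,
  function(nm) any(vapply(A$pattern, grepl, logical(1), x = nm, ignore.case = TRUE)),
  logical(1),
  USE.NAMES = FALSE
)

# drop = FALSE prevents the one-column result from collapsing to a vector
DT <- B[keep, , drop = FALSE]
```

vapply is used instead of sapply so that each call is guaranteed to return a single logical value.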


Source: https://stackoverflow.com/questions/52372881/apply-regexp-in-one-data-frame-based-on-the-column-in-another-data-frame
