Find matches of a vector of strings in another vector of strings

Submitted by 与世无争的帅哥 on 2019-11-27 03:17:38

Question


I'm trying to create a subset of a data frame of news articles that mention at least one element of a set of keywords or phrases.

# Sample data frame of articles
articles <- data.frame(
  id = c(1, 2, 3, 4),
  text = c("Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod",
           "tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,",
           "quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
           "consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse"))
# Before R 4.0, data.frame() turned strings into factors, so convert back to character
articles$text <- as.character(articles$text)

# Sample vector of keywords or phrases
keywords <- c("elit", "tempor incididunt", "reprehenderit")

#   id                                                                         text
# 1  1     Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
# 2  2 tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
# 3  3      quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
# 4  4    consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse

Given the vector of keywords, the subset should contain rows 1, 2, and 4, since those rows contain one or more of the elements of the vector.

Neither %in% nor grepl() works. %in% seems to compare whole strings rather than searching inside them (articles$text %in% keywords results in four FALSEs), and grep() doesn't seem to be able to handle a vector of patterns (grep(keywords, articles$text) complains that the pattern has length > 1). Neither function alone seems to work well across multiple dimensions: it would be easy to search for one word in all the rows, but not for all three at the same time.
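A minimal sketch of the "one keyword at a time" idea, using base R only: run grepl() once per keyword with vapply(), which yields a logical matrix (one row per article, one column per keyword), then keep the rows where any column is TRUE. This is an alternative to the regex answer below, not the accepted solution; fixed = TRUE is an assumption here that the keywords should be matched as literal text rather than as regular expressions.

```r
# Same sample data as above; stringsAsFactors = FALSE keeps text as character
articles <- data.frame(
  id = c(1, 2, 3, 4),
  text = c("Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod",
           "tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,",
           "quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
           "consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse"),
  stringsAsFactors = FALSE
)
keywords <- c("elit", "tempor incididunt", "reprehenderit")

# %in% tests whole elements for exact equality, so no article text equals a keyword
exact <- articles$text %in% keywords

# Call grepl() once per keyword; fixed = TRUE matches each keyword literally.
# Result: logical matrix, rows = articles, columns = keywords
hits <- vapply(keywords, grepl, logical(nrow(articles)),
               x = articles$text, fixed = TRUE)

# Keep an article if at least one keyword matched it
matched <- rowSums(hits) > 0
subset_articles <- articles[matched, ]
print(subset_articles$id)  # [1] 1 2 4
```

Because fixed = TRUE is used, keywords containing characters such as "." or "+" are not interpreted as regex syntax.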

What's the best way to find and select all rows of the data frame that contain at least one of the elements of the keyword vector?


Answer 1:


You can try pasting your keywords together, separated by the pipe character (|), which acts as an "or" in a regular expression, like this:

> articles[grepl(paste(keywords, collapse="|"), articles$text),]
  id                                                                         text
1  1     Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
2  2 tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
4  4    consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
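One caveat with plain alternation: "elit" also matches inside "velit" in the fourth article (that row is still kept here because it contains "reprehenderit"). A hedged sketch of a stricter variant wraps the alternation in a group and anchors it with \b word boundaries; perl = TRUE is used so that \b is handled by the PCRE engine. This still treats each keyword as a regular expression, so keywords containing regex metacharacters would need escaping first.

```r
articles <- data.frame(
  id = c(1, 2, 3, 4),
  text = c("Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod",
           "tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,",
           "quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
           "consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse"),
  stringsAsFactors = FALSE
)
keywords <- c("elit", "tempor incididunt", "reprehenderit")

# Substring match: "elit" is also found inside "velit" in article 4
loose_elit <- grepl("elit", articles$text)

# Whole-word match: \b requires a word boundary on both sides
strict_elit <- grepl("\\belit\\b", articles$text, perl = TRUE)

# Builds "\\b(elit|tempor incididunt|reprehenderit)\\b"
pattern <- paste0("\\b(", paste(keywords, collapse = "|"), ")\\b")
strict <- grepl(pattern, articles$text, perl = TRUE)

subset_articles <- articles[strict, ]
print(subset_articles$id)  # [1] 1 2 4
```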


Source: https://stackoverflow.com/questions/17130129/find-matches-of-a-vector-of-strings-in-another-vector-of-strings
