grepl

How to search for a string in one column in other columns of a data frame

旧街凉风 submitted on 2019-11-27 18:26:05
Question: I have a table, call it df, with 3 columns: the 1st is the title of a product, the 2nd is the description of the product, and the 3rd is a one-word string. I need to run an operation on the entire table that creates 2 new columns (call them 'exists_in_title' and 'exists_in_description') holding either a 1 or a 0 to indicate whether the 3rd column's string occurs in the 1st or the 2nd column. It needs to be a simple row-by-row (1:1) operation, so for example, calling row 1 'A', I need to check if the cell A3,
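A minimal base R sketch of the row-by-row check, assuming the three columns are named title, description and word (hypothetical names, since the excerpt does not give them). grepl() is vectorised over its text argument but not over its pattern, so mapply() is used to pair each row's word with that same row's title or description:

```r
# Hypothetical sample data; the column names title, description and word are assumptions
df <- data.frame(
  title       = c("Red cotton shirt", "Blue denim jeans", "Leather wallet"),
  description = c("A soft shirt made of cotton", "Slim-fit jeans", "Brown wallet with a zip"),
  word        = c("cotton", "cotton", "zip"),
  stringsAsFactors = FALSE
)

# Compare each row's word with the same row's text only.
# fixed = TRUE treats the word as a literal string, not a regular expression.
row_match <- function(words, texts) {
  as.integer(mapply(function(w, txt) grepl(w, txt, fixed = TRUE), words, texts))
}

df$exists_in_title       <- row_match(df$word, df$title)
df$exists_in_description <- row_match(df$word, df$description)
df
```

The match is case-sensitive; lowering both sides with tolower() first is one way to relax that. stringr::str_detect() is vectorised over both the string and the pattern, so it is a possible alternative to the mapply() call.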

POSIX character class does not work in base R regex

余生长醉 submitted on 2019-11-27 05:38:06
I'm having some problems matching a pattern against a string of text in R. I'm trying to get TRUE from grepl when the text is something like "lettersornumbersorspaces y lettersornumbersorspaces". I'm using the following regex: ([:alnum:]|[:blank:])+[:blank:][yY][:blank:]([:alnum:]|[:blank:])+ When I use the regex as follows to obtain the "address", it works as expected: regex <- "([:alnum:]|[:blank:])+[:blank:][yY][:blank:]([:alnum:]|[:blank:])+" address <- str_extract(fulltext, regex) I see that address is the text that I need. Now, if I want to use grepl to get a TRUE as follows: grepl("([
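In POSIX regular expressions, which base R's default engine follows, a class name such as [:alnum:] is only recognised inside a bracket expression, so it has to be written [[:alnum:]]. Going by the question, stringr's ICU engine accepted the single-bracket form, which would explain why str_extract() worked while grepl() did not. A sketch of the same pattern with the classes wrapped in brackets (the sample strings are hypothetical):

```r
# Class names like [:alnum:] and [:blank:] go *inside* a bracket expression.
# [[:alnum:][:blank:]]+ replaces the alternation ([:alnum:]|[:blank:])+ .
pattern <- "[[:alnum:][:blank:]]+[[:blank:]][yY][[:blank:]][[:alnum:][:blank:]]+"

# Hypothetical sample strings
texts <- c("Calle 5 y Calle 6", "Avenida Norte Y 10", "no conjunction here")

result <- grepl(pattern, texts)
result
```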

Filtering observations in dplyr in combination with grepl

时光毁灭记忆、已成空白 submitted on 2019-11-27 05:19:09
Question: I am trying to work out how to filter some observations from a large dataset using dplyr and grepl. I am not wedded to grepl if other solutions would be better. Take this sample df: df1 <- data.frame(fruit=c("apple", "orange", "xapple", "xorange", "applexx", "orangexx", "banxana", "appxxle"), group=c("A", "B") ) df1 # fruit group #1 apple A #2 orange B #3 xapple A #4 xorange B #5 applexx A #6 orangexx B #7 banxana A #8 appxxle B I want to: filter out those cases beginning with 'x'
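Only the first requirement survives in the excerpt, so this sketch covers just that: dropping rows whose fruit starts with "x". It uses base R subset() so it runs without extra packages; the same logical test can be passed to dplyr::filter().

```r
df1 <- data.frame(fruit = c("apple", "orange", "xapple", "xorange",
                            "applexx", "orangexx", "banxana", "appxxle"),
                  group = c("A", "B"))  # group is recycled to 8 rows

# "^x" anchors the match to the start of the string, so only values that
# begin with "x" match; the ! keeps every other row.
kept <- subset(df1, !grepl("^x", fruit))
kept

# dplyr equivalent: dplyr::filter(df1, !grepl("^x", fruit))
```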

Find matches of a vector of strings in another vector of strings

与世无争的帅哥 submitted on 2019-11-27 03:17:38
Question: I'm trying to create a subset of a data frame of news articles that mention at least one element of a set of keywords or phrases. # Sample data frame of articles articles <- data.frame(id=c(1, 2, 3, 4), text=c("Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod", "tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,", "quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo", "consequat. Duis aute irure dolor in reprehenderit in
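The keyword list is cut off in the excerpt, so the keywords below are hypothetical, and the article texts are shortened versions of the sample. One way to sketch it is to run grepl() once per keyword with fixed = TRUE (so phrases containing regex metacharacters are matched literally) and OR the resulting logical vectors together:

```r
articles <- data.frame(id = 1:4,
                       text = c("Lorem ipsum dolor sit amet",
                                "tempor incididunt ut labore",
                                "quis nostrud exercitation ullamco",
                                "consequat duis aute irure"),
                       stringsAsFactors = FALSE)
keywords <- c("dolor", "nostrud exercitation")  # hypothetical keywords/phrases

# One logical vector per keyword (literal substring match), combined with |
# so an article is kept if it mentions at least one keyword.
hits <- Reduce(`|`, lapply(keywords, grepl, x = articles$text, fixed = TRUE))
subset_articles <- articles[hits, ]
subset_articles
```

Because this is substring matching, "dolor" would also match "dolore"; a word-boundary regex is needed if only whole words should count.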

Use grepl to search either of multiple substrings in a text [duplicate]

放肆的年华 submitted on 2019-11-27 02:04:57
This question already has an answer here: Matching multiple patterns (6 answers). I am using grepl() in R to check whether any of the following genres exists in my text. I am doing it like this right now: grepl("Action", my_text) | grepl("Adventure", my_text) | grepl("Animation", my_text) | grepl("Biography", my_text) | grepl("Comedy", my_text) | grepl("Crime", my_text) | grepl("Documentary", my_text) | grepl("Drama", my_text) | grepl("Family", my_text) | grepl("Fantasy", my_text) | grepl("Film-Noir", my_text) | grepl("History", my_text) | grepl("Horror", my_text) | grepl("Music", my_text) | grepl(
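A common way to collapse the chain of grepl() calls is to join the alternatives into a single regular expression with "|". The genre vector below contains only the names visible in the truncated excerpt, and my_text is hypothetical:

```r
# Genres visible in the excerpt (the original list continues past "Music")
genres <- c("Action", "Adventure", "Animation", "Biography", "Comedy", "Crime",
            "Documentary", "Drama", "Family", "Fantasy", "Film-Noir",
            "History", "Horror", "Music")
my_text <- c("An Action film with some Comedy", "A cooking show")  # hypothetical

# Builds "Action|Adventure|...|Music" and tests every element of my_text once.
pattern <- paste(genres, collapse = "|")
found <- grepl(pattern, my_text)
found
```

This works here because none of the genre names contains a regex metacharacter (the hyphen in "Film-Noir" is literal outside a bracket expression); names with characters such as "." or "(" would need escaping first.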

Complete word matching using grepl in R

我只是一个虾纸丫 submitted on 2019-11-27 01:57:59
Consider the following example: > testLines <- c("I don't want to match this","This is what I want to match") > grepl('is',testLines) [1] TRUE TRUE What I want, though, is to only match 'is' when it stands alone as a single word. From reading a bit of Perl documentation, it seemed that the way to do this is with \b, an anchor that can be used to identify what comes before and after the pattern, i.e. \bword\b matches 'word' but not 'sword'. So I tried the following example, with Perl syntax enabled (perl=TRUE): > grepl('\bis\b',testLines,perl=TRUE) [1] FALSE FALSE The output I'm looking for
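The attempt fails because, inside an R string literal, '\b' is the escape for a backspace character, so the regex engine never receives a word-boundary anchor. Doubling the backslash passes \b through to the regex:

```r
testLines <- c("I don't want to match this", "This is what I want to match")

# "\\b" in R source becomes the two characters \b, the regex word boundary.
# The "is" inside "this"/"This" is preceded by a word character, so it is skipped.
whole_word <- grepl("\\bis\\b", testLines, perl = TRUE)
whole_word
```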

grepl in R to find matches to any of a list of character strings

丶灬走出姿态 submitted on 2019-11-27 01:39:02
Question: Is it possible to use grepl with a list of values, maybe using the %in% operator? I want to take the data below and, if the animal name has "dog" or "cat" in it, return a certain value, say, "keep"; if it doesn't have "dog" or "cat", I want to return "discard". data <- data.frame(animal = sample(c("cat","dog","bird", 'doggy','kittycat'), 50, replace = T)) Now, if I were just to do this by strictly matching values, say, "cat" and "dog", I could use the
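%in% only tests exact equality, so a substring test needs a regex instead. A minimal sketch, using the five animal names from the question as a fixed vector rather than sample() so the result is reproducible:

```r
animal <- c("cat", "dog", "bird", "doggy", "kittycat")  # fixed values instead of sample()
data <- data.frame(animal = animal, stringsAsFactors = FALSE)

# "dog|cat" matches any value containing either substring, so "doggy" and
# "kittycat" are kept too, unlike animal %in% c("cat", "dog").
data$action <- ifelse(grepl("dog|cat", data$animal), "keep", "discard")
data
```

For a longer list of terms, the pattern can be built with paste(terms, collapse = "|").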

Complete word matching using grepl in R

半世苍凉 submitted on 2019-11-26 09:52:01
Question: Consider the following example: > testLines <- c("I don't want to match this","This is what I want to match") > grepl('is',testLines) [1] TRUE TRUE What I want, though, is to only match 'is' when it stands alone as a single word. From reading a bit of Perl documentation, it seemed that the way to do this is with \b, an anchor that can be used to identify what comes before and after the pattern, i.e. \bword\b matches 'word' but not 'sword'. So I tried the following example,

Use grepl to search either of multiple substrings in a text [duplicate]

浪尽此生 submitted on 2019-11-26 08:29:33
Question: This question already has an answer here: Matching multiple patterns (6 answers). I am using grepl() in R to check whether any of the following genres exists in my text. I am doing it like this right now: grepl("Action", my_text) | grepl("Adventure", my_text) | grepl("Animation", my_text) | grepl("Biography", my_text) | grepl("Comedy", my_text) | grepl("Crime", my_text) | grepl("Documentary", my_text) | grepl("Drama", my_text) | grepl("Family", my_text) | grepl("Fantasy", my