grepl

Is it possible to use an AND operator in grepl()?

喜欢而已 posted on 2019-12-01 15:07:55
Question: I want to search for entries that begin with 55 and also contain the word Roof (case-sensitive, for those who are curious). So far I have been unsuccessful, as I can only seem to use the OR operator:
    grepl("*^55|*Roof", dataset$longname)
Ultimately, I want to achieve something like this:
    grepl("*^55&&*Roof", dataset$longname)
or
    grepl("*^55&*Roof", dataset$longname)
(Clearly, neither of these works; they're for illustration only.) I want my results to show anything that begins
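A minimal sketch of one way to get AND behaviour, assuming the goal is "starts with 55 and contains Roof": run grepl() once per condition and combine the logical vectors with `&`. The `longname` vector here is made-up sample data, and the leading `*` from the question's patterns is dropped.

```r
# Made-up sample data standing in for dataset$longname
longname <- c("55 Roof Street", "55 Main Street", "12 Roof Lane", "Roof 55")

# One grepl() call per condition, combined element-wise with &
starts_55 <- grepl("^55", longname)
has_roof  <- grepl("Roof", longname)
both      <- starts_55 & has_roof

# Because "begins with 55" fixes the order, a single pattern also works here
both_single <- grepl("^55.*Roof", longname)

print(longname[both])
```

The two-call form is the more general one, since it does not depend on the order in which the pieces appear in the string.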

R: grepl to check that multiple strings exist

本秂侑毒 posted on 2019-12-01 01:03:50
grepl("instance|percentage", labelTest$Text) returns TRUE if either instance or percentage is present. How can I get TRUE only when both terms are present?
    Text <- c("instance", "percentage", "n", "instance percentage", "percentage instance")
    grepl("instance|percentage", Text)
    # TRUE TRUE FALSE TRUE TRUE
    grepl("instance.*percentage|percentage.*instance", Text)
    # FALSE FALSE FALSE TRUE TRUE
The latter one works by looking for: ('instance')(any character sequence)('percentage') OR ('percentage')(any character sequence)('instance'). Naturally if you need to find any combination of
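Writing out every ordering by hand gets unwieldy as the number of terms grows. A possible alternative, sketched here under the assumption that the terms are plain strings rather than regular expressions (hence fixed = TRUE), is to test each term separately and AND the results together. The `terms` vector is illustrative.

```r
Text  <- c("instance", "percentage", "n", "instance percentage", "percentage instance")
terms <- c("instance", "percentage")  # illustrative list of required terms

# One logical vector per term, then element-wise AND across all of them
per_term    <- lapply(terms, grepl, x = Text, fixed = TRUE)
all_present <- Reduce(`&`, per_term)

print(all_present)
```

This gives the same answer as the alternation pattern above for two terms and needs no change when a third term is added to `terms`.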

Fast way to group variables based on direct and indirect similarities in multiple columns

十年热恋 posted on 2019-11-30 08:49:50
I have a relatively large data set (1,750,000 lines, 5 columns) which contains records with unique ID values (first column), described by four criteria (4 other columns). A small example would be:
    # example
    library(data.table)
    dt <- data.table(id=c("a1","b3","c7","d5","e3","f4","g2","h1","i9","j6"),
                     s1=c("a","b","c","l","l","v","v","v",NA,NA),
                     s2=c("d","d","e","k","k","o","o","o",NA,NA),
                     s3=c("f","g","f","n","n","s","r","u","w","z"),
                     s4=c("h","i","j","m","m","t","t","t",NA,NA))
which looks like this:
        id s1 s2 s3 s4
     1: a1  a  d  f  h
     2: b3  b  d  g  i
     3: c7  c  e  f  j
     4: d5  l  k  n  m
     5: e3  l  k  n  m
     6: f4  v  o
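The excerpt is cut off before the grouping rule is stated, so the following is only a guess at the intent based on the title: two records are treated as directly similar when they share a non-NA value in the same column, and groups are closed under indirect (transitive) similarity. Under that assumption, a simple label-propagation sketch in base R could look like this. `group_records` is a hypothetical helper, a plain data.frame is used instead of data.table to keep the example self-contained, and no claim is made about its speed on 1,750,000 rows.

```r
# Hypothetical helper: assigns a group number to each row so that rows sharing
# a non-NA value in any of `cols` (directly or through a chain) get the same number.
group_records <- function(df, cols) {
  grp <- seq_len(nrow(df))            # start with every row in its own group
  repeat {
    old <- grp
    for (col in cols) {
      ok <- !is.na(df[[col]])         # NA never links two rows
      # within each shared value, pull every row down to the smallest label
      grp[ok] <- ave(grp[ok], df[[col]][ok], FUN = min)
    }
    if (identical(grp, old)) break    # stop once a full pass changes nothing
  }
  match(grp, unique(grp))             # renumber groups as 1, 2, 3, ...
}

df <- data.frame(
  id = c("a1","b3","c7","d5","e3","f4","g2","h1","i9","j6"),
  s1 = c("a","b","c","l","l","v","v","v",NA,NA),
  s2 = c("d","d","e","k","k","o","o","o",NA,NA),
  s3 = c("f","g","f","n","n","s","r","u","w","z"),
  s4 = c("h","i","j","m","m","t","t","t",NA,NA),
  stringsAsFactors = FALSE
)

df$group <- group_records(df, c("s1", "s2", "s3", "s4"))
print(df[, c("id", "group")])
```

In the sample data, a1 and b3 are linked through s2 and a1 and c7 through s3, so all three end up in one group even though b3 and c7 share nothing directly.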

grepl for dplyr sql table?

好久不见. posted on 2019-11-30 05:18:54
Is there a workaround to use something like filter(df, grepl("A|B|C",location)) for a dplyr SQL table? In SQL it is probably a LIKE. Of course I could convert the SQL table to an R data table, but it is very large. ( http://cran.r-project.org/web/packages/dplyr/vignettes/databases.html ) At the moment I get
    Error in sqliteSendQuery(conn, statement) : error in statement: no such function: GREPL
Thanks, Christof
Using sql() to pass the expression directly to SQL is one option.
    sql_table %>% filter( sql("location LIKE 'A%' OR location LIKE 'B%' OR location LIKE 'C%'") )
Which will inject the
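For longer lists, the LIKE clause can be assembled in R instead of typed by hand. This is a rough sketch only: `like_any` is a hypothetical helper, it does no quoting or escaping (so it is only suitable for trusted constants), and the resulting string would still have to be passed through sql() inside filter() as in the answer above. Note that LIKE 'A%' only matches values that start with A, whereas grepl("A", ...) matches A anywhere; the `anywhere` argument covers both cases.

```r
# Hypothetical helper: build "col LIKE '...' OR col LIKE '...'" from a vector.
# No escaping is done, so only use it with trusted, constant fragments.
like_any <- function(column, fragments, anywhere = TRUE) {
  lead <- if (anywhere) "%" else ""   # leading wildcard gives match-anywhere
  clauses <- sprintf("%s LIKE '%s%s%%'", column, lead, fragments)
  paste(clauses, collapse = " OR ")
}

prefix_clause   <- like_any("location", c("A", "B", "C"), anywhere = FALSE)
anywhere_clause <- like_any("location", c("A", "B", "C"))

print(prefix_clause)
print(anywhere_clause)
```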

Extract URLs with regex into a new data frame column

那年仲夏 posted on 2019-11-29 15:22:17
Question: I want to use a regex to extract all URLs from text in a data frame into a new column. I have some older code that I have used to extract keywords, so I'm looking to adapt the code for a regex. I want to save a regex as a string variable and apply it here:
    data$ContentURL <- apply(sapply(regex, grepl, data$Content, fixed=FALSE), 1, function(x) paste(selection[x], collapse=','))
It seems that fixed=FALSE should tell grepl that it's a regular expression, but R doesn't like how I am trying to save
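grepl() only reports whether a pattern matches; to pull out the matched text itself, base R offers gregexpr() together with regmatches(). The sketch below assumes a deliberately simple URL pattern (http or https followed by non-space characters), which is an illustration rather than a complete URL grammar, and uses made-up sample content.

```r
# Made-up sample data standing in for the question's data$Content
data <- data.frame(
  Content = c("see http://example.com and https://example.org/a?b=1 for more",
              "no links here",
              "docs at https://r-project.org"),
  stringsAsFactors = FALSE
)

# Assumed, simplistic pattern: scheme followed by a run of non-space characters
url_regex <- "https?://[^[:space:]]+"

# gregexpr() finds every match per string; regmatches() extracts the text
found <- regmatches(data$Content, gregexpr(url_regex, data$Content))

# Collapse each row's matches into one comma-separated string ("" if none)
data$ContentURL <- vapply(found, paste, character(1), collapse = ",")

print(data$ContentURL)
```

With a pattern this loose, punctuation directly after a URL would be captured as part of it, so real text may need a stricter expression.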

Fast way to group variables based on direct and indirect similarities in multiple columns

大憨熊 posted on 2019-11-29 12:09:36
Question: I have a relatively large data set (1,750,000 lines, 5 columns) which contains records with unique ID values (first column), described by four criteria (4 other columns). A small example would be:
    # example
    library(data.table)
    dt <- data.table(id=c("a1","b3","c7","d5","e3","f4","g2","h1","i9","j6"),
                     s1=c("a","b","c","l","l","v","v","v",NA,NA),
                     s2=c("d","d","e","k","k","o","o","o",NA,NA),
                     s3=c("f","g","f","n","n","s","r","u","w","z"),
                     s4=c("h","i","j","m","m","t","t","t",NA,NA))
which looks

How to search for a string in one column in other columns of a data frame

半城伤御伤魂 posted on 2019-11-29 04:32:37
I have a table, call it df, with 3 columns: the 1st is the title of a product, the 2nd is the description of a product, and the 3rd is a one-word string. What I need to do is run an operation on the entire table, creating 2 new columns (call them 'exists_in_title' and 'exists_in_description') that have either a 1 or 0 indicating whether the 3rd column's value occurs in the 1st or 2nd column. I need it to simply be a 1:1 operation, so for example, calling row 1 'A', I need to check if the cell A3 exists in A1, and use that data to create column exists_in_title, and then check if A3 exists in A2,
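grepl() takes a single pattern, so a row-by-row check needs something that pairs each word with its own row. One way to sketch it, assuming literal, case-sensitive matching is wanted (fixed = TRUE), is mapply(). The column names `title`, `description` and `word`, the helper `contains`, and the sample rows are all made up for illustration.

```r
# Made-up sample table; column names are assumptions
df <- data.frame(
  title       = c("Red Widget", "Blue Gadget", "Green Gizmo"),
  description = c("A small widget in red", "Gadget for daily use", "Nothing to see"),
  word        = c("Widget", "daily", "gizmo"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row i, is needle[i] a substring of haystack[i]?
contains <- function(needle, haystack) {
  hit <- mapply(grepl, needle, haystack,
                MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
  as.integer(hit)  # 1 if found, 0 otherwise
}

df$exists_in_title       <- contains(df$word, df$title)
df$exists_in_description <- contains(df$word, df$description)

print(df[, c("word", "exists_in_title", "exists_in_description")])
```

Because the match is case-sensitive, "Widget" is not found in "A small widget in red"; wrapping both arguments in tolower() would relax that.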

Find matches of a vector of strings in another vector of strings

与世无争的帅哥 posted on 2019-11-28 09:57:01
I'm trying to create a subset of a data frame of news articles that mention at least one element of a set of keywords or phrases.
    # Sample data frame of articles
    articles <- data.frame(id=c(1, 2, 3, 4),
                           text=c("Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod",
                                  "tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,",
                                  "quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
                                  "consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse"))
    articles$text <- as.character(articles$text)
    # Sample vector of keywords or
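The question's own keyword vector is cut off, so the sketch below substitutes a made-up one. It shows one common approach: join the keywords into a single alternation pattern with paste(collapse = "|") and subset on the grepl() result. This assumes the keywords contain no regex metacharacters; otherwise they would need escaping first.

```r
articles <- data.frame(
  id = c(1, 2, 3, 4),
  text = c("Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod",
           "tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,",
           "quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
           "consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse"),
  stringsAsFactors = FALSE
)

# Made-up keywords and phrases (the original vector is not shown in the excerpt)
keywords <- c("ipsum dolor", "ullamco", "not present anywhere")

# "a|b|c" matches a string that contains at least one of the keywords
pattern <- paste(keywords, collapse = "|")
matched <- articles[grepl(pattern, articles$text), ]

print(matched$id)
```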

grepl in R to find matches to any of a list of character strings

∥☆過路亽.° posted on 2019-11-28 06:58:54
Is it possible to use a grepl argument when referring to a list of values, maybe using the %in% operator? I want to take the data below and if the animal name has "dog" or "cat" in it, I want to return a certain value, say, "keep"; if it doesn't have "dog" or "cat", I want to return "discard".
    data <- data.frame(animal = sample(c("cat","dog","bird", 'doggy','kittycat'), 50, replace = TRUE))
Now, if I were just to do this by strictly matching values, say, "cat" and "dog", I could use the following approach:
    matches <- c("cat","dog")
    data$keep <- ifelse(data$animal %in% matches, "Keep", "Discard")
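%in% only does exact matching, so "doggy" and "kittycat" would be discarded. A minimal sketch of the partial-match version, assuming the entries in `matches` are plain substrings with no regex metacharacters: collapse them into one "cat|dog" pattern and hand that to grepl(). A fixed vector replaces the random sample so the result is reproducible.

```r
# Fixed sample instead of sample(), so the output is deterministic
animal  <- c("cat", "dog", "bird", "doggy", "kittycat")
matches <- c("cat", "dog")

# "cat|dog" is TRUE for any name containing either substring
pattern <- paste(matches, collapse = "|")
keep    <- ifelse(grepl(pattern, animal), "Keep", "Discard")

print(data.frame(animal, keep))
```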

Filtering observations in dplyr in combination with grepl

余生长醉 posted on 2019-11-28 04:31:32
I am trying to work out how to filter some observations from a large dataset using dplyr and grepl. I am not wedded to grepl, if other solutions would be more optimal. Take this sample df:
    df1 <- data.frame(fruit=c("apple", "orange", "xapple", "xorange", "applexx", "orangexx", "banxana", "appxxle"), group=c("A", "B") )
    df1
    #     fruit group
    #1    apple     A
    #2   orange     B
    #3   xapple     A
    #4  xorange     B
    #5  applexx     A
    #6 orangexx     B
    #7  banxana     A
    #8  appxxle     B
I want to:
- filter out those cases beginning with 'x'
- filter out those cases ending with 'xx'
I have managed to work out how to get rid of everything that
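Both conditions can be expressed with anchors: `^x` for "begins with x" and `xx$` for "ends with xx". The sketch below uses base R subsetting so that it runs without extra packages; with dplyr loaded, the same logical expression should work inside filter(), e.g. filter(df1, !grepl("^x|xx$", fruit)).

```r
df1 <- data.frame(
  fruit = c("apple", "orange", "xapple", "xorange",
            "applexx", "orangexx", "banxana", "appxxle"),
  group = c("A", "B"),
  stringsAsFactors = FALSE
)

# TRUE for rows to drop: starts with "x" OR ends with "xx"
drop_row <- grepl("^x|xx$", df1$fruit)
kept     <- df1[!drop_row, ]

print(kept)
```

"banxana" and "appxxle" survive because their x's are neither at the start nor a double x at the end.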