grepl | 易学教程

Is it possible to use an AND operator in grepl()?

Question: I want to search for anything that both begins with 55 and has the word Roof (case-sensitive, for those who are curious) in it. So far I have been unsuccessful, as I can only seem to use the OR operator:

    grepl("*^55|*Roof", dataset$longname)

Ultimately, I want to achieve something like this:

    grepl("*^55&&*Roof", dataset$longname)

or

    grepl("*^55&*Roof", dataset$longname)

(Clearly, neither of these works; they're for illustration only.) I want my results to show anything that begins
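A minimal sketch of one common approach: regular expressions have no direct AND operator (`|` is alternation), but grepl() returns a logical vector, so two separate calls can be combined with R's vectorized `&`. The `dataset` below is made-up sample data standing in for the question's. The leading `*` from the question is dropped because grepl() already matches anywhere in the string unless the pattern is anchored with `^` or `$`.

```r
# Hypothetical sample data standing in for the question's dataset
dataset <- data.frame(longname = c("55 Roof repair", "55 Wall", "Roof 55", "55 roof"))

# Option 1: two patterns combined with vectorized logical AND
starts_55 <- grepl("^55", dataset$longname)
has_roof  <- grepl("Roof", dataset$longname)   # case-sensitive by default
both <- starts_55 & has_roof
both
# [1]  TRUE FALSE FALSE FALSE

# Option 2: a single pattern; this works here because "Roof" can only
# appear after the "55" that is anchored at the start of the string
both_single <- grepl("^55.*Roof", dataset$longname)

# Keep only the matching rows
dataset[both, , drop = FALSE]
```

Option 1 scales more naturally when the conditions are unrelated (for example, one anchored and one case-insensitive), since each grepl() call can take its own arguments.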

r- grepl to find multiple strings exists

grepl("instance|percentage", labelTest$Text) returns TRUE if either instance or percentage is present. How can I get TRUE only when both terms are present?

    Text <- c("instance", "percentage", "n", "instance percentage", "percentage instance")
    grepl("instance|percentage", Text)
    # TRUE TRUE FALSE TRUE TRUE
    grepl("instance.*percentage|percentage.*instance", Text)
    # FALSE FALSE FALSE TRUE TRUE

The latter works by looking for ('instance')(any character sequence)('percentage') OR ('percentage')(any character sequence)('instance'). Naturally if you need to find any combination of
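The alternation approach needs one branch per ordering of the terms, which grows quickly as terms are added. A sketch of an alternative, assuming the terms are plain literal strings (hence `fixed = TRUE`): run one grepl() per term and AND the results together with Reduce(). The helper name `contains_all` is hypothetical.

```r
Text <- c("instance", "percentage", "n", "instance percentage", "percentage instance")

# TRUE only where every term occurs, in any order
contains_all <- function(x, terms) {
  # one logical vector per term: does each element of x contain that term?
  hits <- lapply(terms, grepl, x = x, fixed = TRUE)
  # combine element-wise with AND
  Reduce(`&`, hits)
}

contains_all(Text, c("instance", "percentage"))
# [1] FALSE FALSE FALSE  TRUE  TRUE
```

Adding a third term is just a longer `terms` vector; no new orderings have to be written out.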

Fast way to group variables based on direct and indirect similarities in multiple columns

I have a relatively large data set (1,750,000 lines, 5 columns) which contains records with unique ID values (first column), described by four criteria (4 other columns). A small example would be:

    # example
    library(data.table)
    dt <- data.table(id=c("a1","b3","c7","d5","e3","f4","g2","h1","i9","j6"),
                     s1=c("a","b","c","l","l","v","v","v",NA,NA),
                     s2=c("d","d","e","k","k","o","o","o",NA,NA),
                     s3=c("f","g","f","n","n","s","r","u","w","z"),
                     s4=c("h","i","j","m","m","t","t","t",NA,NA))

which looks like this:

       id s1 s2 s3 s4
    1: a1  a  d  f  h
    2: b3  b  d  g  i
    3: c7  c  e  f  j
    4: d5  l  k  n  m
    5: e3  l  k  n  m
    6: f4  v  o
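One way to read "direct and indirect similarities" is as connected components: assuming two rows are linked when they share a non-NA value in the same s column, a group is every row reachable through a chain of such links. The sketch below uses base R only (a plain data.frame instead of data.table, though `df[[col]]` works on both) and repeatedly gives each row the smallest label found among rows sharing a value, until no label changes. The helper name `group_rows` is hypothetical, and the approach has not been benchmarked at 1,750,000 rows; a graph library's connected-components routine (for example igraph's `components()`) is a common alternative.

```r
dt <- data.frame(id = c("a1","b3","c7","d5","e3","f4","g2","h1","i9","j6"),
                 s1 = c("a","b","c","l","l","v","v","v",NA,NA),
                 s2 = c("d","d","e","k","k","o","o","o",NA,NA),
                 s3 = c("f","g","f","n","n","s","r","u","w","z"),
                 s4 = c("h","i","j","m","m","t","t","t",NA,NA))

# Assign a group number to each row; rows sharing any non-NA value in the
# same column, directly or through a chain of rows, end up in one group.
group_rows <- function(df, cols) {
  label <- seq_len(nrow(df))            # start: every row is its own group
  repeat {
    previous <- label
    for (col in cols) {
      v <- df[[col]]
      keep <- !is.na(v)                 # NA never links two rows
      # each row takes the smallest label among rows with the same value
      label[keep] <- ave(label[keep], v[keep], FUN = min)
    }
    if (identical(label, previous)) break   # nothing changed: converged
  }
  match(label, unique(label))           # renumber groups as 1, 2, 3, ...
}

dt$group <- group_rows(dt, c("s1", "s2", "s3", "s4"))
dt$group
# [1] 1 1 1 2 2 3 3 3 4 5
```

Here a1 and b3 are linked directly through s2 = "d", and c7 joins them indirectly through s3 = "f" shared with a1; i9 and j6 share nothing, so each forms its own group.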

grepl for dplyr sql table?

Is there a workaround to use something like filter(df, grepl("A|B|C", location)) for a dplyr SQL table? In SQL it is probably a LIKE. Of course I could convert the SQL table to an R data table, but it is very large. ( http://cran.r-project.org/web/packages/dplyr/vignettes/databases.html ) At the moment I get:

    Error in sqliteSendQuery(conn, statement) :
      error in statement: no such function: GREPL

Thanks, Christof

Using sql() to pass the expression directly through as SQL is one option:

    sql_table %>% filter(sql("location LIKE 'A%' OR location LIKE 'B%' OR location LIKE 'C%'"))

Which will inject the
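Writing the LIKE clauses by hand gets tedious for longer lists. A base R sketch of a hypothetical helper, `like_any()` (not part of dplyr), that builds the same kind of string from a vector of values; it assumes `column` is a safe, unquoted column name and only escapes single quotes in the values.

```r
# Build "(col LIKE 'A%' OR col LIKE 'B%' ...)" from a vector of values.
# like_any() is a hypothetical helper, not a dplyr function.
like_any <- function(column, values, contains = FALSE) {
  values  <- gsub("'", "''", values, fixed = TRUE)  # escape single quotes for SQL
  lead    <- if (contains) "%" else ""              # "%A%" = anywhere, "A%" = prefix
  clauses <- paste0(column, " LIKE '", lead, values, "%'")
  paste0("(", paste(clauses, collapse = " OR "), ")")
}

like_any("location", c("A", "B", "C"))
# [1] "(location LIKE 'A%' OR location LIKE 'B%' OR location LIKE 'C%')"
```

The result can be passed to the answer's pattern, e.g. `sql_table %>% filter(sql(like_any("location", c("A", "B", "C"))))`. Two caveats: grepl("A|B|C") matches anywhere in the string, which corresponds to `contains = TRUE` rather than the prefix form; and `%` or `_` inside the values would still act as LIKE wildcards. LIKE's case sensitivity also depends on the database (SQLite's LIKE is case-insensitive for ASCII by default), unlike grepl().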

Extract URLs with regex into a new data frame column

Question: I want to use a regex to extract all URLs from text in a data frame into a new column. I have some older code that I have used to extract keywords, so I'm looking to adapt that code for a regex. I want to save a regex as a string variable and apply it here:

    data$ContentURL <- apply(sapply(regex, grepl, data$Content, fixed=FALSE), 1, function(x) paste(selection[x], collapse=','))

It seems that fixed=FALSE should tell grepl that it's a regular expression, but R doesn't like how I am trying to save
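grepl() only reports whether a pattern matches (it returns a logical vector), so on its own it cannot pull the URLs out; `fixed = FALSE` is already its default. A minimal sketch using base R's gregexpr() and regmatches() instead, assuming a deliberately simple URL pattern (http or https followed by characters other than whitespace and commas) that will miss or over-match some real-world URLs. The sample `data` is made up.

```r
# Hypothetical sample data with a Content column, as in the question
data <- data.frame(Content = c("see https://example.com and http://r-project.org/x",
                               "no links here"))

# Simple, illustrative URL pattern stored as a string variable
url_regex <- "https?://[^[:space:],]+"

# gregexpr() finds every match in each element; regmatches() extracts them as a list
found <- regmatches(data$Content, gregexpr(url_regex, data$Content))

# Collapse each element's matches into one comma-separated string ("" if none)
data$ContentURL <- vapply(found, paste, character(1), collapse = ",")
data$ContentURL
# first row: both URLs joined by a comma; second row: an empty string
```

Using vapply() with `character(1)` guarantees one string per row, so the result always fits as a data frame column even when a row has zero or several URLs.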

How to search for a string in one column in other columns of a data frame

I have a table, call it df, with 3 columns: the 1st is the title of a product, the 2nd is the description of a product, and the 3rd is a one-word string. What I need to do is run an operation on the entire table, creating 2 new columns (call them 'exists_in_title' and 'exists_in_description') that hold either a 1 or a 0 indicating whether the 3rd column's value exists in the 1st or 2nd column. I need it to simply be a 1:1 (row-by-row) operation, so for example, calling row 1 'A', I need to check if cell A3 exists in A1, and use that data to create column exists_in_title, and then check if A3 exists in A2,
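A minimal row-by-row sketch in base R, assuming the column names `title`, `description` and `word` (the question does not name the third column) and a case-sensitive, literal substring test. The sample rows and the helper `in_column` are hypothetical.

```r
df <- data.frame(title = c("red apple", "green pear"),
                 description = c("a crisp fruit", "a soft pear-like fruit"),
                 word = c("apple", "soft"))

# mapply() pairs each row's word with the same row's text (a 1:1 check).
# fixed = TRUE treats the word as a literal string rather than a regex.
in_column <- function(words, text) {
  as.integer(mapply(grepl, words, text, MoreArgs = list(fixed = TRUE)))
}

df$exists_in_title       <- in_column(df$word, df$title)
df$exists_in_description <- in_column(df$word, df$description)
df[, c("word", "exists_in_title", "exists_in_description")]
```

A plain `grepl(df$word, df$title)` would not work here, because grepl() uses only the first element of its pattern argument; mapply() is what makes the comparison row-wise.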

Find matches of a vector of strings in another vector of strings

I'm trying to create a subset of a data frame of news articles that mention at least one element of a set of keywords or phrases.

    # Sample data frame of articles
    articles <- data.frame(id=c(1, 2, 3, 4),
                           text=c("Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod",
                                  "tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,",
                                  "quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
                                  "consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse"))
    articles$text <- as.character(articles$text)
    # Sample vector of keywords or
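Pasting the keywords into a single alternation with `paste(keywords, collapse = "|")` works only if none of them contains regex metacharacters such as `.` or `(`. A sketch that sidesteps this by searching for each keyword literally and OR-ing the results with Reduce(); the `keywords` vector is hypothetical, since the question's own vector is cut off.

```r
articles <- data.frame(id = c(1, 2, 3, 4),
                       text = c("Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod",
                                "tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,",
                                "quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
                                "consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse"))

# Hypothetical keywords and phrases
keywords <- c("irure", "ut labore")

# One literal (fixed = TRUE) search per keyword, then OR the results together
mentions_any <- Reduce(`|`, lapply(keywords, grepl, x = articles$text, fixed = TRUE))

subset_articles <- articles[mentions_any, ]
subset_articles$id
# [1] 2 4
```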

grepl in R to find matches to any of a list of character strings

Is it possible to use grepl with a list of values, maybe using the %in% operator? I want to take the data below and, if the animal name has "dog" or "cat" in it, return a certain value, say, "keep"; if it has neither "dog" nor "cat", return "discard".

    data <- data.frame(animal = sample(c("cat","dog","bird", 'doggy','kittycat'), 50, replace = TRUE))

Now, if I were just to do this by strictly matching values, say, "cat" and "dog", I could use the following approach:

    matches <- c("cat","dog")
    data$keep <- ifelse(data$animal %in% matches, "Keep", "Discard")
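%in% only tests exact equality, so "doggy" and "kittycat" would be discarded. A minimal sketch that turns the match vector into one regex with `paste(..., collapse = "|")`, assuming the entries contain no regex metacharacters. A fixed vector replaces the question's sample() call so the result is reproducible.

```r
# Deterministic stand-in for the question's sampled data
data <- data.frame(animal = c("cat", "dog", "bird", "doggy", "kittycat"))
matches <- c("cat", "dog")

# Build the regex "cat|dog" so partial names such as "doggy" also match
pattern <- paste(matches, collapse = "|")
data$keep <- ifelse(grepl(pattern, data$animal), "Keep", "Discard")
data$keep
# gives Keep, Keep, Discard, Keep, Keep
```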

Filtering observations in dplyr in combination with grepl

I am trying to work out how to filter some observations from a large dataset using dplyr and grepl. I am not wedded to grepl if other solutions would be better. Take this sample df:

    df1 <- data.frame(fruit=c("apple", "orange", "xapple", "xorange", "applexx", "orangexx", "banxana", "appxxle"), group=c("A", "B"))
    df1
    #      fruit group
    #1     apple     A
    #2    orange     B
    #3    xapple     A
    #4   xorange     B
    #5   applexx     A
    #6  orangexx     B
    #7   banxana     A
    #8   appxxle     B

I want to:
- filter out those cases beginning with 'x'
- filter out those cases ending with 'xx'

I have managed to work out how to get rid of everything that
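Both conditions map onto anchored patterns: `^x` for "begins with x" and `xx$` for "ends with xx". A base R sketch of that idea; with dplyr the same logic can be written as `filter(df1, !grepl("^x", fruit), !grepl("xx$", fruit))`, since filter() combines comma-separated conditions with AND.

```r
df1 <- data.frame(fruit = c("apple", "orange", "xapple", "xorange",
                            "applexx", "orangexx", "banxana", "appxxle"),
                  group = c("A", "B"))   # recycled to length 8

# Flag names that begin with "x" (^x) or end with "xx" (xx$)
drop <- grepl("^x", df1$fruit) | grepl("xx$", df1$fruit)
kept <- df1[!drop, ]
kept$fruit
# apple, orange, banxana and appxxle remain
```

Note that "appxxle" survives: it contains "xx", but not at the end, so the `$` anchor does not match it.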