gsub | 易学教程

In R, how do I replace a string that contains a certain pattern with another string?

Question: I'm working on a project that involves cleaning a list of data on college majors. Many of the entries are misspelled, so I want to use the function gsub() to replace each misspelled entry with its correct spelling. For example, say 'biolgy' is misspelled in a list of majors called Major. How can I get R to detect the misspelling and replace it with the correct spelling? I've tried gsub('biol', 'Biology', Major), but that only replaces the first four letters of 'biolgy'. If I do gsub('biolgy',
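
The excerpt is cut off, so the following is only a minimal sketch of one way to handle this, not the accepted answer. It assumes a small hypothetical `Major` vector (the original data is not shown) and matches the whole misspelled word rather than a prefix, so nothing is left behind after the replacement.

```r
# Hypothetical sample data; the question does not show the real vector.
Major <- c("biolgy", "Biology", "chemistry", "biolgy and chemistry")

# Matching only a prefix leaves the rest of the misspelling in place:
prefix_only <- gsub("biol", "Biology", "biolgy")
print(prefix_only)  # "Biologygy"

# Match the complete misspelled word instead. \\b is a word boundary,
# and ignore.case = TRUE also catches variants such as "Biolgy".
fixed_major <- gsub("\\bbiolgy\\b", "Biology", Major,
                    ignore.case = TRUE, perl = TRUE)
print(fixed_major)
# "Biology" "Biology" "chemistry" "Biology and chemistry"
```

For many distinct misspellings, a lookup table of pattern and replacement pairs applied in a loop is probably easier to maintain than a chain of separate gsub() calls.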

removing trailing spaces with gsub in R [duplicate]

This question already has answers here: How to trim leading and trailing whitespace? (13 answers) Does anyone have a trick to remove trailing spaces on variables with gsub? Below is a sample of my data. As you can see, I have both trailing spaces and spaces embedded in the variable. county <- c("mississippi ","mississippi canyon","missoula ", "mitchell ","mobile ", "mobile bay") I can use the following logic to remove all spaces, but what I really want is to remove only the spaces at the end. county2 <- gsub(" ","",county) Any assistance would be greatly appreciated. You could use a regular
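
The answer is truncated here. As a hedged sketch of the usual approach, the pattern can be anchored to the end of the string with `$` so that embedded spaces are left alone. The `county` vector is taken from the question; `trimws()` is a base R alternative (available since R 3.2.0) rather than something the excerpt mentions.

```r
county <- c("mississippi ", "mississippi canyon", "missoula ",
            "mitchell ", "mobile ", "mobile bay")

# \\s+ matches one or more whitespace characters; $ anchors the match
# to the end of each string, so inner spaces are untouched.
county2 <- sub("\\s+$", "", county)
print(county2)

# Base R helper that does the same for trailing whitespace.
county3 <- trimws(county, which = "right")
```

Since the anchored pattern can match at most once per string, sub() and gsub() give the same result here.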

R regex gsub separate letters and numbers

I have a string that mixes letters and numbers: "The sample is 22mg" I'd like to split strings where a number is immediately followed by a letter, like this: "The sample is 22 mg" I've tried this: gsub('[0-9]+[[aA-zZ]]', '[0-9]+ [[aA-zZ]]', 'This is a test 22mg') but I am not getting the desired results. Any suggestions? You need to use capturing parentheses in the regular expression and group references in the replacement. For example: gsub('([0-9])([[:alpha:]])', '\\1 \\2', 'This is a test 22mg') There's nothing R-specific here; the R help for regex and gsub should be of some use. You need

Add space between two letters in a string in R [duplicate]

This question already has an answer here: Use regex to insert space between collapsed words (2 answers) Suppose I have a string like s = "PleaseAddSpacesBetweenTheseWords" How do I use gsub in R to add a space between the words so that I get "Please Add Spaces Between These Words"? I should do something like gsub("[a-z][A-Z]", ???, s) What do I put for ???? Also, I find the regular expression documentation for R confusing, so a reference or write-up on regular expressions in R would be much appreciated. You just need to capture the matches, then use the \1 syntax to refer to the captured matches. For
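
The answer's own example is cut off. A minimal sketch of the capture-and-reference approach it describes might look like this, using the string from the question; the variable name `spaced` is only illustrative.

```r
s <- "PleaseAddSpacesBetweenTheseWords"

# Capture the lowercase letter as group 1 and the uppercase letter as
# group 2, then put both back with a space between them.
# In an R string literal the backreference \1 is written "\\1".
spaced <- gsub("([a-z])([A-Z])", "\\1 \\2", s)
print(spaced)  # "Please Add Spaces Between These Words"
```

Without the parentheses there is nothing for `\\1` and `\\2` to refer to, which is why the pattern in the question cannot be reused unchanged.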

using negative conditions within regular expressions

Is it possible to use negative matches within gsub expressions? I want to replace strings starting with hello, except those starting with hello Peter: my_string.gsub(/^hello@/i, '') What should I put instead of the @? Sounds like you want a negative lookahead: >> "hello foo".gsub(/hello (?!peter)/, 'lala ') #=> "lala foo" >> "hello peter".gsub(/hello (?!peter)/, 'lala ') #=> "hello peter" As Michael told you, you need a negative lookahead. For your example, it is something like: my_string.gsub(/^hello(?! peter)( .*|$)/i, '') This will replace in cases like: "hello" "hello Mom" "hello " "hello Mom and

R: combine several gsub() function in a pipe

To clean some messy data I would like to start using pipes (%>%), but I fail to get the R code working when gsub() is not at the beginning of the pipe and has to occur later (note: this question is not concerned with proper import, but with data cleaning). Simple example: df <- cbind.data.frame(A= c("2.187,78 ", "5.491,28 ", "7.000,32 "), B = c("A","B","C")) Column A contains characters (in this case numbers, but these could also be strings) and needs to be cleaned. The steps are df$D <- gsub("\\.","",df$A) df$D <- str_trim(df$D) df$D <- as.numeric(gsub(",", ".",df$D)) One could easily pipe this df$D <-
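
The excerpt stops before any piped version is shown. One possible sketch, under a few stated assumptions: it uses the base R pipe `|>` (R 4.1 or later) instead of magrittr's `%>%`, and base `trimws()` in place of stringr's `str_trim()`, so that it runs without extra packages. The trick is to name `pattern` and `replacement`, which leaves the piped value to fill the first unmatched argument of gsub(), namely `x`.

```r
df <- data.frame(A = c("2.187,78 ", "5.491,28 ", "7.000,32 "),
                 B = c("A", "B", "C"))

# Naming pattern and replacement means the piped-in vector is matched
# to gsub()'s third formal argument, x. fixed = TRUE treats "." and ","
# as literal characters rather than regex syntax.
df$D <- df$A |>
  gsub(pattern = ".", replacement = "", fixed = TRUE) |>  # drop thousands separator
  trimws() |>                                             # drop trailing space
  gsub(pattern = ",", replacement = ".", fixed = TRUE) |> # decimal comma to point
  as.numeric()

print(df$D)  # 2187.78 5491.28 7000.32
```

The same named-argument approach should also work with `%>%`, where the placeholder form `gsub("\\.", "", .)` is another option.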

Split strings at the first colon

I am reading data files in text format using readLines. The first 'column' is complicated text that I do not need. The next columns contain data that I do need. The first 'column' and the data are separated by a colon (:). I wish to split each row at the first colon and delete the resulting text string, keeping only the data. Below is an example data file. One potential complication is that one line of data contains multiple colons. That line may at some point become my header, so I probably should not split at every colon, just at the first one. my.data <- "first string of text..: aa : bb
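
The sample data in the excerpt is truncated, so the sketch below uses two hypothetical lines instead. It shows one way to drop everything up to and including the first colon with sub(), which replaces only the first match; it is not necessarily the approach the original answers took.

```r
# Hypothetical input lines standing in for the output of readLines().
lines <- c("header text: aa : bb : cc",
           "row one: 1 2 3")

# ^[^:]* matches everything from the start up to (not including) the
# first colon; the colon and any following whitespace are then consumed.
data_only <- sub("^[^:]*:\\s*", "", lines)
print(data_only)  # "aa : bb : cc" "1 2 3"
```

Because `[^:]*` cannot cross a colon, later colons in the same line are left intact, which matters for the multi-colon header line.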

How can I use back references with `grep` in R?

I am looking for an elegant way of returning back references using regular expressions in R. Let me explain: Let's say I want to find strings that start with a month name: x <- c("May, 1, 2011", "30 June 2011") grep("May|^June", x, value=TRUE) [1] "May, 1, 2011" This works, but I really want to isolate the month (i.e. "May"), not the entire matched string. So, one can use gsub to return the back reference using the substitute parameter. But this has two problems: You have to wrap the pattern inside ".*(pattern).*" so that the substitution occurs on the entire string. Rather than returning NA
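
As a hedged illustration of the problem and one base R alternative: the sketch below first shows the wrap-and-substitute approach the question describes, then extracts the match with regexpr() and regmatches() so that non-matching elements become NA. The pattern `^(May|June)` and the variable names are assumptions made for this example, not taken from the original answers.

```r
x <- c("May, 1, 2011", "30 June 2011")

# Wrap-and-substitute: the pattern must cover the whole string, and an
# element that does not match comes back unchanged instead of as NA.
wrapped <- sub("^(May|June).*", "\\1", x)
print(wrapped)  # "May" "30 June 2011"

# regexpr() returns the match position (-1 when there is no match);
# regmatches() extracts only the matched text.
m <- regexpr("^(May|June)", x)
month <- rep(NA_character_, length(x))
month[m > 0] <- regmatches(x, m)
print(month)  # "May" NA
```

Pre-filling the result with NA keeps it the same length as the input, which regmatches() alone does not guarantee because it drops non-matching elements.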

Ruby regex what does the \\1 mean for gsub

What does the \1 do? For example "foo bar bag".gsub(/(bar)/,'car\1') I believe it has something to do with how you use parentheses, but I'm not really sure. Could someone explain it to me? And can you do stuff like \2? If so, what would that do? James Toomey Each item that you surround with parentheses in the search pattern corresponds to a number (\1, \2, etc.) in the substitution part. In your example, there's only one item surrounded by parentheses, the "(bar)" item, so anywhere you put a \1 is where the text matched inside the parentheses will be swapped in. You can put in the \1 multiple