gsub

In R, how do I replace a string that contains a certain pattern with another string?

假如想象, posted on 2019-12-03 19:27:00
Question: I'm working on a project that involves cleaning a list of data on college majors. Many entries are misspelled, so I want to use gsub() to replace each misspelled entry with its correct spelling. For example, say 'biolgy' is misspelled in a list of majors called Major. How can I get R to detect the misspelling and replace it with the correct spelling? I've tried gsub('biol', 'Biology', Major) but that only replaces the first four letters in 'biolgy'. If I do gsub('biolgy',
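A minimal sketch of one way to replace the whole entry rather than only the matched prefix: anchor the pattern and let it consume the rest of the string. The sample vector and the `cleaned` name are hypothetical, since the real Major data is not shown in the question.

```r
# Hypothetical sample data standing in for the real Major vector.
Major <- c("biolgy", "Biology", "chemistry", "biolog", "microbiology")

# "^biol.*$" matches any entry that starts with "biol" and extends the
# match to the end of the string, so the whole entry is replaced.
cleaned <- sub("^biol.*$", "Biology", Major, ignore.case = TRUE)

print(cleaned)
# [1] "Biology"      "Biology"      "chemistry"    "Biology"      "microbiology"
```

Because of the leading `^`, entries that merely contain "biol" (such as "microbiology") are left alone. Whether a prefix match is safe depends on what else appears in the real data.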

removing trailing spaces with gsub in R [duplicate]

可紊, posted on 2019-12-03 14:08:03
This question already has answers here: How to trim leading and trailing whitespace? (13 answers) Does anyone have a trick to remove trailing spaces from variables with gsub? Below is a sample of my data. As you can see, I have both trailing spaces and spaces embedded in the variable. county <- c("mississippi ","mississippi canyon","missoula ", "mitchell ","mobile ", "mobile bay") I can use the following logic to remove all spaces, but what I really want is to remove only the spaces at the end. county2 <- gsub(" ","",county) Any assistance would be greatly appreciated. You could use a regular
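As a sketch of the idea, anchoring the whitespace pattern to the end of the string with `$` removes only trailing spaces. The names `county2` and `county3` are just illustrative; `trimws()` is the base R helper for the same job.

```r
county <- c("mississippi ", "mississippi canyon", "missoula ",
            "mitchell ", "mobile ", "mobile bay")

# "\\s+$" matches one or more whitespace characters at the end only,
# so embedded spaces are untouched.
county2 <- sub("\\s+$", "", county)

# Base R alternative without writing the regex by hand.
county3 <- trimws(county, which = "right")

print(county2)
# [1] "mississippi"        "mississippi canyon" "missoula"
# [4] "mitchell"           "mobile"             "mobile bay"
```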

R regex gsub separate letters and numbers

倾然丶 夕夏残阳落幕, posted on 2019-12-03 13:54:30
I have a string that mixes letters and numbers: "The sample is 22mg" I'd like to split strings where a number is immediately followed by a letter, like this: "The sample is 22 mg" I've tried this: gsub('[0-9]+[[aA-zZ]]', '[0-9]+ [[aA-zZ]]', 'This is a test 22mg') but I am not getting the desired results. Any suggestions? You need to use capturing parentheses in the regular expression and group references in the replacement. For example: gsub('([0-9])([[:alpha:]])', '\\1 \\2', 'This is a test 22mg') There's nothing R-specific here; the R help for regex and gsub should be of some use. You need
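To illustrate the capture-group approach from the answer on more than one input, here is a small sketch over a vector. The extra sample strings and the `spaced` name are hypothetical.

```r
# Hypothetical inputs; only the first comes from the question.
x <- c("The sample is 22mg", "dose 5ml twice", "no units 42")

# Group 1 captures the last digit, group 2 the first letter after it;
# the replacement puts both back with a space between them.
spaced <- gsub("([0-9])([[:alpha:]])", "\\1 \\2", x)

print(spaced)
# [1] "The sample is 22 mg" "dose 5 ml twice"     "no units 42"
```

The replacement string is inserted literally apart from the `\\1`, `\\2` references, which is why putting a regex such as `[0-9]+` in the replacement does not work.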

How can I use back references with `grep` in R?

与世无争的帅哥, posted on 2019-12-03 09:51:11
Question: I am looking for an elegant way of returning back references using regular expressions in R. Let me explain: Let's say I want to find strings that start with a month name: x <- c("May, 1, 2011", "30 June 2011") grep("^May|^June", x, value=TRUE) [1] "May, 1, 2011" This works, but I really want to isolate the month (i.e. "May"), not the entire matched string. So, one can use gsub to return the back reference using the substitute parameter. But this has two problems: You have to wrap the pattern

Add space between two letters in a string in R [duplicate]

不问归期, posted on 2019-12-03 08:42:42
This question already has an answer here: Use regex to insert space between collapsed words (2 answers) Suppose I have a string like s = "PleaseAddSpacesBetweenTheseWords" How do I use gsub in R to add a space between the words so that I get "Please Add Spaces Between These Words" I should do something like gsub("[a-z][A-Z]", ???, s) What do I put for ???? Also, I find the regular expression documentation for R confusing, so a reference or writeup on regular expressions in R would be much appreciated. You just need to capture the matches then use the \1 syntax to refer to the captured matches. For
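A minimal sketch of the capture-and-reference idea described in the answer. The `spaced` name is illustrative, and in an R string literal the back references are written `\\1` and `\\2`.

```r
s <- "PleaseAddSpacesBetweenTheseWords"

# Capture the lowercase letter and the uppercase letter that follows it,
# then re-emit both with a space in between.
spaced <- gsub("([a-z])([A-Z])", "\\1 \\2", s)

print(spaced)
# [1] "Please Add Spaces Between These Words"
```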

using negative conditions within regular expressions

懵懂的女人, posted on 2019-12-03 07:37:12
Is it possible to use negative matches within gsub expressions? I want to replace strings starting with hello, except those starting with hello Peter: my_string.gsub(/^hello@/i, '') What should I put instead of the @? Sounds like you want a negative lookahead: >> "hello foo".gsub(/hello (?!peter)/, 'lala ') #=> "lala foo" >> "hello peter".gsub(/hello (?!peter)/, 'lala ') #=> "hello peter" As Michael told you, you need a negative lookahead. For your example, it would be something like: my_string.gsub(/^hello(?! peter)( .*|$)/i, '') This will replace in cases like: "hello" "hello Mom" "hello " "hello Mom and
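The question above is about Ruby, but the same negative-lookahead idea can be sketched with R's gsub, assuming `perl = TRUE` so that the PCRE lookahead syntax is available. The `greetings` and `replaced` names are hypothetical.

```r
# Hypothetical inputs mirroring the Ruby examples.
greetings <- c("hello foo", "hello peter", "hello Mom")

# "(?!peter)" is a negative lookahead: "hello " matches only when it is
# NOT immediately followed by "peter". Lookarounds need perl = TRUE.
replaced <- gsub("hello (?!peter)", "lala ", greetings, perl = TRUE)

print(replaced)
# [1] "lala foo"    "hello peter" "lala Mom"
```

This pattern is case-sensitive; adding `ignore.case = TRUE` would be the rough counterpart of the `/i` flag in the Ruby version.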

R: combine several gsub() function in a pipe

孤人, posted on 2019-12-03 06:44:40
To clean some messy data I would like to start using pipes (%>%), but I fail to get the R code working if gsub() is not at the beginning of the pipe but occurs later. (Note: this question is not concerned with proper import, but with data cleaning.) Simple example: df <- cbind.data.frame(A= c("2.187,78 ", "5.491,28 ", "7.000,32 "), B = c("A","B","C")) Column A contains characters (in this case numbers, but these could also be strings) and needs to be cleaned. The steps are df$D <- gsub("\\.","",df$A) df$D <- str_trim(df$D) df$D <- as.numeric(gsub(",", ".",df$D)) One easily could pipe this df$D <-
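One possible sketch of the three cleaning steps as a single pipeline. It rests on two assumptions that differ from the question: it uses the native `|>` pipe (R 4.1 or later) instead of `%>%`, and base `trimws()` instead of `str_trim()`, so that no packages are needed. Naming `pattern` and `replacement` explicitly lets the piped value fall into gsub's `x` argument.

```r
df <- data.frame(A = c("2.187,78 ", "5.491,28 ", "7.000,32 "),
                 B = c("A", "B", "C"))

# With pattern and replacement given by name, the piped vector is matched
# to the next free formal argument of gsub(), which is x.
df$D <- df$A |>
  gsub(pattern = ".", replacement = "", fixed = TRUE) |>  # drop thousands separator
  trimws() |>                                             # drop trailing space
  gsub(pattern = ",", replacement = ".", fixed = TRUE) |> # decimal comma to point
  as.numeric()

print(df$D)
# [1] 2187.78 5491.28 7000.32
```

The same named-argument trick should also work with magrittr's `%>%`, since it likewise passes the left-hand side as the first unnamed argument.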

Split strings at the first colon

旧时模样, posted on 2019-12-03 03:34:46
I am reading data files in text format using readLines. The first 'column' is complicated text that I do not need. The next columns contain data that I do need. The first 'column' and the data are separated by a colon (:). I wish to split each row at the first colon and delete the resulting text string, keeping only the data. Below is an example data file. One potential complication is that one line of data contains multiple colons. That line may at some point become my header. So, I probably should not split at every colon, just at the first colon. my.data <- "first string of text..: aa : bb
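A minimal sketch of dropping everything up to and including the first colon. The `my.lines` vector is hypothetical sample input (the question's own data is cut off), and `data.only` is just an illustrative name.

```r
# Hypothetical lines, as they might come back from readLines().
my.lines <- c("header text: aa : bb : cc",
              "row one text: 10 20 30")

# "[^:]*" cannot cross a colon, so "^[^:]*:" stops at the FIRST colon;
# sub() (not gsub()) replaces only that one match per line.
data.only <- sub("^[^:]*:\\s*", "", my.lines)

print(data.only)
# [1] "aa : bb : cc" "10 20 30"
```

Later colons on the same line are preserved, which covers the case of a header line that contains several of them.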

How can I use back references with `grep` in R?

一世执手, posted on 2019-12-03 00:31:33
I am looking for an elegant way of returning back references using regular expressions in R. Let me explain: Let's say I want to find strings that start with a month name: x <- c("May, 1, 2011", "30 June 2011") grep("^May|^June", x, value=TRUE) [1] "May, 1, 2011" This works, but I really want to isolate the month (i.e. "May"), not the entire matched string. So, one can use gsub to return the back reference using the substitute parameter. But this has two problems: You have to wrap the pattern inside ".*(pattern).*" so that the substitution occurs on the entire string. Rather than returning NA
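One way to get just the matched month, with NA where nothing matches, is a sketch built on base R's `regexpr()` and `regmatches()` rather than gsub. This is not necessarily the approach the original answers took; the `months` name is illustrative.

```r
x <- c("May, 1, 2011", "30 June 2011")

# regexpr() gives the position of the first match per element, or -1.
m <- regexpr("^(May|June)", x)

# regmatches() returns only the matched substrings, so fill them into a
# vector of NAs at the positions where a match was found.
months <- rep(NA_character_, length(x))
months[m != -1] <- regmatches(x, m)

print(months)
# [1] "May" NA
```

This avoids wrapping the pattern in `.*( ... ).*` and keeps the result aligned with the input vector.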

Ruby regex what does the \\1 mean for gsub

家住魔仙堡, posted on 2019-12-02 19:30:21
What does the \1 do? For example "foo bar bag".gsub(/(bar)/,'car\1') I believe it has something to do with how you use parentheses, but I'm not really sure. Could someone explain it to me? And can you do stuff like \2? If so, what would that do? James Toomey: Each item that you surround with parentheses in the search part corresponds to a number, \1, \2, etc., in the substitution part. In your example, there's only one item surrounded by parentheses, the "(bar)" item, so anywhere you put a \1 is where the part inside the parentheses will be swapped in. You can put in the \1 multiple
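The question is about Ruby, but numbered group references behave the same way in R's gsub, so the explanation can be sketched in R (the language of most entries on this page). The variable names are hypothetical, and in an R string the references are written `\\1` and `\\2`.

```r
# One group: \\1 re-inserts whatever "(bar)" matched.
one_group <- gsub("(bar)", "car\\1", "foo bar bag")

# Two groups: \\1 is the first parenthesised part, \\2 the second,
# so they can be swapped in the replacement.
two_groups <- gsub("(foo) (bar)", "\\2 \\1", "foo bar bag")

print(one_group)
# [1] "foo carbar bag"
print(two_groups)
# [1] "bar foo bag"
```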