gsub | 易学教程

Ruby regex what does the \1 mean for gsub

Question: What does the \1 do? For example, "foo bar bag".gsub(/(bar)/,'car\1'). I believe it has something to do with how you use parentheses, but I'm not really sure. Could someone explain it to me? And can you do stuff like \2? If so, what would that do?

Answer 1: Each item that you surround with parentheses in the search pattern corresponds to a number ( \1 , \2 , etc.) in the replacement. In your example, there's only one item surrounded by parentheses, the "(bar)" item, so anywhere you put a \1
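The backreference behavior described in the answer above can be sketched in Ruby as follows. This is a minimal illustration, not part of the original answer: the helper names `prefix_with_car` and `swap_words` and the "hello world" input are hypothetical, chosen only to show `\1` and `\2` side by side.

```ruby
# \1 in the replacement refers to whatever the first parenthesized
# group matched; here that group is (bar).
# prefix_with_car is a hypothetical helper name used for illustration.
def prefix_with_car(text)
  text.gsub(/(bar)/, 'car\1')
end

# With two groups, \1 and \2 refer to the first and second group,
# so they can be reordered in the replacement.
# swap_words is likewise a hypothetical helper name.
def swap_words(text)
  text.gsub(/(\w+) (\w+)/, '\2 \1')
end

puts prefix_with_car("foo bar bag")  # prints: foo carbar bag
puts swap_words("hello world")       # prints: world hello
```

Note that the replacement strings are single-quoted. In a double-quoted Ruby string the backslash would need to be doubled ("car\\1"), because "\1" on its own is read as an escape sequence rather than a backreference.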

Cleaning HTML code in R: how to clean this list?

Question: I know that this question has been asked here tons of times, but after reading a bunch of topics I'm still stuck on this :( . I have a list of scraped HTML nodes like this <a href="http://bit.d o/bnRinN9" target="_blank" style="color: #ff7700; font-weight: bold;">http://bit.d o/bnRinN9</a> and I just want to strip out all the markup. Unfortunately I'm a newbie and the only thing that comes to mind is the Cthulhu way (regex, argh!). How can I do this? *I put a space between "d" and "o" in

Remove all text before first occurrence of specific character in R

Question: Look at the following vector: x <- c("MED - This means medic - somecode123", "HIV - This means HIV - somecode456") Now I want a vector containing the values "This means medic - somecode123" and "This means HIV - somecode456". I can't seem to solve this using gsub ...

Answer 1: We can use sub . Match the pattern of one or more non-whitespace characters ( \\S+ ) followed by one or more whitespace characters ( \\s+ ), followed by - and whitespace ( \\s+ ), and replace it with "" . sub('\\S+\\s+-\\s+', "", x) #[1] "This means

Reformatting complex factor vector with comma separation after thousand

Question: I would like to reformat a factor vector so that the figures it contains have a thousands separator. The vector contains integers and real numbers, without any particular rule with respect to the values or their order. Data: In particular, I'm working with a vector vec similar to the one generated below: content <- c("0 - 100", "0 - 100", "0 - 100", "0 - 100", "150.22 - 170.33", "1000 - 2000","1000 - 2000", "1000 - 2000", "1000 - 2000", "7000 - 10000", "7000 - 10000", "7000 - 10000", "7000 - 10000",

Matching entire string in R

Question: Consider the following string: string = "I have #1 file and #11 folders" I would like to replace the pattern #1 with the word one , but I don't want to modify the #11 . The result should be: string = "I have one file and #11 folders" I have tried: string = gsub("#1", "one", string, fixed = TRUE) but this replaces both #1 and #11. I have also tried: string = gsub("^#1$", "one", string, fixed = TRUE) but this doesn't replace anything since the pattern is part of a string that contains spaces.

Removing special characters in the beginning of a word in R

Question: I am using the following code to remove the special characters from the beginning of a word: >gsub("^[^[:alnum:]]",'','#C++') [1] "C++" But if there are multiple special characters at the beginning, it removes only the first one: >gsub("^[^[:alnum:]]",'','$#C++') [1] "#C++" How can I make it remove all the special characters at the beginning, so that the output is "C++" ?

Answer 1: We match one or more non-alphanumeric characters ( [^[:alnum:]]+ ) from the beginning of the string ( ^ ) and

Escaping Angled Bracket acts similar to look-ahead

Question: Why does escaping the angled bracket > exhibit look-ahead-like behavior? To be clear, I understand that the angled bracket does not need to be escaped. The question is how the pattern is being interpreted such that it yields the match(es) shown. ## match bracket, with or without underscore ## replace with "greater_" strings <- c("ten>eight", "ten_>_eight") repl <- "greater_" ## Unescaped. Yields desired results gsub(">_?", repl, strings) # [1] "tengreater_eight" "ten_greater

How do I gsub an empty “” string in R?

Question: How do I replace an empty string? This: x = c("","b") gsub("","taco",x) produces: "taco" "tacobtaco" instead of: "taco" "b" Is there any way to replace an empty string?

Answer 1: I would use nchar here: x[nchar(x)==0] <- "taco" EDIT: If you are looking for performance, you should use nzchar: x[!nzchar(x)] <- "taco"

Answer 2: I wouldn't use gsub here. Semantically, I think of gsub as replacing parts within a string. For replacing a whole string, I would just use subsetting. And since you're searching

gsub speed vs pattern length

Question: I've been using gsub extensively lately, and I noticed that short patterns run faster than long ones, which is not surprising. Here's a fully reproducible example: library(microbenchmark) set.seed(12345) n = 0 rpt = seq(20, 1461, 20) msecFF = numeric(length(rpt)) msecFT = numeric(length(rpt)) inp = rep("aaaaaaaaaa",15000) for (i in rpt) { n = n + 1 print(n) patt = paste(rep("a", rpt[n]), collapse = "") #time = microbenchmark(func(count[1:10000,12], patt, "b"), times = 10) timeFF = microbenchmark

Split string by final space in R

Question: I have a vector of strings, each containing a number of spaces. I would like to split this into two vectors, split at the final space. For example: vec <- c('This is one', 'And another', 'And one more again') should become vec1 = c('This is', 'And', 'And one more') vec2 = c('one', 'another', 'again') Is there a quick and easy way to do this? I have done similar things before using gsub and regex, and have managed to get the second vector using the following: vec2 <- gsub(".* ", "", vec) But can't