gsub

Ruby regex what does the \1 mean for gsub

蹲街弑〆低调 submitted on 2019-12-20 09:19:03
Question: What does the \1 do? For example: "foo bar bag".gsub(/(bar)/, 'car\1') I believe it has something to do with how you use parentheses, but I'm not really sure. Could someone explain it to me? And can you do stuff like \2? If so, what would that do? Answer 1: Each item that you surround with parentheses in the search pattern corresponds to a number ( \1 , \2 , etc.) in the substitution part. In your example, there's only one item surrounded by parentheses, the "(bar)" item, so anywhere you put a \1

Cleaning HTML code in R: how to clean this list?

元气小坏坏 submitted on 2019-12-20 07:53:17
Question: I know that this question has been asked here tons of times, but after reading a bunch of topics I'm still stuck on this :( . I have a list of scraped HTML nodes like this: <a href="http://bit.d o/bnRinN9" target="_blank" style="color: #ff7700; font-weight: bold;">http://bit.d o/bnRinN9</a> and I just want to strip out all the markup. Unfortunately I'm a newbie, and the only thing that comes to mind is the Cthulhu way (regex, argh!). How can I do this? *I put a space between "d" and "o" in

Remove all text before first occurrence of specific character in R

我怕爱的太早我们不能终老 submitted on 2019-12-20 07:40:49
Question: Look at the following vector: x <- c("MED - This means medic - somecode123", "HIV - This means HIV - somecode456") Now I want a vector containing the values: This means medic - somecode123 and This means HIV - somecode456. I can't seem to solve this using gsub ... Answer 1: We can use sub . Match the pattern of one or more non-whitespace characters ( \\S+ ) followed by one or more whitespace characters ( \\s+ ), followed by - and whitespace ( \\s+ ), and replace it with "" . sub('\\S+\\s+-\\s+', "", x) #[1] "This means
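The sub call quoted in the answer above can be tried on the example vector. This is a minimal sketch: the variable name cleaned is an assumption for illustration, and the input is assumed to use " - " as the separator in both elements.

```r
x <- c("MED - This means medic - somecode123",
       "HIV - This means HIV - somecode456")

# sub() replaces only the first match in each element, so the leading
# code and its " - " separator are dropped while later " - " are kept.
cleaned <- sub("\\S+\\s+-\\s+", "", x)
print(cleaned)
```

Using sub rather than gsub matters here: gsub would keep looking for further "word - " matches later in each string.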

Reformatting complex factor vector with comma separation after thousand

折月煮酒 submitted on 2019-12-20 05:34:12
Question: I would like to reformat a factor vector so that the figures it contains have a thousands separator. The vector contains integers and real numbers, without any particular rule with respect to the values or their order. Data: In particular, I'm working with a vector vec similar to the one generated below: content <- c("0 - 100", "0 - 100", "0 - 100", "0 - 100", "150.22 - 170.33", "1000 - 2000","1000 - 2000", "1000 - 2000", "1000 - 2000", "7000 - 10000", "7000 - 10000", "7000 - 10000", "7000 - 10000",

Matching entire string in R

断了今生、忘了曾经 submitted on 2019-12-20 05:29:09
Question: Consider the following string: string = "I have #1 file and #11 folders" I would like to replace the pattern #1 with the word one , but I don't want to modify the #11 . The result should be: string = "I have one file and #11 folders" I have tried: string = gsub("#1", "one", string, fixed = TRUE) but this replaces both #1 and #11. I have also tried: string = gsub("^#1$", "one", string, fixed = TRUE) but this doesn't replace anything since the pattern is part of a string that contains spaces.
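The answer to this question is not included in the excerpt, so the following is only one possible approach, not necessarily the accepted one. It assumes that "#1" should be replaced whenever it is not immediately followed by another digit, and it uses a Perl-style negative lookahead. Note that fixed = TRUE makes gsub treat the pattern as a literal string, so ^ and $ in the second attempt are matched as ordinary characters rather than anchors.

```r
string <- "I have #1 file and #11 folders"

# (?![0-9]) is a negative lookahead: match "#1" only when the next
# character is not a digit. Lookaheads require perl = TRUE.
result <- gsub("#1(?![0-9])", "one", string, perl = TRUE)
print(result)
```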

Removing special characters in the beginning of a word in R

走远了吗. submitted on 2019-12-20 04:56:54
Question: I am using the following code to remove the special characters from the beginning of a word: > gsub("^[^[:alnum:]]", '', '#C++') [1] "C++" But if there are multiple special characters at the beginning, it removes only the first one: > gsub("^[^[:alnum:]]", '', '$#C++') [1] "#C++" How can I make it remove all the special characters at the beginning, so that the output is "C++" ? Answer 1: We match one or more non-alphanumeric characters ( [^[:alnum:]]+ ) from the beginning of the string ( ^ ) and
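The pattern described in the answer above can be sketched as follows. The variable names words and stripped are assumptions for illustration; the pattern itself is the one quoted in the answer.

```r
words <- c("#C++", "$#C++")

# The + quantifier lets the anchored character class consume every
# leading non-alphanumeric character, not just the first one.
stripped <- sub("^[^[:alnum:]]+", "", words)
print(stripped)
```

Because the pattern is anchored with ^, it can match at most once per string, so sub and gsub behave the same here; the trailing "++" is left alone.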

Escaping Angled Bracket acts similar to look-ahead

让人想犯罪 __ submitted on 2019-12-20 01:45:38
Question: Why does escaping the angled bracket > exhibit look-ahead-like behavior? To be clear, I understand that the angled bracket does not need to be escaped. The question is: how is the pattern being interpreted such that it yields the match(es) shown?
    ## match bracket, with or without underscore
    ## replace with "greater_"
    strings <- c("ten>eight", "ten_>_eight")
    repl <- "greater_"
    ## Unescaped. Yields desired results
    gsub(">_?", repl, strings)
    # [1] "tengreater_eight" "ten_greater

How do I gsub an empty “” string in R?

江枫思渺然 submitted on 2019-12-19 07:09:10
Question: How do I replace an empty string? This: x = c("", "b"); gsub("", "taco", x) produces: "taco" "tacobtaco" instead of: "taco" "b" Is there any way to replace an empty string? Answer 1: I would use nchar here: x[nchar(x)==0] <- "taco" EDIT: If you are looking for performance, you should use nzchar: x[!nzchar(x)] <- "taco" Answer 2: I wouldn’t use gsub here – semantically, I think of gsub as replacing parts within a string. For replacing a whole string, I would just use subsetting. And since you’re searching
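The subsetting approach from the first answer can be sketched as follows, using the same vector as the question. Nothing beyond base R is assumed.

```r
x <- c("", "b")

# nzchar() returns TRUE for non-empty strings, so !nzchar(x) selects
# exactly the empty elements, which are then overwritten in place.
x[!nzchar(x)] <- "taco"
print(x)
```

This avoids regular expressions entirely, which sidesteps the problem that an empty pattern matches at every position of a non-empty string.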

gsub speed vs pattern length

感情迁移 submitted on 2019-12-19 05:03:45
Question: I've been using gsub extensively lately, and I noticed that short patterns run faster than long ones, which is not surprising. Here's fully reproducible code:
    library(microbenchmark)
    set.seed(12345)
    n = 0
    rpt = seq(20, 1461, 20)
    msecFF = numeric(length(rpt))
    msecFT = numeric(length(rpt))
    inp = rep("aaaaaaaaaa", 15000)
    for (i in rpt) {
      n = n + 1
      print(n)
      patt = paste(rep("a", rpt[n]), collapse = "")
      #time = microbenchmark(func(count[1:10000,12], patt, "b"), times = 10)
      timeFF = microbenchmark

Split string by final space in R

允我心安 submitted on 2019-12-19 03:57:01
Question: I have a vector of strings, each containing a number of spaces. I would like to split this into two vectors, split at the final space. For example: vec <- c('This is one', 'And another', 'And one more again') should become vec1 = c('This is', 'And', 'And one more') and vec2 = c('one', 'another', 'again') Is there a quick and easy way to do this? I have done similar things before using gsub and regex, and have managed to get the second vector using the following: vec2 <- gsub(".* ", "", vec) But can't
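The answer is not included in this excerpt, so the following is only a sketch of one way to get both vectors, not necessarily the accepted solution. It assumes every element contains at least one space; the pattern for vec1 is an assumption, while the one for vec2 is taken from the question.

```r
vec <- c("This is one", "And another", "And one more again")

# Drop the final space and everything after it to keep the head.
vec1 <- sub(" [^ ]*$", "", vec)

# Greedy ".* " consumes everything up to the last space, leaving the tail.
vec2 <- sub(".* ", "", vec)

print(vec1)
print(vec2)
```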