gsub | 易学教程

Remove a list of whole words that may contain special chars from a character vector without matching parts of words

Question: I have a list of words in R as shown below: myList <- c("at","ax","CL","OZ","Gm","Kg","C100","-1.00") I want to remove the words found in that list from the following text: myText <- "This is at Sample ax Text, which CL is OZ better and cleaned Gm, where C100 is not equal to -1.00. This is messy text Kg." After removing the unwanted myList words, myText should look like: This is at Sample Text, which is better and cleaned, where is not equal to. This is messy text. I was
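One possible approach, sketched under assumptions rather than taken from the original thread: quote each word with PCRE's \Q...\E so that characters such as . and - are treated literally, and wrap the alternation in lookarounds so that only whole words match. The names quoted, pattern, removed and cleaned are illustrative, and the whitespace tidy-up at the end is an assumption about the desired output.

```r
myList <- c("at", "ax", "CL", "OZ", "Gm", "Kg", "C100", "-1.00")
myText <- "This is at Sample ax Text, which CL is OZ better and cleaned Gm, where C100 is not equal to -1.00. This is messy text Kg."

# Quote every word literally (PCRE \Q...\E), then join them into one alternation.
quoted <- paste0("\\Q", myList, "\\E")
# Lookarounds: no word character may sit directly before or after the match.
pattern <- paste0("(?<!\\w)(?:", paste(quoted, collapse = "|"), ")(?!\\w)")

removed <- gsub(pattern, "", myText, perl = TRUE)
# Tidy up: drop spaces left before punctuation, then collapse runs of spaces.
cleaned <- gsub("\\s+", " ", gsub("\\s+([,.])", "\\1", removed))
print(cleaned)
```

Note that this sketch removes every listed word, including "at", and that perl = TRUE is required for both the \Q...\E quoting and the lookarounds.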

Cleaning HTML code in R: how to clean this list?

I know that this question has been asked here tons of times, but after reading a bunch of topics I'm still stuck on this :( . I have a list of scraped HTML nodes like this <a href="http://bit.d o/bnRinN9" target="_blank" style="color: #ff7700; font-weight: bold;">http://bit.d o/bnRinN9</a> and I just want to strip out all the code. Unfortunately I'm a newbie and the only thing that comes to mind is the Cthulhu way (regex, argh!). How can I do this? *I put a space between "d" and "o" in the domain name because SO doesn't allow posting that link. This uses the data linked in Why R can't scrape

Remove all text before first occurrence of specific character in R

Look at the following vector: x <- c("MED - This means medic - somecode123", "HIV - This means HIV -somecode456") Now I want a vector containing the values: This means medic - somecode123 and This means HIV -somecode456. I can't seem to solve this using gsub ... We can use sub. Match one or more non-whitespace characters ( \\S+ ), followed by one or more whitespace characters ( \\s+ ), followed by - and more whitespace ( \\s+ ), and replace the match with "". sub('\\S+\\s+-\\s+', "", x) #[1] "This means medic - somecode123" "This means HIV -somecode456" Source: https://stackoverflow.com/questions/36158204/remove
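To see why sub rather than gsub fits here, the following sketch runs both on the same input. It assumes the stray quote in the question's vector was a typo; the names first_only and every_match are illustrative.

```r
x <- c("MED - This means medic - somecode123",
       "HIV - This means HIV -somecode456")

# sub() replaces only the first match in each element.
first_only <- sub("\\S+\\s+-\\s+", "", x)
# gsub() replaces every match, so it also eats "medic - " in the first element.
every_match <- gsub("\\S+\\s+-\\s+", "", x)

print(first_only)
print(every_match)
```

In the second element the later "-" is not followed by whitespace, so the pattern matches only once there and both functions give the same result.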

Extracting Date from text using R

Question: My data frame looks like this: df <- setNames(data.frame(c("2 June 2004, 5 words, ()(","profit, Insight, 2 May 2004, 188 words, reports, by ()("), stringsAsFactors = F), "split") What I want is to split the column into date and words. So far I found "Extract date text from string": lapply(df2, function(x) gsub(".*(\\d{2} \\w{3} \\d{4}).*", "\\1", x)) But it's not working with my example. Thanks for the help, as always. Answer 1: As there is only a single column, we can directly use gsub/sub after extracting the
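The pattern quoted in the question requires a two-digit day and a three-letter month, so it cannot match "2 June 2004". A minimal sketch of one alternative, not the truncated answer's method, is to loosen those counts and pull the matches out with regmatches. The column names date and words are assumptions, and the sketch assumes every row contains a match (regmatches drops rows that do not).

```r
df <- data.frame(
  split = c("2 June 2004, 5 words, ()(",
            "profit, Insight, 2 May 2004, 188 words, reports, by ()("),
  stringsAsFactors = FALSE
)

# One or two digits, a month name of any length, then a four-digit year.
date_pattern <- "[0-9]{1,2} [A-Za-z]+ [0-9]{4}"
df$date  <- regmatches(df$split, regexpr(date_pattern, df$split))
# The word count is the number directly in front of " words".
df$words <- regmatches(df$split, regexpr("[0-9]+ words", df$split))

print(df[, c("date", "words")])
```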

Removing characters after a EURO symbol in R

Question: I have a euro symbol saved in the variable euro: euro <- "\u20AC" euro #[1] "€" The variable eurosearch contains "services as defined in this SOW at a price of € 15,896.80 (if executed fro". eurosearch [1] "services as defined in this SOW at a price of € 15,896.80 (if executed fro" I want the characters after the euro symbol, which are "15,896.80 (if executed fro". I am using this code: gsub("^.*[euro]","",eurosearch) But I'm getting an empty result. How can I obtain the expected output? Answer 1: You
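The empty result follows from the pattern itself: inside a quoted pattern, [euro] is a bracket expression matching any one of the letters e, u, r or o, not the contents of the variable euro. A hedged sketch of one fix is to paste the variable into the pattern; the names original_attempt and after_euro are illustrative, and a UTF-8 capable session is assumed.

```r
euro <- "\u20AC"
eurosearch <- paste0("services as defined in this SOW at a price of ",
                     euro, " 15,896.80 (if executed fro")

# "[euro]" matches a literal e, u, r or o; the greedy ".*" runs up to the
# last such letter (the final "o"), so the whole string is removed.
original_attempt <- gsub("^.*[euro]", "", eurosearch)

# Build the pattern from the variable and also drop the space after the symbol.
after_euro <- sub(paste0("^.*", euro, "\\s*"), "", eurosearch)

print(after_euro)
```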

R gsub remove words in column y from words in column x

I'm trying to use gsub to remove words / text in column y that are in column x. x = c("a","b","c") y = c("asometext", "some, a b text", "c a text") df = cbind(x,y) df = data.frame(df) df$y = gsub(df$x, "", df$y) If I run the code above, it removes only the text from column x row 1 and not all the rows: > df x y 1 a sometext 2 b some, b text 3 c c text I want the end result to be: > df x y 1 a sometext 2 b some, text 3 c text So all the words / letters from column x should be removed from the column y. Is this possible with gsub? Normally gsub takes three arguments 1) pattern, 2) replacement
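The output shown in the question is consistent with how gsub treats its pattern argument: when given a vector of length two or more, it uses only the first element (and warns). One way to get the desired result, sketched here under the assumption that the values in x contain no regex metacharacters, is to collapse x into a single alternation. The names pattern, stripped and cleaned are illustrative, and the whitespace clean-up is an assumption about the expected output.

```r
x <- c("a", "b", "c")
y <- c("asometext", "some, a b text", "c a text")

# One pattern that matches any value of x: "a|b|c".
pattern <- paste(x, collapse = "|")
stripped <- gsub(pattern, "", y)

# Collapse the leftover runs of spaces and trim the ends.
cleaned <- trimws(gsub("\\s+", " ", stripped))
print(cleaned)
```

Because the pattern is not anchored to word boundaries, it removes the letters wherever they occur, which is what the expected output for "asometext" asks for.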

Reformatting complex factor vector with comma separation after thousand

I would like to reformat a factor vector so that the figures it contains have a thousands separator. The vector contains integers and real numbers, with no particular rule regarding their values or order. Data: In particular, I'm working with a vector vec similar to the one generated below: content <- c("0 - 100", "0 - 100", "0 - 100", "0 - 100", "150.22 - 170.33", "1000 - 2000","1000 - 2000", "1000 - 2000", "1000 - 2000", "7000 - 10000", "7000 - 10000", "7000 - 10000", "7000 - 10000", "7000 - 10000", "1000000 - 22000000", "1000000 - 22000000", "1000000 - 22000000", "44000000 - 66000000

Matching entire string in R

Consider the following string: string = "I have #1 file and #11 folders" I would like to replace the pattern #1 with the word one, but I don't want to modify the #11. The result should be: string = "I have one file and #11 folders" I have tried: string = gsub("#1", "one", string, fixed = TRUE) but this replaces both #1 and #11. I have also tried: string = gsub("^#1$", "one", string, fixed = TRUE) but this doesn't replace anything, since the pattern is part of a string that contains spaces. Please note that if the initial string looked like: string = "I have #1 file blah blah blah and #11 folders
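A minimal sketch of one way to do this, offered as an assumption rather than the thread's accepted answer: drop fixed = TRUE and add a negative lookahead so that #1 matches only when no further digit follows. Lookaheads need perl = TRUE; the name result is illustrative.

```r
string <- "I have #1 file and #11 folders"

# "#1" only when the next character is not another digit.
result <- gsub("#1(?!\\d)", "one", string, perl = TRUE)
print(result)
```

The second attempt in the question fails for a separate reason as well: with fixed = TRUE, ^ and $ are treated as literal characters instead of anchors.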

Lua frontier pattern match (whole word search)

Question: Can someone help me with this, please: s_test = "this is a test string this is a test string " function String.Wholefind(Search_string, Word) _, F_result = string.gsub(Search_string, '%f[%a]'..Word..'%f[%A]',"") return F_result end A_test = String.Wholefind(s_test,"string") output: A_test = 2 So the frontier pattern finds the whole word with no problem, and gsub counts the whole words with no problem, but what if the search string has numbers? s_test = " 123test 123test 123" B_test = String.Wholefind(s