gsub

Remove a list of whole words that may contain special chars from a character vector without matching parts of words

烈酒焚心 submitted on 2019-12-02 14:01:31
Question: I have a list of words in R as shown below:
    myList <- c("at","ax","CL","OZ","Gm","Kg","C100","-1.00")
And I want to remove the words found in the above list from the text below:
    myText <- "This is at Sample ax Text, which CL is OZ better and cleaned Gm, where C100 is not equal to -1.00. This is messy text Kg."
After removing the unwanted myList words, myText should look like:
    This is Sample Text, which is better and cleaned, where is not equal to. This is messy text.
I was
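The thread is cut off above, so the following is only a sketch of one possible approach, not the accepted answer. It quotes each word with PCRE's \Q...\E so that characters such as "-" and "." are taken literally, joins the words into a single alternation, and uses lookarounds instead of \b so that entries beginning with a non-word character (like "-1.00") still match as whole tokens. The name `pattern` and the decision to also swallow the whitespace in front of each removed word are assumptions made for illustration.

```r
myList <- c("at", "ax", "CL", "OZ", "Gm", "Kg", "C100", "-1.00")
myText <- "This is at Sample ax Text, which CL is OZ better and cleaned Gm, where C100 is not equal to -1.00. This is messy text Kg."

# Quote every word literally (\Q...\E) and join the words with "|".
alternatives <- paste0("\\Q", myList, "\\E", collapse = "|")

# (?<!\w) and (?!\w) require that no word character touches the match,
# so "at" inside a longer word would be left alone.
pattern <- paste0("\\s*(?<!\\w)(?:", alternatives, ")(?!\\w)")

result <- gsub(pattern, "", myText, perl = TRUE)
result
# [1] "This is Sample Text, which is better and cleaned, where is not equal to. This is messy text."
```

Note that \Q...\E would break if a word itself contained the sequence \E; none of the sample words do.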

Cleaning HTML code in R: how to clean this list?

久未见 submitted on 2019-12-02 13:23:46
I know that this question has been asked here tons of times, but after reading a bunch of topics I'm still stuck on this :( . I have a list of scraped HTML nodes like this:
    <a href="http://bit.d o/bnRinN9" target="_blank" style="color: #ff7700; font-weight: bold;">http://bit.d o/bnRinN9</a>
and I just want to strip out all the markup. Unfortunately I'm a newbie, and the only thing that comes to mind is the Cthulhu way (regex, argh!). How can I do this?
*I put a space between "d" and "o" in the domain name because SO doesn't allow posting that link.
This uses the data linked in Why R can't scrape
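For a single, simple node like the one quoted, the regex route can be sketched as below. This is an illustration only: stripping tags with a pattern is fragile on real-world HTML (comments, scripts, a ">" inside an attribute), and a proper HTML parser is usually the safer choice. The variable names `node` and `text_only` are made up for the example.

```r
node <- '<a href="http://bit.d o/bnRinN9" target="_blank" style="color: #ff7700; font-weight: bold;">http://bit.d o/bnRinN9</a>'

# Remove everything from "<" up to the next ">", i.e. each tag.
text_only <- gsub("<[^>]+>", "", node)
text_only
# [1] "http://bit.d o/bnRinN9"
```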

Remove all text before first occurrence of specific character in R

北慕城南 submitted on 2019-12-02 13:18:44
Look at the following vector:
    x <- c("MED - This means medic - somecode123", "HIV - This means HIV -somecode456")
Now I want a vector containing the values:
    This means medic - somecode123
    This means HIV -somecode456
I can't seem to solve this using gsub ...
We can use sub. Match one or more non-whitespace characters (\\S+), followed by one or more whitespace characters (\\s+), followed by - and more whitespace (\\s+), and replace the match with "".
    sub('\\S+\\s+-\\s+', "", x)
    #[1] "This means medic - somecode123" "This means HIV -somecode456"
Source: https://stackoverflow.com/questions/36158204/remove
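To see why sub rather than gsub is the right tool here, the short comparison below runs the same pattern through both. It is a sketch using the vector from the question; the names `first_only` and `all_matches` are illustrative. sub replaces only the first match in each string, while gsub keeps going and also removes the later "medic - ".

```r
x <- c("MED - This means medic - somecode123",
       "HIV - This means HIV -somecode456")

# sub: only the first "token - " in each element is removed.
first_only <- sub("\\S+\\s+-\\s+", "", x)
first_only
# [1] "This means medic - somecode123" "This means HIV -somecode456"

# gsub: every "token - " is removed, which eats "medic - " as well.
all_matches <- gsub("\\S+\\s+-\\s+", "", x)
all_matches
# [1] "This means somecode123"      "This means HIV -somecode456"
```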

Extracting Date from text using R

别来无恙 submitted on 2019-12-02 12:06:55
Question: My dataframe looks like:
    df <- setNames(data.frame(c("2 June 2004, 5 words, ()(","profit, Insight, 2 May 2004, 188 words, reports, by ()("), stringsAsFactors = F), "split")
What I want is to split the column into date and words. So far I found "Extract date text from string":
    lapply(df2, function(x) gsub(".*(\\d{2} \\w{3} \\d{4}).*", "\\1", x))
But it's not working with my example. Thanks for the help, as always.
Answer 1: As there is only a single column, we can directly use gsub/sub after extracting the
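The quoted pattern requires a two-digit day and a three-character month, and neither sample string has a two-digit day, so gsub finds nothing to replace. The answer above is truncated, so the following is a separate sketch rather than the answerer's method: it pulls out the first date-like piece and the number in front of "words" with regmatches. The names `dates`, `words`, and `result` are illustrative, and regmatches silently drops elements without a match, so this assumes every row contains both pieces.

```r
x <- c("2 June 2004, 5 words, ()(",
       "profit, Insight, 2 May 2004, 188 words, reports, by ()(")

# One or two digits, a month name of any length, then a four-digit year.
dates <- regmatches(x, regexpr("\\d{1,2} [A-Za-z]+ \\d{4}", x))

# Digits that are immediately followed by " words" (PCRE lookahead).
words <- regmatches(x, regexpr("\\d+(?= words)", x, perl = TRUE))

result <- data.frame(date = dates, words = as.integer(words),
                     stringsAsFactors = FALSE)
result
#          date words
# 1 2 June 2004     5
# 2  2 May 2004   188
```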

R gsub remove words in column y from words in column x

放肆的年华 submitted on 2019-12-02 11:29:12
Question: I'm trying to use gsub to remove words / text in column y that are in column x.
    x = c("a","b","c")
    y = c("asometext", "some, a b text", "c a text")
    df = cbind(x,y)
    df = data.frame(df)
    df$y = gsub(df$x, "", df$y)
If I run the code above, it removes only the text from column x row 1 and not all the rows:
    > df
      x            y
    1 a     sometext
    2 b some, b text
    3 c       c text
I want the end result to be:
    > df
      x          y
    1 a   sometext
    2 b some, text
    3 c       text
So all the words / letters from column x should be removed from the

Removing characters after a EURO symbol in R

拥有回忆 submitted on 2019-12-02 06:31:00
Question: I have a euro symbol saved in the "euro" variable:
    euro <- "\u20AC"
    euro
    #[1] "€"
And the "eurosearch" variable contains "services as defined in this SOW at a price of € 15,896.80 (if executed fro":
    eurosearch
    [1] "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
I want the characters after the euro symbol, which are "15,896.80 (if executed fro". I am using this code:
    gsub("^.*[euro]","",eurosearch)
But I'm getting an empty result. How can I obtain the expected output?
Answer 1: You
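The empty result has a simple explanation: inside the quotes, [euro] is not the variable but a character class matching any one of the letters e, u, r, o, and the greedy ^.* runs all the way to the final "o" of "fro". Since the answer above is cut off, the following is only a sketch of one fix: paste the variable's value into the pattern. The names `broken` and `fixed` are illustrative, and the string is rebuilt with paste0 so the example does not depend on the file's encoding.

```r
euro <- "\u20AC"
eurosearch <- paste0("services as defined in this SOW at a price of ",
                     euro, " 15,896.80 (if executed fro")

# Original attempt: "[euro]" matches a literal e, u, r, or o,
# so the greedy match swallows the whole string.
broken <- gsub("^.*[euro]", "", eurosearch)

# Build the pattern from the variable and drop the space after the symbol.
fixed <- sub(paste0("^.*", euro, "\\s*"), "", eurosearch)
fixed
# [1] "15,896.80 (if executed fro"
```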

R gsub remove words in column y from words in column x

自闭症网瘾萝莉.ら submitted on 2019-12-02 06:04:12
I'm trying to use gsub to remove words / text in column y that are in column x.
    x = c("a","b","c")
    y = c("asometext", "some, a b text", "c a text")
    df = cbind(x,y)
    df = data.frame(df)
    df$y = gsub(df$x, "", df$y)
If I run the code above, it removes only the text from column x row 1 and not all the rows:
    > df
      x            y
    1 a     sometext
    2 b some, b text
    3 c       c text
I want the end result to be:
    > df
      x          y
    1 a   sometext
    2 b some, text
    3 c       text
So all the words / letters from column x should be removed from the column y. Is this possible with gsub?
Normally gsub takes three arguments 1) pattern, 2) replacement
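gsub uses only the first element when given a pattern vector of length greater than one (and warns about it), which is why only "a" was removed. The answer above is truncated, so what follows is one possible sketch rather than the answerer's code: collapse column x into a single alternation and tidy up the leftover whitespace. The names `pattern` and `cleaned` are illustrative. Like the desired output in the question, this removes the letters wherever they occur, including inside longer words.

```r
x <- c("a", "b", "c")
y <- c("asometext", "some, a b text", "c a text")
df <- data.frame(x, y, stringsAsFactors = FALSE)

# Combine every value of x into one pattern: "a|b|c".
pattern <- paste(df$x, collapse = "|")
cleaned <- gsub(pattern, "", df$y)

# Squeeze the runs of spaces left behind and trim the ends.
df$y <- trimws(gsub("\\s+", " ", cleaned))
df$y
# [1] "sometext"   "some, text" "text"
```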

Reformatting complex factor vector with comma separation after thousand

做~自己de王妃 submitted on 2019-12-02 05:15:39
I would like to reformat a factor vector so that the figures it contains have a thousands separator. The vector contains integers and real numbers, with no particular rule regarding their values or order.
Data: In particular, I'm working with a vector vec similar to the one generated below:
    content <- c("0 - 100", "0 - 100", "0 - 100", "0 - 100", "150.22 - 170.33", "1000 - 2000","1000 - 2000", "1000 - 2000", "1000 - 2000", "7000 - 10000", "7000 - 10000", "7000 - 10000", "7000 - 10000", "7000 - 10000", "1000000 - 22000000", "1000000 - 22000000", "1000000 - 22000000", "44000000 - 66000000
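Since the question is cut off, the sketch below works on a shortened, hypothetical version of the data and shows just one way to do it with gsub: insert a comma after any digit that is followed by one or more complete groups of three digits, and apply that to the factor's levels so the result stays a factor. The helper name `add_commas` is made up for the example. One known limitation is that the pattern does not distinguish decimal places, so a value with four or more decimals would also get a comma after the point; the sample values have at most two.

```r
# Shortened stand-in for the truncated data in the question.
content <- c("0 - 100", "150.22 - 170.33", "1000 - 2000",
             "7000 - 10000", "1000000 - 22000000")
vec <- factor(content, levels = content)

# Hypothetical helper: comma after a digit followed by groups of 3 digits.
add_commas <- function(x) {
  gsub("(\\d)(?=(\\d{3})+(?!\\d))", "\\1,", x, perl = TRUE)
}

# Rewrite the labels only; the underlying factor codes are untouched.
levels(vec) <- add_commas(levels(vec))
levels(vec)
# [1] "0 - 100"                "150.22 - 170.33"        "1,000 - 2,000"
# [4] "7,000 - 10,000"         "1,000,000 - 22,000,000"
```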

Matching entire string in R

夙愿已清 submitted on 2019-12-02 03:46:01
Consider the following string:
    string = "I have #1 file and #11 folders"
I would like to replace the pattern #1 with the word one, but I don't want to modify the #11. The result should be:
    string = "I have one file and #11 folders"
I have tried:
    string = gsub("#1", "one", string, fixed = TRUE)
but this replaces both #1 and #11. I have also tried:
    string = gsub("^#1$", "one", string, fixed = TRUE)
but this doesn't replace anything since the pattern is part of a string that contains spaces. Please note that if the initial string looked like:
    string = "I have #1 file blah blah blah and #11 folders
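Both attempts pass fixed = TRUE, which turns off regular expressions altogether, so "^" and "$" are searched for as literal characters. One possible fix, sketched here under the assumption that "#1" should count only when no further digit follows it, is to drop fixed = TRUE and add a negative lookahead. The name `result` is illustrative.

```r
string <- "I have #1 file and #11 folders"

# "#1" not followed by another digit; lookaheads need perl = TRUE.
result <- gsub("#1(?!\\d)", "one", string, perl = TRUE)
result
# [1] "I have one file and #11 folders"
```

The same idea works anywhere in the string, so longer inputs with text between the two markers need no special handling.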

Lua frontier pattern match (whole word search)

心已入冬 submitted on 2019-12-02 03:17:40
Question: Can someone help me with this, please:
    s_test = "this is a test string this is a test string "
    function String.Wholefind(Search_string, Word)
        _, F_result = string.gsub(Search_string, '%f[%a]'..Word..'%f[%A]',"")
        return F_result
    end
    A_test = String.Wholefind(s_test,"string")
    output: A_test = 2
So the frontier pattern finds the whole word no problem and gsub counts the whole words no problem, but what if the search string has numbers?
    s_test = " 123test 123test 123"
    B_test = String.Wholefind(s