Removing characters after a EURO symbol in R

Submitted by 拥有回忆 on 2019-12-02 06:31:00

Question


I have a euro symbol saved in "euro" variable:

euro <- "\u20AC"
euro
#[1] "€"

The "eurosearch" variable contains "services as defined in this SOW at a price of € 15,896.80 (if executed fro":

eurosearch
[1] "services as defined in this SOW at a price of € 15,896.80 (if executed fro"

I want the characters after the euro symbol, which are "15,896.80 (if executed fro". I am using this code:

gsub("^.*[euro]","",eurosearch)

But I get an empty result. How can I obtain the expected output?


Answer 1:


You can use variables in the pattern by just concatenating strings using paste0:

euro <- "€"
eurosearch <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
sub(paste0("^.*", gsub("([^A-Za-z_0-9])", "\\\\\\1", euro), "\\s*(\\S+).*"), "\\1", eurosearch)

euro <- "$"
eurosearch <- "services as defined in this SOW at a price of $ 25,196.4 (if executed fro"
sub(paste0("^.*", gsub("([^A-Za-z_0-9])", "\\\\\\1", euro), "\\s*(\\S+).*"), "\\1", eurosearch)

See CodingGround demo

Note that with gsub("([^A-Za-z_0-9])", "\\\\\\1", euro) I am escaping any non-word symbols so that $ could be treated as a literal, not a special regex metacharacter (taken from this SO post).
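The original pattern returned an empty string because, inside a regex, [euro] is a character class that matches any one of the letters e, u, r, o, not the contents of the variable euro. The greedy ^.* then runs up to the last such letter (the final "o" of "fro"), so the whole string is removed. Below is a minimal sketch that keeps everything after the symbol, as the question asked. The helper name text_after is an assumption for illustration and is not part of the original answer; it uses fixed = TRUE so that no escaping is needed.

```r
# Hypothetical helper (the name is an assumption): return the text that
# follows the first occurrence of `symbol` in `x`, dropping the whitespace
# that comes right after the symbol.
text_after <- function(x, symbol) {
  # fixed = TRUE treats `symbol` literally, so "$" or "." need no escaping
  pos <- regexpr(symbol, x, fixed = TRUE)
  if (pos[1] == -1) return(NA_character_)  # symbol not found
  rest <- substring(x, pos[1] + attr(pos, "match.length"))
  sub("^\\s+", "", rest)
}

euro <- "\u20AC"
eurosearch <- "services as defined in this SOW at a price of \u20AC 15,896.80 (if executed fro"

# The character class matches the letters e, u, r, o, so the greedy match
# consumes the string up to and including the final "o".
gsub("^.*[euro]", "", eurosearch)
# [1] ""

text_after(eurosearch, euro)
# [1] "15,896.80 (if executed fro"

text_after("a price of $ 25,196.4 (if executed fro", "$")
# [1] "25,196.4 (if executed fro"
```

Matching literally with fixed = TRUE is a different technique from the escaping approach above; it is simpler when the symbol is the only variable part of the search.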




Answer 2:


Use regmatches from base R, or str_extract from stringr, etc.

> x <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
> regmatches(x, regexpr("(?<=€ )\\S+", x, perl=TRUE))
[1] "15,896.80"

or

> gsub("€ (\\S+)|.", "\\1", x)
[1] "15,896.80"

or

Using variables.

euro <- "\u20AC"
gsub(paste(euro , "(\\S+)|."), "\\1", x) 

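If the variable-based version does not work for you, you need to declare the encoding of the string. The values documented for Encoding<- are "latin1", "UTF-8", "bytes" and "unknown", so the name is spelled "UTF-8" here:

gsub(paste(euro , "(\\S+)|."), "\\1", `Encoding<-`(x, "UTF-8"))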
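As a hedged alternative for the encoding step, base R's enc2utf8 converts a character vector to UTF-8 instead of only re-declaring how its bytes should be read. The sketch below assumes the input was read correctly in the session's native encoding; the variable names x_utf8 and amount are illustrative only.

```r
euro <- "\u20AC"
x <- "services as defined in this SOW at a price of \u20AC 15,896.80 (if executed fro"

# Strings written with \u escapes are already marked as UTF-8
Encoding(x)

# enc2utf8 converts to UTF-8 where needed and leaves UTF-8 input unchanged
x_utf8 <- enc2utf8(x)

# Same pattern as above: keep the token after the symbol, drop everything else
amount <- gsub(paste(euro, "(\\S+)|."), "\\1", x_utf8)
amount
# [1] "15,896.80"
```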

Source



Source: https://stackoverflow.com/questions/31288513/removing-characters-after-a-euro-symbol-in-r
