Question
I want to replace all words containing the symbol @ with a specific word. I am using gsub and am therefore applying it to a character vector. The issue that keeps occurring is that when I use:
gsub(".*@.*", "email", data)
all of the text in that portion of the character vector gets deleted.
There are multiple different emails all with different lengths so I can't set the characters prior and characters after to a specific number.
Any suggestions?
I've done my fair share of reading about regex but everything I tried failed.
Here's an example:
data <- c("This is an example. Here is my email: emailaddress@help.com. Thank you")
data <- gsub(".*@.*", "email", data)
it returns [1] "email"
when I want [1] "This is an example. Here is my email: email. Thank you"
Answer 1:
You can use the following:
gsub('\\S+@\\S+', 'email', data)
Explanation:
\S matches any non-whitespace character. So here we match any non-whitespace character (1 or more times), followed by @, followed by any non-whitespace character (1 or more times).
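To see why the original pattern fails, it may help to run both patterns side by side on the example from the question. This is only a small sketch; the variable names `greedy` and `by_word` are made up for illustration. Since `.` also matches spaces, the greedy `.*` on each side of the @ swallows the entire string, whereas `\\S+` stops at the nearest whitespace.

```r
data <- c("This is an example. Here is my email: emailaddress@help.com. Thank you")

# `.` matches spaces too, so `.*@.*` spans the whole string
# and the whole string is replaced.
greedy <- gsub(".*@.*", "email", data)

# `\\S+` cannot cross whitespace, so only the "word" around @ is replaced.
by_word <- gsub("\\S+@\\S+", "email", data)

print(greedy)
print(by_word)
```

One thing to be aware of: the period right after the address is also a non-whitespace character, so `\\S+` consumes it and the second result reads "... my email: email Thank you" rather than "email. Thank you".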
Answer 2:
To replace strings with an embedded "@" in R, you can use (translating @Fabricator's pattern to R)
data <- c("This is an example. Here is my email: emailaddress@help.com")
# Assign the result back; gsub() does not modify data in place
data <- gsub("[^\\s]*@[^\\s]*", "email", data, perl = TRUE)
data
# [1] "This is an example. Here is my email: email"
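If punctuation directly after an address should survive (as in the "email. Thank you" output the question asks for), one possible variation is to require the match to end in a word character. The sketch below is not from either answer; the `texts` vector and the name `cleaned` are invented for the example, and it assumes addresses end in a letter, digit, or underscore.

```r
texts <- c(
  "Here is my email: emailaddress@help.com. Thank you",
  "Write to a@b.org or c.d@example.net, please",
  "No address here"
)

# \\S+@    : run of non-whitespace characters up to the @
# \\S*\\w  : run of non-whitespace characters that must end in a word
#            character, so a trailing "." or "," is left in place
cleaned <- gsub("\\S+@\\S*\\w", "email", texts, perl = TRUE)

print(cleaned)
```

gsub is vectorised, so every element is processed and every address within an element is replaced. Leading punctuation such as an opening parenthesis would still be consumed by the first `\\S+`, so the pattern may need tightening for messier text.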
Source: https://stackoverflow.com/questions/24395382/r-code-removing-words-containing