How to remove specific special characters in R

爷,独闯天下 提交于 2019-11-27 16:01:26

问题


I have some sentences like this one.

c = "In Acid-base reaction (page[4]), why does it create water and not H+?" 

I want to remove all special characters except for '?&+-/

I know that if I want to remove all special characters, I can simply use

gsub("[[:punct:]]", "", c)
"In Acidbase reaction page4 why does it create water and not H"

However, some special characters such as + - ? are also removed, which I intend to keep.

I tried to create a string of special characters that I can use in some code like this

gsub("[special_string]", "", c)

The best I can do is to come up with this

cat("!\"#$%()*,.:;<=>@[\\]^_`{|}~.")

However, the following code just won't work

gsub("[cat("!\"#$%()*,.:;<=>@[\\]^_`{|}~.")]", "", c)

What should I do to remove special characters, except for a few that I want to keep?

Thanks


回答1:


gsub("[^[:alnum:][:blank:]+?&/\\-]", "", c)
# [1] "In Acid-base reaction page4 why does it create water and not H+?"



回答2:


I think you're after a regex solution. I'll give you a messy solution and a package add on solution (shameless self promotion).

There's likely a better regex:

x <- "In Acid-base reaction (page[4]), why does it create water and not H+?" 
keeps <- c("+", "-", "?")

## Regex solution
gsub(paste0(".*?($|'|", paste(paste0("\\", 
    keeps), collapse = "|"), "|[^[:punct:]]).*?"), "\\1", x)

#qdap: addon package solution
library(qdap)
strip(x, keeps, lower = FALSE)

## [1] "In Acid-base reaction page why does it create water and not H+?"



回答3:


In order to get your method to work, you need to put the literal "]" immediately after the leading "["

 gsub("[][!#$%()*,.:;<=>@^_`|~.{}]", "", c)
[1] "In Acid-base reaction page4 why does it create water and not H+?"

You can them put the inner "[" anywhere. If you needed to exclude minus, it would then need to be last. See the ?regex page after all of those special pre-defined character classes are listed.



来源:https://stackoverflow.com/questions/21641522/how-to-remove-specific-special-characters-in-r

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!