How to remove specific special characters in R

北战南征 提交于 2019-11-29 01:31:23
gsub("[^[:alnum:][:blank:]+?&/\\-]", "", c)
# [1] "In Acid-base reaction page4 why does it create water and not H+?"

I think you're after a regex solution. I'll give you a messy solution and a package add on solution (shameless self promotion).

There's likely a better regex:

x <- "In Acid-base reaction (page[4]), why does it create water and not H+?" 
keeps <- c("+", "-", "?")

## Regex solution
gsub(paste0(".*?($|'|", paste(paste0("\\", 
    keeps), collapse = "|"), "|[^[:punct:]]).*?"), "\\1", x)

#qdap: addon package solution
library(qdap)
strip(x, keeps, lower = FALSE)

## [1] "In Acid-base reaction page why does it create water and not H+?"

In order to get your method to work, you need to put the literal "]" immediately after the leading "["

 gsub("[][!#$%()*,.:;<=>@^_`|~.{}]", "", c)
[1] "In Acid-base reaction page4 why does it create water and not H+?"

You can them put the inner "[" anywhere. If you needed to exclude minus, it would then need to be last. See the ?regex page after all of those special pre-defined character classes are listed.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!