Remove all punctuation except underline between characters in R with POSIX character class

Submitted by 点点圈 on 2021-02-04 19:45:48

Question


I would like to use R to remove all underscores except those between words, so that underscores at the beginning or end of a word are removed. The result should be 'hello_world and hello_world'. I want to use the pre-built POSIX character classes. So far I have learned how to exclude particular characters with the following code, but I don't know how to use the word boundary sequences.

test <- "hello_world and _hello_world_"
gsub("[^_[:^punct:]]", "", test, perl = TRUE)

Answer 1:


You can use

gsub("[^_[:^punct:]]|_+\\b|\\b_+", "", test, perl=TRUE)

Details:

  • [^_[:^punct:]] - any punctuation except _
  • | - or
  • _+\b - one or more _ at the end of a word
  • | - or
  • \b_+ - one or more _ at the start of a word



Answer 2:


One non-regex way is to split the string on spaces and apply trimws to each word with the whitespace argument set to _ (this argument requires R 3.6.0 or later), i.e.

paste(sapply(strsplit(test, ' '), function(i)trimws(i, whitespace = '_')), collapse = ' ')
#[1] "hello_world and hello_world"
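
If several strings need cleaning at once, the same idea can be wrapped in a small helper. The sketch below is a variant under stated assumptions rather than part of the answer: the name `trim_word_underscores` is hypothetical, words are assumed to be separated by single spaces, and the second input string is made up.

```r
# Hypothetical helper: trim "_" from both ends of every space-separated word
trim_word_underscores <- function(x) {
  words_list <- strsplit(x, " ", fixed = TRUE)
  vapply(
    words_list,
    function(words) paste(trimws(words, whitespace = "_"), collapse = " "),
    character(1)
  )
}

res <- trim_word_underscores(c("hello_world and _hello_world_", "_a_b_ c_"))
print(res)  # "hello_world and hello_world" "a_b c"
```

Unlike the regex in the question, this approach only touches underscores and leaves any other punctuation in place.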



Answer 3:


We can remove every underscore that has a word boundary on either side, using positive lookbehind and lookahead to find such underscores. Underscores at the start and end of the whole string are removed first with trimws.

test<-"hello_world and _hello_world_"
gsub("(?<=\\b)_|_(?=\\b)", "", trimws(test, whitespace = '_'), perl = TRUE)
#[1] "hello_world and hello_world"
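
It may help to run the two steps separately. In the sketch below (the intermediate variable names are illustrative), trimws only touches the two ends of the whole string, and the lookaround pattern then handles underscores that sit next to a word boundary inside it.

```r
test <- "hello_world and _hello_world_"

# Step 1: trimws strips "_" only at the very start and end of the string
trimmed <- trimws(test, whitespace = "_")
print(trimmed)  # "hello_world and _hello_world"

# Step 2: remove "_" preceded or followed by a word boundary
result <- gsub("(?<=\\b)_|_(?=\\b)", "", trimmed, perl = TRUE)
print(result)   # "hello_world and hello_world"

# For this particular input, the pattern alone gives the same result
pattern_only <- gsub("(?<=\\b)_|_(?=\\b)", "", test, perl = TRUE)
print(identical(pattern_only, result))  # TRUE
```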



Answer 4:


You could use:

test <- "hello_world and _hello_world_"
output <- gsub("(?<![^\\W])_|_(?![^\\W])", "", test, perl=TRUE)
output

[1] "hello_world and hello_world"

Explanation of regex:

(?<![^\\W])  assert that what precedes is a non word character OR the start of the input
_            match an underscore to remove
|            OR
_            match an underscore to remove, followed by
(?![^\\W])   assert that what follows is a non word character OR the end of the input
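
Since [^\\W] is a double negative for \\w, the pattern can also be written with plain negative lookarounds on \\w. The sketch below checks this on the question's string. The second input, with doubled underscores, is a made-up case suggesting that only the outermost underscore of a run is removed, because _ is itself a word character.

```r
test <- "hello_world and _hello_world_"

# Pattern from the answer and a simplified spelling of the same assertions
original   <- gsub("(?<![^\\W])_|_(?![^\\W])", "", test, perl = TRUE)
simplified <- gsub("(?<!\\w)_|_(?!\\w)", "", test, perl = TRUE)
print(identical(original, simplified))  # TRUE

# Hypothetical input with runs of underscores around a word
doubled <- gsub("(?<!\\w)_|_(?!\\w)", "", "__hello__", perl = TRUE)
print(doubled)  # "_hello_"
```

If whole runs should be removed, a quantified pattern such as the _+\\b|\\b_+ alternatives from Answer 1 may be a better fit.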


Source: https://stackoverflow.com/questions/64135363/remove-all-punctuation-except-underline-between-characters-in-r-with-posix-chara
