R: gsub with fixed=T or F and special cases

核能气质少年 submitted on 2019-12-07 11:32:24

Testing with parts of your previous question, I think we can wrap each punctuation character in a placeholder, as follows, without slowing things down too much:

line <- c("one", "two one", "four phones", "and a capsule", "But here's a caps key",
    "Here is the capsule, caps key, and two caps, or two caps. or even three caps-" )
e <- c("one", "two", "caps")
r <- c("ONE", "TWO", "cap")


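# Replicate the sample to roughly 1.7 million lines for a realistic timing test
line <- rep(line, 1700000 %/% length(line))

# Wrap every punctuation character in " <DEL>" and "<DEL> " so that words
# next to punctuation become space-delimited
line <- gsub("([[:punct:]])", " <DEL>\\1<DEL> ", line, perl=TRUE)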

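## Start
# Pad lines, patterns and replacements with spaces so that only whole,
# space-delimited words can match
line2 <- paste0(" ", line, " ")
e2 <- paste0(" ", e, " ")
r2 <- paste0(" ", r, " ")

# fixed=TRUE treats each pattern as a literal string rather than a regex
for (i in seq_along(e2)) {
    line2 <- gsub(e2[i], r2[i], line2, fixed=TRUE)
}

# Strip the padding spaces and the <DEL> placeholders again
gsub("^\\s|\\s$| <DEL>|<DEL> ", "", line2, perl=TRUE)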
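
To see why the padding and the placeholder matter, here is a small sketch that runs the same steps on a few of the sample lines and compares them with a plain fixed=TRUE replacement. The helper name replace_words and the variable names are my own, made up for illustration; they are not part of the question or of any package.

```r
# Hypothetical helper (name is an assumption) bundling the steps above
replace_words <- function(x, from, to) {
  # Wrap punctuation in placeholders so adjacent words become space-delimited
  x <- gsub("([[:punct:]])", " <DEL>\\1<DEL> ", x, perl = TRUE)
  # Pad both ends so words at the start and end of a line also match
  x <- paste0(" ", x, " ")
  for (i in seq_along(from)) {
    x <- gsub(paste0(" ", from[i], " "), paste0(" ", to[i], " "), x, fixed = TRUE)
  }
  # Remove the padding and the placeholders again
  gsub("^\\s|\\s$| <DEL>|<DEL> ", "", x, perl = TRUE)
}

sample_lines <- c("two one", "four phones", "and a capsule", "three caps-")
e <- c("one", "two", "caps")
r <- c("ONE", "TWO", "cap")

# Plain literal replacement also rewrites substrings inside longer words
naive <- sample_lines
for (i in seq_along(e)) naive <- gsub(e[i], r[i], naive, fixed = TRUE)
print(naive)
# "TWO ONE" "four phONEs" "and a capule" "three cap-"

# Padded version only touches whole words, including the one before "-"
padded <- replace_words(sample_lines, e, r)
print(padded)
# "TWO ONE" "four phones" "and a capsule" "three cap-"

# Caveat: two identical targets in a row share a single space, so the
# second one is skipped by the non-overlapping literal match
repeated <- replace_words("caps caps", e, r)
print(repeated)
# "cap caps"
```

If repeated adjacent words can occur in your data, one possible workaround is to run the replacement loop twice, at the cost of a second pass over the lines.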