How to remove words of a specific length from a string in R?

Submitted by 守給你的承諾、 on 2019-12-01 03:44:11

Question


I want to remove words shorter than 3 characters from a string. For example, my input is

str<- c("hello RP have a nice day")

I want my output to be

str<- c("hello have nice day")

Please help


Answer 1:


Try this:

gsub('\\b\\w{1,2}\\b','',str)
[1] "hello  have  nice day"

EDIT: \b is a word boundary. To also drop the extra space, change it to:

gsub('\\b\\w{1,2}\\s','',str)

Or

gsub('(?<=\\s)(\\w{1,2}\\s)','',str,perl=TRUE)
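
Both whitespace-consuming patterns above expect the short word to be followed by a space, so a short word at the very end of the string is left in place (and the lookbehind version also skips one at the very start). A minimal base R sketch of one way around that: delete the short words with the word-boundary pattern first, then collapse the leftover whitespace. The helper name remove_short_words and its max_len argument are made up for this illustration and do not come from the answer.

```r
# Hypothetical helper (not from the answers above): drop words of up to
# max_len characters, then tidy the whitespace they leave behind.
remove_short_words <- function(x, max_len = 2) {
  pattern <- sprintf("\\b\\w{1,%d}\\b", max_len)
  out <- gsub(pattern, "", x)       # remove the short words themselves
  out <- gsub("\\s+", " ", out)     # collapse runs of whitespace to one space
  trimws(out)                       # strip leading/trailing whitespace
}

remove_short_words("hello RP have a nice day")
remove_short_words("a hello RP have a nice day ok")
```

Because the cleanup is a separate step, it does not matter where in the string the short word sits. Note that \w does not match apostrophes, so contractions such as "don't" may be split in ways you do not want.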



Answer 2:


Or use str_extract_all to extract all words of length 3 or more, then paste them back together:

library(stringr)
paste(str_extract_all(str, '\\w{3,}')[[1]], collapse=' ')
#[1] "hello have nice day"



Answer 3:


x <- "hello RP have a nice day"
z <- unlist(strsplit(x, split=" "))
paste(z[nchar(z)>=3], collapse=" ")
# [1] "hello have nice day"
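
The same split, filter, paste idea extends to a character vector holding several strings. A sketch, assuming words are separated by whitespace; keep_long_words and min_len are hypothetical names chosen for this example:

```r
# Hypothetical helper: keep only words with at least min_len characters
# in each element of a character vector.
keep_long_words <- function(x, min_len = 3) {
  words <- strsplit(x, "\\s+")      # one vector of words per input string
  vapply(
    words,
    function(w) paste(w[nchar(w) >= min_len], collapse = " "),
    character(1)
  )
}

keep_long_words(c("hello RP have a nice day", "to be or not to be"))
```

Since nchar counts every character, attached punctuation is included in a word's length here ("day." counts as 4), which may or may not be what you want.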



Answer 4:


Here's an approach using the rm_nchar_words function from the qdapRegex package that I coauthored with @hwnd (SO regex guru extraordinaire). Here I show removing 1-2 letter words and then 1-3 letter words:

str<- c("hello RP have a nice day")

library(qdapRegex)

rm_nchar_words(str, "1,2")
## [1] "hello have nice day"

rm_nchar_words(str, "1,3")
## [1] "hello have nice"

As qdapRegex aims to teach, here is the regex behind the scenes; the S function puts 1,2 into the quantifier's curly braces:

S("@rm_nchar_words", "1,2")
##  "(?<![\\w'])(?:'?\\w'?){1,2}(?![\\w'])"
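
To see what that pattern does on its own, it can be passed straight to base R's gsub with perl = TRUE (the lookarounds need the PCRE engine). This is only a sketch of the matching step: the rm_nchar_words output above has single spaces, so some whitespace tidying evidently happens as well, and it is approximated here with two extra base R calls.

```r
x <- "hello RP have a nice day"

# Pattern as printed by S("@rm_nchar_words", "1,2") above
pat <- "(?<![\\w'])(?:'?\\w'?){1,2}(?![\\w'])"

removed <- gsub(pat, "", x, perl = TRUE)    # short words deleted, gaps remain
tidy <- trimws(gsub("\\s+", " ", removed))  # collapse and trim whitespace
tidy
```

The lookbehind and lookahead play the role of \b but also treat an apostrophe as part of a word, so the tail of a contraction is not mistaken for a short word.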


Source: https://stackoverflow.com/questions/33226616/how-to-remove-words-of-specific-length-in-a-string-in-r