Remove first n words and take count

Submitted by 拥有回忆 on 2021-01-20 13:49:30

Question


I have a data frame with a text column. I need to ignore (or remove) the first 2 words of each string and then count how often each remaining string occurs in that column.

b <- data.frame(text = c("hello sunitha what can I do for you?",
                         "hi john what can I do for you?"))

Expected output for data frame 'b': after removing the first 2 words, the count of 'what can I do for you?' should be 2. How can I do this?


Answer 1:


You can use gsub to remove the first two words, then use tapply to count the occurrences of each resulting string, i.e.

i1 <- gsub("^\\w*\\s*\\w*\\s*", "", b$text)
tapply(i1, i1, length)
#what can I do for you? 
#                     2

If you need to remove an arbitrary range of words instead, i1 can be amended as follows (here, words 2 to 4 are dropped):

i1 <- sapply(strsplit(as.character(b$text), ' '), function(i)paste(i[-c(2:4)], collapse = ' '))
tapply(i1, i1, length)
#hello I do for you?    hi I do for you? 
#                  1                   1 
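The gsub approach above hard-codes two words. As a minimal sketch, assuming words are separated by whitespace, the same idea can be parameterised by n. The helper name remove_first_n_words is hypothetical and not part of the original answer, and table() is used here in place of tapply for the counting step.

```r
# Hypothetical helper (not from the original answer):
# drop the first n whitespace-separated words from each string.
remove_first_n_words <- function(x, n = 2) {
  # Match optional leading whitespace, then n repetitions of
  # "non-space run followed by whitespace", anchored at the start.
  pattern <- sprintf("^\\s*(\\S+\\s+){%d}", n)
  sub(pattern, "", as.character(x))
}

b <- data.frame(
  text = c("hello sunitha what can I do for you?",
           "hi john what can I do for you?"),
  stringsAsFactors = FALSE
)

trimmed <- remove_first_n_words(b$text, n = 2)

# Count how often each trimmed string occurs
counts <- table(trimmed)
print(counts)
```

One limitation of this sketch: the pattern only matches when at least one more word follows the first n, so a string with n or fewer words is returned unchanged.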



Answer 2:


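library(magrittr)  # provides the %>% pipe used below

b <- data.frame(text = c("hello sunitha what can I do for you?", "hi john what can I do for you?"), stringsAsFactors = FALSE)
# Drop the first two words of each string
b$processed <- sapply(b$text, function(x) (strsplit(x, " ")[[1]] %>% .[-c(1:2)]) %>% paste0(., collapse = " "))
# Number of words remaining in each processed string
b$count <- sapply(b$processed, function(x) length(strsplit(x, " ")[[1]]))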
> b
                                  text              processed count
1 hello sunitha what can I do for you? what can I do for you?     6
2       hi john what can I do for you? what can I do for you?     6

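Are you looking for something like this? Note that count here is the number of words left in each processed string. Watch out for stringsAsFactors = FALSE: without it (the default before R 4.0.0), your texts will be of factor type and harder to work with.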
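If the goal is the number of times each processed string occurs, as in the question, rather than its word count, a per-row occurrence column can be added. The following is a sketch in base R only; the column name occurrences is an assumption introduced for illustration, and vapply and ave are swapped in for the sapply and pipe combination used in the answer.

```r
b <- data.frame(
  text = c("hello sunitha what can I do for you?",
           "hi john what can I do for you?"),
  stringsAsFactors = FALSE
)

# Drop the first two space-separated words of each string
b$processed <- vapply(
  strsplit(b$text, " ", fixed = TRUE),
  function(words) paste(words[-(1:2)], collapse = " "),
  character(1)
)

# For each row, count how many rows share the same processed string
# ("occurrences" is a hypothetical column name)
b$occurrences <- ave(seq_along(b$processed), b$processed, FUN = length)

print(b[, c("processed", "occurrences")])
```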



Source: https://stackoverflow.com/questions/55412225/remove-first-n-words-and-take-count
