R text mining: grouping similar words using stemDocuments in tm package

Submitted by 半世苍凉 on 2020-04-18 06:10:15

Question


I am text mining about 30,000 tweets. To make the results more reliable, I want to map synonyms and variant forms to a single word. For example, some users write "girl", some "girls", and some "gal"; similarly, "give" and "gave" mean the same thing, as do "come" and "came". Some users also use short forms such as "plz" and "pls". In addition, stemDocument from the tm package is not working properly: it converts "dance" to "danc" and "table" to "tabl". Is there another good package for stemming? I want to replace all of these variants with one word so that the frequency counts are correct and my sentiment analysis is more reliable. The following is the reproducible code:

#---------------------data cleaning-----------------------------------------------------
# Clean the data: lowercase, punctuation, stop words, and whitespace.
# content2 is assumed to be an existing data frame with a character column txt.
library(tm)

# Collapse quoted forum text between "said:" and "click to expand...".
# The pattern is a regular expression, so fixed = TRUE must not be used here.
cleaned <- gsub("(said:).*?(click to expand\\.{3})", "\\1 \\2", content2$txt, perl = TRUE)
content2 <- data.frame(txt = cleaned, stringsAsFactors = FALSE)

docs <- Corpus(VectorSource(content2$txt))
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removePunctuation, preserve_intra_word_contractions = FALSE, preserve_intra_word_dashes = TRUE)
docs <- tm_map(docs, removeWords, c(stopwords("english"), "yeah", "time", "asked", "went", "want", "look", "call", "sit",
                                    "even", "first", "place", "left", "visit", "guy", "around", "started", "came", "dont", "got", "took", "see", "take", "come"))
docs <- tm_map(docs, stripWhitespace)

# Text stemming: reduces words to their root form
docs <- tm_map(docs, stemDocument)
#-------------sentiment analysis--------------------------------------------------
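
One possible way to group variants such as "girls"/"gal" or "plz"/"pls" is a hand-built lookup table applied before stemming. The sketch below uses base R only. The table `canonical` and the helper `normalize_tokens` are hypothetical names introduced for illustration, not part of tm, and the entries are just the examples from the question. Stems such as "danc" and "tabl" are normal output for a Porter-style stemmer, so a mapping like this complements stemming rather than replaces it.

```r
# Hypothetical lookup table: names are variants, values are the canonical word.
canonical <- c(girls = "girl", gal = "girl", gave = "give",
               came = "come", plz = "please", pls = "please")

# Replace every whitespace-separated token found in the lookup table.
# Vectorized over `text`, so it accepts one string or a character vector.
normalize_tokens <- function(text, lookup) {
  vapply(strsplit(text, "\\s+"), function(tokens) {
    hit <- tokens %in% names(lookup)
    tokens[hit] <- unname(lookup[tokens[hit]])
    paste(tokens, collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

normalize_tokens("plz tell the gal i came", canonical)
# [1] "please tell the girl i come"
```

Assuming the tm pipeline above, the helper could be wrapped as `content_transformer(function(x) normalize_tokens(x, canonical))` and passed to `tm_map` after lowercasing and before `removeWords` and `stemDocument`, so that the stop-word list and the stemmer only see canonical forms.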

Source: https://stackoverflow.com/questions/61257802/r-text-mining-grouping-similar-words-using-stemdocuments-in-tm-package
