Question
I am text mining around 30,000 tweets. To make the results more reliable, I want to map synonyms and variant spellings to a single word: for example, some users write "girl", some "girls", some "gal"; "give" and "gave" mean the same thing, as do "come" and "came"; and some users use short forms like "plz" and "pls". Also, stemDocument from the tm package is not working properly: it converts "dance" to "danc" and "table" to "tabl". Is there another good package for stemming? I want to replace all these variants with one canonical word so the word frequencies are counted correctly and my sentiment analysis is more reliable. Following is the reproducible code (a sketch of the kind of normalization I mean follows it):
#---------------------data cleaning-----------------------------------------------------
# clean the data: punctuation, digits, stopwords, whitespace, and lowercase
library(tm)

# strip quoted forum text between "said:" and "click to expand..."
# (fixed = TRUE is dropped here: it would treat the pattern as a literal string
#  and break the \\1 \\2 backreferences)
content2 <- data.frame(txt = gsub("(said:).*?(click to expand\\.{3})", "\\1 \\2",
                                  content2$txt),
                       stringsAsFactors = FALSE)
docs <- Corpus(VectorSource(content2$txt))
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removePunctuation,
               preserve_intra_word_contractions = FALSE,
               preserve_intra_word_dashes = TRUE)
# drop standard English stopwords plus domain-specific filler words
docs <- tm_map(docs, removeWords,
               c(stopwords("english"), "yeah", "time", "asked", "went", "want",
                 "look", "call", "sit", "even", "first", "place", "left",
                 "visit", "guy", "around", "started", "came", "dont", "got",
                 "took", "see", "take", "come"))
docs <- tm_map(docs, stripWhitespace)
# Text stemming: reduces words to their root form (Porter stemmer via SnowballC)
docs <- tm_map(docs, stemDocument)
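
For reference, here is a minimal sketch of the kind of normalization being asked about, not taken from the original post. It assumes the textstem package (whose lemmatize_strings returns dictionary words such as "dance" and "table" instead of Porter stems like "danc" and "tabl"), and the replacements lookup table below is purely illustrative:

# Sketch only, not from the original question. The replacement table is
# illustrative; extend it with whatever short-forms appear in your tweets.
library(tm)
library(textstem)  # lemmatization: "danced" -> "dance", "tables" -> "table"

replacements <- c(plz = "please", pls = "please", gal = "girl",
                  gals = "girl", girls = "girl", gave = "give", came = "come")

normalize_words <- content_transformer(function(x) {
  # map each slang/inflected form to its canonical word (whole words only)
  for (w in names(replacements))
    x <- gsub(paste0("\\b", w, "\\b"), replacements[[w]], x)
  x
})

docs <- tm_map(docs, normalize_words)
# lemmatize instead of (or before) stemDocument
docs <- tm_map(docs, content_transformer(lemmatize_strings))

Running the lookup before lemmatization means hand-picked slang ("plz") is folded in first, and the dictionary then handles regular inflections ("gave", "came") automatically.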
#-------------sentiment analysis--------------------------------------------------
Source: https://stackoverflow.com/questions/61257802/r-text-mining-grouping-similar-words-using-stemdocuments-in-tm-package