How is sentiment analysis computed in TextBlob?

Luke

The TextBlob NaiveBayesAnalyzer is apparently based on NLTK (the Natural Language Toolkit). The Naive Bayes algorithm in general is explained here: A simple explanation of Naive Bayes Classification

and its application to sentiment and objectivity is described here: http://nlp.stanford.edu/courses/cs224n/2009/fp/24.pdf

Basically you're right: each word gets a likelihood like "40% positive / 60% negative" based on how it was used in a body of labelled training data (for the NLTK-based analyzer, that training data is a corpus of movie reviews). The per-word scores for all the words in your sentence are then multiplied together (along with the class priors) and normalised to produce the sentence-level score.
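To make that "multiply the word scores" step concrete, here is a minimal sketch of how a Naive Bayes classifier combines per-word likelihoods into a sentence score. The probabilities and the helper function below are made up for illustration; in the real analyzer they come from counting word frequencies in the movie-review corpus.

```python
from math import prod  # Python 3.8+

# Hypothetical per-word likelihoods, i.e. P(word | positive) and
# P(word | negative). Real values are learned from the training corpus.
word_likelihoods = {
    "great":    {"pos": 0.70, "neg": 0.30},
    "terrible": {"pos": 0.20, "neg": 0.80},
    "movie":    {"pos": 0.50, "neg": 0.50},
}

def naive_bayes_sentiment(words, prior_pos=0.5, prior_neg=0.5):
    """Multiply the per-word likelihoods with the class priors, then
    normalise so the two scores sum to 1 (Bayes' rule, assuming the
    words are conditionally independent given the class)."""
    known = [w for w in words if w in word_likelihoods]
    p_pos = prior_pos * prod(word_likelihoods[w]["pos"] for w in known)
    p_neg = prior_neg * prod(word_likelihoods[w]["neg"] for w in known)
    total = p_pos + p_neg
    return p_pos / total, p_neg / total

print(naive_bayes_sentiment("a great movie".split()))
# -> (0.7, 0.3): "great" pulls the sentence toward positive,
#    "a" is simply skipped because it has no entry in the table
```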

I haven't tested this, but I expect that if the library returns exactly 0.0, your sentence didn't contain any words that carry a polarity in the NLTK training set. I suspect the researchers left those words out because 1) they were too rare in the training data, or 2) they were known to be meaningless stop words ("the", "a", "and", etc.).
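If you want to verify the 0.0 case yourself, a quick way (assuming the default PatternAnalyzer, whose polarity ranges from -1.0 to 1.0) is to feed it a sentence made only of function words and compare it with one containing a clearly sentiment-bearing word:

```python
from textblob import TextBlob

# Words like "it", "is", "what" carry no polarity in the underlying lexicon,
# so a sentence built only from them should score exactly 0.0.
print(TextBlob("It is what it is.").sentiment.polarity)    # expected: 0.0
print(TextBlob("What a great movie!").sentiment.polarity)  # expected: > 0
```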

That covers the Naive Bayes analyzer. As for the PatternAnalyzer, the TextBlob docs say it's based on the "pattern" library, which doesn't really document how its sentiment scoring works; I suspect something similar is happening there, though.
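For completeness, this is how you can run the same text through both analyzers in TextBlob and compare their outputs (the sample sentence is just an example):

```python
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer, PatternAnalyzer

text = "This movie was surprisingly good."

# Default analyzer: PatternAnalyzer -> Sentiment(polarity, subjectivity)
pattern_blob = TextBlob(text, analyzer=PatternAnalyzer())
print(pattern_blob.sentiment)

# NaiveBayesAnalyzer -> Sentiment(classification, p_pos, p_neg)
# (requires the NLTK movie-reviews corpus, installed via
#  `python -m textblob.download_corpora`; the classifier is trained
#  on first use, so the first call is slow)
nb_blob = TextBlob(text, analyzer=NaiveBayesAnalyzer())
print(nb_blob.sentiment)
```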
