R sentiment analysis with phrases in dictionaries

Submitted by 余生长醉 on 2019-11-30 16:34:05
lrnzcig

The function score.sentiment seems to work. If I try a very simple setup,

Tweets = c("this is good", "how bad it is")
neg = c("bad")
pos = c("good")
analysis=score.sentiment(Tweets, pos, neg)
table(analysis$score)

I get the expected result,

> table(analysis$score)

-1  1 
 1  1 
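The question doesn't include the body of score.sentiment, but for context, the widely circulated Jeffrey Breen-style version looks roughly like this (a minimal sketch; the cleaning steps and exact names are assumptions):

```r
library(plyr)
library(stringr)

score.sentiment <- function(sentences, pos.words, neg.words) {
  scores <- laply(sentences, function(sentence, pos.words, neg.words) {
    # normalize: lowercase and strip punctuation before matching
    sentence <- tolower(gsub("[[:punct:]]", "", sentence))
    word.list <- str_split(sentence, '\\s+')
    words <- unlist(word.list)
    # count dictionary hits; score = positives minus negatives
    pos.matches <- !is.na(match(words, pos.words))
    neg.matches <- !is.na(match(words, neg.words))
    sum(pos.matches) - sum(neg.matches)
  }, pos.words, neg.words)
  data.frame(score = scores, text = sentences)
}
```

With the toy inputs above, "this is good" scores +1 and "how bad it is" scores -1, which matches the table.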

How are you feeding the 20 tweets to the method? From the result you posted, that 0 20, I'd guess that none of your 20 tweets contains any of your positive or negative words, although of course if that were the case you would probably have noticed it. If you post more details on your list of tweets and your positive and negative words, it will be easier to help you.

Anyhow, your function seems to be working just fine.

Hope it helps.

EDIT after clarifications via comments:

Actually, to solve your problem you need to tokenize your sentences into n-grams, where n corresponds to the maximum number of words in your lists of positive and negative phrases. You can see how to do this e.g. in this SO question. For completeness, and since I've tested it myself, here is an example of what you could do. I simplify it to bigrams (n=2) and use the following inputs:

Tweets = c("rewarding hard work with raising taxes and VAT. #LabourManifesto", 
           "Ed Miliband is offering 'wrong choice' of 'more cuts' in #LabourManifesto")
pos = c("rewarding hard work")
neg = c("wrong choice")

You can create a bigram tokenizer like this,

library(tm)
library(RWeka)
BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min=2,max=2))

And test it,

> BigramTokenizer("rewarding hard work with raising taxes and VAT. #LabourManifesto")
[1] "rewarding hard"       "hard work"            "work with"           
[4] "with raising"         "raising taxes"        "taxes and"           
[7] "and VAT"              "VAT #LabourManifesto"

Then in your method you simply substitute this line,

word.list = str_split(sentence, '\\s+')

with this,

word.list = BigramTokenizer(sentence)

Although of course it would be better if you changed word.list to ngram.list or something like that.

The result is, as expected,

> table(analysis$score)

-1  0 
 1  1
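Putting the pieces together, a bigram-aware scoring function might look like this (a self-contained sketch assuming a Breen-style score.sentiment; the cleaning steps are assumptions):

```r
library(plyr)
library(RWeka)

# tokenize into bigrams instead of single words
BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))

score.sentiment <- function(sentences, pos.words, neg.words) {
  scores <- laply(sentences, function(sentence, pos.words, neg.words) {
    # normalize before tokenizing so dictionary phrases match
    sentence <- tolower(gsub("[[:punct:]]", "", sentence))
    ngram.list <- BigramTokenizer(sentence)
    pos.matches <- !is.na(match(ngram.list, pos.words))
    neg.matches <- !is.na(match(ngram.list, neg.words))
    sum(pos.matches) - sum(neg.matches)
  }, pos.words, neg.words)
  data.frame(score = scores, text = sentences)
}
```

Note that with a pure bigram tokenizer the three-word phrase "rewarding hard work" cannot match, which is why the first tweet scores 0 in the table above.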

Just decide your n-gram size and add it to Weka_control and you should be fine.
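For instance, if your phrase lists mix single words, two-word and three-word phrases, you could widen the range so the tokenizer emits unigrams through trigrams in one pass (the name MultigramTokenizer is just an example):

```r
library(RWeka)

# emit all n-grams from size 1 up to size 3
MultigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 1, max = 3))
```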

Hope it helps.
