sentiment-analysis | 易学教程

Problem with CountVectorizer from scikit-learn package

Question: I have a dataset of movie reviews with two columns: 'class' and 'reviews'. I have done most of the routine preprocessing, such as lowercasing, removing stop words, and removing punctuation marks. After preprocessing, each review is a string of words separated by spaces. I want to use CountVectorizer and then TF-IDF to create features from my dataset so I can do classification/text recognition with Random Forest. I looked into websites and I
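
For orientation, the counting step the question refers to can be sketched in plain Python. This is only an illustration of the bag-of-words idea on space-separated text, not scikit-learn's implementation; the function name `count_vectorize` and the two sample reviews are made up for the example.

```python
from collections import Counter


def count_vectorize(docs):
    """Return (vocabulary, rows); rows[i][j] counts vocabulary[j] in docs[i]."""
    # The reviews are assumed to be preprocessed already, so splitting on
    # whitespace is enough to obtain tokens.
    tokenized = [doc.split() for doc in docs]
    # A sorted vocabulary gives every term a stable column index.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


# Hypothetical preprocessed reviews.
reviews = ["great movie great cast", "boring movie"]
vocabulary, rows = count_vectorize(reviews)
# vocabulary -> ['boring', 'cast', 'great', 'movie']
# rows       -> [[0, 1, 2, 1], [1, 0, 0, 1]]
```

Each row is one review and each column one vocabulary term, which is the shape a classifier such as a random forest would expect as input.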

NLTK sentiment analysis is only returning one value

Question: I seriously hate to post a question about an entire chunk of code, but I've been working on this for the past 3 hours and I can't wrap my head around what is happening. I have approximately 600 tweets that I am retrieving from a CSV file, with score values (between -2 and 2) reflecting the sentiment towards a presidential candidate. However, when I run this training sample on any other data, only one value is returned (positive). I have checked to see if the scores were being added correctly

Can the ANEW dictionary be used for sentiment analysis in quanteda?

Question: I am trying to find a way to use the Affective Norms for English Words (in Dutch) for a longitudinal sentiment analysis with quanteda. What I ultimately want is a "mean sentiment" per year in order to show any longitudinal trends. In the dataset, all words are scored on a 7-point Likert scale by 64 coders on four categories, which provides a mean for each word. What I want to do is take one of the dimensions and use it to analyse changes in emotions over time. I realise that
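
The "mean sentiment per year" the question describes can be sketched as a lexicon lookup followed by a per-year average. The sketch below is plain Python rather than quanteda, and everything in it is an assumption made for illustration: the function name `mean_sentiment_by_year`, the three word ratings, and the sample documents are invented and do not come from the ANEW norms.

```python
from collections import defaultdict


def mean_sentiment_by_year(docs, lexicon):
    """docs: iterable of (year, text); lexicon: word -> mean rating."""
    totals = defaultdict(float)
    matches = defaultdict(int)
    for year, text in docs:
        for token in text.lower().split():
            # Only words present in the lexicon contribute to the average.
            if token in lexicon:
                totals[year] += lexicon[token]
                matches[year] += 1
    return {year: totals[year] / matches[year] for year in sorted(matches)}


# Invented ratings on a 7-point scale, for one dimension only.
lexicon = {"goed": 6.0, "slecht": 2.0, "mooi": 6.5}
docs = [(2001, "goed en mooi"), (2002, "slecht weer"), (2002, "goed nieuws")]
result = mean_sentiment_by_year(docs, lexicon)
# result -> {2001: 6.25, 2002: 4.0}
```

This version averages over matched words, so longer documents weigh more within a year; averaging per document first and then per year would be an equally defensible choice.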

How much text can Weka handle?

Question: I have a sentiment analysis task and I need to specify how much data (in my case, text) Weka can handle. I have a corpus of 2,500 opinions, already tagged. I know that it's a small corpus, but my thesis advisor is asking me to argue specifically how much data Weka can handle.

Answer 1: Your limitation with Weka will depend on whichever learning algorithm you use and how much memory you have available for training. Most classifiers require the whole set to be loaded into memory for training, but there are

syuzhet package - extracting words evaluated by sentiment score

Question: I'm using the syuzhet package for sentiment analysis. It is very simple to use, but I cannot find a method or function that returns all the evaluated words from a sentence. It only returns a data frame with counts corresponding to e.g. anger, anticipation, surprise, ..., negative, positive. But how can I get back the particular words that are considered, e.g., positive or negative? text <- c("I love it. It's awesome!", 'Im positively surprised.', 'very bad alrighty then.',

How to generate sentiment treebank in Stanford NLP

Question: I'm using the Stanford NLP sentiment library for sentiment analysis. Now I want to generate a treebank from a sentence.
Input sentence: "Effective but too-tepid biopic"
Output treebank: (2 (3 (3 Effective) (2 but)) (1 (1 too-tepid) (2 biopic)))
Can anybody show me how to do it? Thanks, all.

Answer 1: So I had to push a bug fix for the SentimentPipeline. If you get the latest code from GitHub and use that version: https://github.com/stanfordnlp/CoreNLP you can issue this command: java -Xmx8g edu

In general, when does TF-IDF reduce accuracy?

Question: I'm training a Naive Bayes model to classify a corpus of 200,000 reviews as positive or negative, and I noticed that applying TF-IDF actually reduced the accuracy (on a test set of 50,000 reviews) by about 2%. So I was wondering whether TF-IDF makes any underlying assumptions about the data or model it works with, i.e. are there cases where using it reduces accuracy?

Answer 1: The IDF component of TF*IDF can harm your classification accuracy in some cases. Let's suppose
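
One way to see how IDF could work against a classifier, which may or may not be the scenario the answer goes on to describe, is to compute the weights on a toy corpus. The sketch below uses the textbook form idf = log(N / df); libraries often use smoothed variants. The function name `idf_weights` and the four documents are assumptions made for the example.

```python
import math


def idf_weights(docs):
    """Return {term: log(N / df)}, where df is the term's document frequency."""
    n_docs = len(docs)
    doc_freq = {}
    for doc in docs:
        # set() ensures a term is counted once per document.
        for term in set(doc.split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical corpus: "good" appears in three of the four documents.
docs = ["good film", "good plot", "good acting", "bad film"]
weights = idf_weights(docs)
# "good" is the most frequent term, so it receives the smallest weight,
# even though it is the strongest cue for the positive class here.
```

In this toy corpus the weights order as `weights["good"] < weights["film"] < weights["bad"]`, so the most class-indicative frequent word is the one IDF scales down the most.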

naiveBayes and predict function not working in R

Question: I am doing a sentiment analysis of Twitter comments (in the Kazakh language) using the R script below. There are 3,000 comments (1,500 sad, 1,500 happy) in the training set and 1,000 comments (happy and sad mixed) in the test set. Everything works great, but at the end the predicted values are all happy, which is not right. I have checked every function and all are working up until the naiveBayes function. I checked the classifier values and they are correct. I think either naiveBayes or predict is messing things

R - twitteR package download of package ‘rjson’ failed

Question: I am trying my hand at some data mining and attempting to retrieve data from Twitter. When I tried installing the package 'twitteR', I got the following warning:
Warning in install.packages : download of package ‘rjson’ failed
But it loads the rest of the packages. Then when I try to call the library:
> library(twitteR)
Loading required package: ROAuth
Loading required package: RCurl
Loading required package: bitops
Attaching package: ‘RCurl’
The following object is masked from ‘package:tm

error in r code sentiment analysis

Question: I am trying to write code in R to do sentiment analysis by exporting and analyzing tweets. The following code is supposed to clean the tweet, call the sentiment package, do the scoring, and return the result. This code has been cited in many tech blogs. The code is as follows: score.sentiment = function(sentences , pos.words, neg.words , progress='none') { require(plyr) require(stringr) scores = laply(sentences,function(sentence,pos.words,neg.words) { sentence =gsub('[[:punct:]]',''