Good dataset for sentiment analysis? [closed]

Posted by 有些话、适合烂在心里 on 2019-12-31 08:07:54

Question


I am working on sentiment analysis using the dataset at this link: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html. I have split the dataset 50:50, with half used as training samples and half as test samples. I extract features from the training samples and classify with a Weka classifier, but my prediction accuracy is only about 70-75%.

Can anybody suggest other datasets that could help me improve this result? I have used unigrams, bigrams, and POS tags as my features.
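For readers unfamiliar with the features mentioned above, here is a minimal sketch of unigram and bigram count features in plain Python. It is only an illustration, not the asker's actual pipeline: the function names `ngrams` and `extract_features` are hypothetical, tokenization is a naive lowercase whitespace split, and POS tags are left out because they need a tagger.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text):
    """Count unigrams and bigrams in text (naive whitespace tokenization)."""
    tokens = text.lower().split()
    features = Counter(ngrams(tokens, 1))  # unigram counts
    features.update(ngrams(tokens, 2))     # add bigram counts
    return features


features = extract_features("not good not bad")
print(features)
```

Bigrams such as "not good" keep a little of the negation context that unigrams alone lose, which is one reason they are often combined.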


Answer 1:


There are many sources of sentiment analysis datasets:

  • the huge n-grams dataset from Google: storage.googleapis.com/books/ngrams/books/datasetsv2.html
  • http://www.sananalytics.com/lab/twitter-sentiment/
  • http://inclass.kaggle.com/c/si650winter11/data
  • http://nlp.stanford.edu/sentiment/treebank.html
  • or you can look into this global ML dataset repository: https://archive.ics.uci.edu/ml

That said, a different dataset will not necessarily give you better accuracy on your current one, because the new corpus might be very different from yours. Apart from reducing the share of data held out for testing, you could try other classifiers or fine-tune the hyperparameters with a semi-automated wrapper such as CVParameterSelection or GridSearch, or even Auto-WEKA if it fits.
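To make the tuning idea concrete, here is a rough sketch of what a grid search with k-fold cross-validation does conceptually. It is plain Python and not Weka's API: `k_fold_splits`, `grid_search`, `threshold_accuracy`, and the toy `DATA` are all hypothetical names and values made up for illustration. The toy "classifier" is a fixed threshold on a made-up lexicon score, so it needs no fitting; a real scoring function would train on the training indices first.

```python
from itertools import product
from statistics import mean


def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def grid_search(param_grid, fit_and_score, n_samples, k=5):
    """Try every parameter combination; return (best_params, best_mean_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = mean(fit_and_score(params, train, val)
                     for train, val in k_fold_splits(n_samples, k))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy data: (lexicon_score, label) pairs, invented for this example.
DATA = [(-3, 0), (-2, 0), (-1, 0), (0, 0), (1, 1),
        (2, 1), (3, 1), (4, 1), (-4, 0), (5, 1)]


def threshold_accuracy(params, train_idx, val_idx):
    """Accuracy of 'positive if score > threshold' on the validation fold."""
    t = params["threshold"]
    correct = sum((DATA[i][0] > t) == bool(DATA[i][1]) for i in val_idx)
    return correct / len(val_idx)


best_params, best_score = grid_search(
    {"threshold": [-2, 0, 2]}, threshold_accuracy, len(DATA), k=5)
print(best_params, best_score)
```

The point of averaging over folds is that each parameter setting is judged on data it was not fitted to, so the chosen setting is less likely to be a fluke of one particular split.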

A 50/50 split is quite rare; 80/20 is a much more common ratio. A better practice is to use 60% for training, 20% for validation, and 20% for testing.
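A 60/20/20 split could be sketched as follows. This is a minimal plain-Python illustration with a hypothetical helper name, `train_val_test_split`, and a fixed seed chosen only for reproducibility. It shuffles once and cuts the list, without stratifying by label, which you may want for imbalanced sentiment classes.

```python
import random


def train_val_test_split(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle a copy of samples and cut it into train/validation/test lists."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for testing")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to testing
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

The validation part is used for choosing classifiers and hyperparameters, and the test part is touched only once at the end, so the reported accuracy is not biased by the tuning.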




Answer 2:


I have started gathering sentiment analysis tools, datasets, and lexicons in one place; it could be useful for you too: https://github.com/laugustyniak/awesome-sentiment-analysis

Open a PR if you want to add something, or just write to me. I have worked a lot with Amazon data (millions of reviews).




Answer 3:


Here is a list of datasets that give the sentiment of individual words: http://positivewordsresearch.com/sentiment-analysis-resources/



Source: https://stackoverflow.com/questions/24605702/good-dataset-for-sentiment-analysis
