sentiment-analysis

Identifying the entity in sentiment analysis using LingPipe

Submitted by 人盡茶涼 on 2019-12-06 07:28:07
I have implemented sentiment analysis using the sentiment analysis module of LingPipe. I know that it uses a Dynamic LR model for this, but it only tells me whether the test string carries positive or negative sentiment. What ideas could I use to determine the object toward which the sentiment has been expressed? If the text is classified as positive, I would like to get the object the sentiment is about; this could be a movie name, a product name, or something else.

Although this question is quite old, I would like to answer it for the benefit of others. What you want here is aspect-based (targeted) sentiment analysis: identifying the entity or aspect toward which the sentiment is directed, typically by pairing the polarity classifier with entity extraction, as sketched below.
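The asker's pipeline is LingPipe (Java), but the idea is easy to illustrate with NLTK's off-the-shelf named-entity chunker: run NER over text the classifier marked positive and treat the extracted entities as candidate sentiment targets. A minimal sketch (not LingPipe code; generic NER, so movie or product names may need a domain gazetteer or noun-phrase chunking instead):

```python
# Extract candidate sentiment targets with NLTK's stock NER chunker.
# Requires nltk downloads: punkt, averaged_perceptron_tagger,
# maxent_ne_chunker, words.
import nltk

def candidate_targets(text):
    """Return named entities in `text` as candidate objects of the sentiment."""
    tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(text)))
    return [
        " ".join(token for token, _tag in subtree.leaves())
        for subtree in tree.subtrees()
        if subtree.label() in {"PERSON", "ORGANIZATION", "GPE"}
    ]

# Run only on strings the sentiment classifier labelled positive.
print(candidate_targets("The new Nokia phone delighted reviewers at CNET."))
```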

Negation handling in sentiment analysis

Submitted by ⅰ亾dé卋堺 on 2019-12-06 00:36:45
Question: I am in need of a little help here. I need to identify negated phrases like "not good" and "not bad" and then determine the polarity (negative or positive) of the sentiment. I have done everything except handling the negations, and I just want to know how I can include negation handling. How do I go about it?

Answer 1: Negation handling is quite a broad field with numerous potential implementations. One common approach, sketched below, negates a span of text and stores the negated uni/bi/trigrams in the feature set.
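A minimal sketch of scope-based negation marking (my illustration, not the original answer's code): every token after a negation cue gets a _NEG suffix until the next clause-ending punctuation. NLTK ships a comparable ready-made helper, nltk.sentiment.util.mark_negation.

```python
# Suffix "_NEG" to every token that follows a negation cue, up to the
# next punctuation mark (treated as the end of the negation scope).
import re

NEGATION_CUES = {"not", "no", "never", "cannot"}
PUNCTUATION = re.compile(r"^[.:;!?,]$")

def mark_negation(tokens):
    """Return a copy of tokens with words in a negated span suffixed with _NEG."""
    negated = False
    out = []
    for tok in tokens:
        if PUNCTUATION.match(tok):
            negated = False          # a clause boundary ends the negation scope
            out.append(tok)
        elif tok.lower() in NEGATION_CUES or tok.lower().endswith("n't"):
            negated = True           # open a negation scope; keep the cue itself
            out.append(tok)
        else:
            out.append(tok + "_NEG" if negated else tok)
    return out

print(mark_negation("this movie is not good at all .".split()))
# ['this', 'movie', 'is', 'not', 'good_NEG', 'at_NEG', 'all_NEG', '.']
```

Bigram and trigram features can then be built from the marked tokens (e.g., with nltk.bigrams), so that "not good" yields features distinct from a bare "good".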

DocumentTermMatrix fails with a strange error only when # terms > 3000

Submitted by 岁酱吖の on 2019-12-06 00:32:34
Question: My code below works fine unless I create a DocumentTermMatrix with more than 3000 terms. These lines:

movie_dict <- findFreqTerms(movie_dtm_train, 8)
movie_dtm_hiFq_train <- DocumentTermMatrix(movie_corpus_train, list(dictionary = movie_dict))

fail with:

Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), :
  'i, j, v' different lengths
In addition: Warning messages:
1: In mclapply(unname(content(x)), termFreq, control) :
  all scheduled cores encountered

Estimating document polarity using R's qdap package without sentSplit

Submitted by 社会主义新天地 on 2019-12-05 12:33:40
I'd like to apply qdap's polarity function to a vector of documents, each of which could contain multiple sentences, and obtain the corresponding polarity for each document. For example:

library(qdap)
polarity(DATA$state)$all$polarity

# Results:
[1] -0.8165 -0.4082  0.0000 -0.8944  0.0000  0.0000  0.0000 -0.5774  0.0000
[10]  0.4082  0.0000
Warning message:
In polarity(DATA$state) : Some rows contain double punctuation.  Suggested use of `sentSplit` function.

This warning can't be ignored, as it appears to sum the polarity scores of the individual sentences in each document. This can result in document-level scores that are not comparable across documents with different numbers of sentences.
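qdap is an R package, but the underlying fix (split each document into sentences, score each sentence, then average) is easy to illustrate. A language-agnostic Python sketch, using NLTK's VADER scorer as a stand-in polarity function; requires nltk.download("punkt") and nltk.download("vader_lexicon"):

```python
# Sentence-split each document, score every sentence, then average, so
# documents with many sentences are not inflated by summed scores.
from nltk import sent_tokenize
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()

def document_polarity(doc):
    sentences = sent_tokenize(doc)
    scores = [sia.polarity_scores(s)["compound"] for s in sentences]
    return sum(scores) / len(scores) if scores else 0.0

docs = ["Great plot. Terrible acting, though.", "I loved it!"]
print([round(document_polarity(d), 3) for d in docs])
```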

NLTK convert tokenized sentence to synset format

Submitted by 試著忘記壹切 on 2019-12-04 23:04:24
Question: I'm looking to get the similarity between a single word and each word in a sentence using NLTK. NLTK can get the similarity between two specific words, as shown below. This method requires that a specific reference to the word is given, in this case 'dog.n.01', where dog is a noun and we want to use the first (01) WordNet definition.

dog = wordnet.synset('dog.n.01')
cat = wordnet.synset('cat.n.01')
print(dog.path_similarity(cat))
# 0.2

The problem is that I need to get the part of speech
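The question is truncated here, but it is essentially asking how to go from plain tokens to synsets. One way (my sketch, not an answer from the original thread): POS-tag the sentence, map the Penn Treebank tags to WordNet POS constants, take the first synset for each word, and compare it against the target synset.

```python
# Requires nltk downloads: punkt, averaged_perceptron_tagger, wordnet.
from nltk import pos_tag, word_tokenize
from nltk.corpus import wordnet as wn

def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS constant (or None)."""
    if tag.startswith("N"):
        return wn.NOUN
    if tag.startswith("V"):
        return wn.VERB
    if tag.startswith("J"):
        return wn.ADJ
    if tag.startswith("R"):
        return wn.ADV
    return None

target = wn.synset("dog.n.01")
for word, tag in pos_tag(word_tokenize("The cat chased a small mouse")):
    wn_pos = penn_to_wordnet(tag)
    synsets = wn.synsets(word, pos=wn_pos) if wn_pos else []
    if synsets:
        # Naive sense choice: first (most common) synset. Note that
        # path_similarity may return None across different POS hierarchies.
        print(word, target.path_similarity(synsets[0]))
```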

Python NLTK sentiment not calculated correctly

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-04 19:19:48
I have some positive and negative sentences, and I want a very simple way to use Python NLTK to train a NaiveBayesClassifier to investigate the sentiment of other sentences. I tried the code from this post, but my result is always positive: http://www.sjwhitworth.com/sentiment-analysis-in-python-using-nltk/ I am very new to Python, so there may be a mistake in the code I copied.

import nltk
import math
import re
import sys
import os
import codecs
from nltk.corpus import stopwords

# Note: the original also ran reload(sys) and sys.setdefaultencoding('utf-8'),
# a Python 2-only hack that is unnecessary and unavailable in Python 3.
__location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))
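The copied code is truncated here, so the exact bug can't be pinpointed, but the usual cause of "always positive" results is a mismatch between the feature extraction used at training time and at prediction time. A minimal self-contained sketch (toy data, illustrative only) that applies the same extractor in both places:

```python
# Train an NLTK NaiveBayesClassifier on bag-of-words presence features
# and classify new sentences with the SAME feature extractor.
import nltk

train = [
    ("I love this movie", "pos"),
    ("What a great film", "pos"),
    ("I hate this movie", "neg"),
    ("This film was terrible", "neg"),
]

def extract_features(sentence):
    # Word-presence features; lowercasing keeps train/test tokens consistent.
    return {word.lower(): True for word in sentence.split()}

featuresets = [(extract_features(text), label) for text, label in train]
classifier = nltk.NaiveBayesClassifier.train(featuresets)

print(classifier.classify(extract_features("I love this film")))       # pos
print(classifier.classify(extract_features("terrible hate terrible"))) # neg
```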

Word analysis and scoring from a file in Python

Submitted by 我只是一个虾纸丫 on 2019-12-04 13:48:56
Question: I am doing a word-by-word analysis of sentences such as "Hey there!! This is a excellent movie???" I have many sentences like the one above, and a huge dataset file, as shown below, in which I have to do a quick lookup to see whether a word exists. If it does, I do the analysis and store the results in a dictionary, such as the word's score from the file, the score of the last word of the sentence, the first word of the sentence, and so on.

sentence[i] => Hey there!! This is a excellent movie???
sentence[0] = Hey, sentence[1]=there!!
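The dataset file's format is not shown, so the sketch below assumes a hypothetical tab-separated file with one "word<TAB>score" pair per line (AFINN-style). Loading it into a dict makes each word lookup O(1), and stripping punctuation lets "there!!" match the entry for "there":

```python
# Load a word-score lexicon into a dict, then score a sentence word by word.
import re

def load_scores(path):
    scores = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            word, score = line.rstrip("\n").split("\t")
            scores[word] = float(score)
    return scores

def analyze(sentence, scores):
    # Strip punctuation so "there!!" matches the dictionary entry "there".
    words = [re.sub(r"\W+", "", w).lower() for w in sentence.split()]
    words = [w for w in words if w]
    return {
        "first_word_score": scores.get(words[0]),
        "last_word_score": scores.get(words[-1]),
        "word_scores": {w: scores.get(w) for w in words},
    }

scores = load_scores("word_scores.tsv")  # hypothetical file name
print(analyze("Hey there!! This is a excellent movie???", scores))
```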

How to implement TF-IDF feature weighting with Naive Bayes

Submitted by 随声附和 on 2019-12-04 13:00:20
I'm trying to implement a naive Bayes classifier for sentiment analysis, and I plan to use the TF-IDF weighting measure. I'm just a little stuck now: NB generally uses word (feature) frequencies to find the maximum likelihood, so how do I introduce the TF-IDF weighting measure into naive Bayes?

Several blog posts show in detail how to calculate TF-IDF. You use the TF-IDF weights as features/predictors in your statistical model. I suggest using either gensim [1] or scikit-learn [2] to compute the weights, which you then pass to your naive Bayes fitting procedure. The scikit-learn
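The answer is truncated here; a minimal scikit-learn sketch of the suggestion (toy data, illustrative only). Note that MultinomialNB formally expects counts, so feeding it TF-IDF weights is a common pragmatic approximation rather than a textbook-exact model:

```python
# TF-IDF features feeding a multinomial naive Bayes model via a pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great movie", "loved the plot", "terrible film", "awful acting"]
labels = ["pos", "pos", "neg", "neg"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["what a great plot", "awful acting indeed"]))  # ['pos' 'neg']
```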

Sentiment Analysis on LARGE collection of online conversation text

Submitted by 南楼画角 on 2019-12-04 08:34:21
Question: The title says it all: I have an SQL database bursting at the seams with online conversation text. I've already done most of this project in Python, so I would like to do this using Python's NLTK library (unless there's a strong reason not to). The data is organized by Thread, Username, and Post. Each thread more or less focuses on discussing one "product" of the category that I am interested in analyzing. Ultimately, when this is finished, I would like to have an estimated opinion (like
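The question is truncated here, but a common pattern for this setup is to stream posts out of the database rather than loading them all at once, score each post, and aggregate per thread. A sketch under assumptions: a SQLite file and a hypothetical posts(thread, username, post) table, with NLTK's VADER standing in for whatever classifier gets trained; requires nltk.download("vader_lexicon"):

```python
# Stream posts from SQL, score each one, and keep running per-thread totals,
# so the whole table never has to fit in memory.
import sqlite3
from collections import defaultdict
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
totals, counts = defaultdict(float), defaultdict(int)

conn = sqlite3.connect("conversations.db")  # hypothetical database file
for thread, post in conn.execute("SELECT thread, post FROM posts"):
    totals[thread] += sia.polarity_scores(post)["compound"]
    counts[thread] += 1

for thread in totals:
    print(thread, totals[thread] / counts[thread])  # mean opinion per thread
```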

Can the ANEW dictionary be used for sentiment analysis in quanteda?

Submitted by 六眼飞鱼酱① on 2019-12-03 22:07:19
I am trying to find a way to implement the Affective Norms for English Words (in Dutch) for a longitudinal sentiment analysis with quanteda. What I ultimately want is a "mean sentiment" per year, in order to show longitudinal trends. In the dataset, all words are scored on a 7-point Likert scale by 64 coders on four categories, which provides a mean for each word. What I want to do is take one of the dimensions and use it to analyse changes in emotions over time. I realise that quanteda has a function for implementing the LIWC dictionary, but I would prefer to use the open-source
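quanteda itself is R, but the core computation (a weighted mean of per-word valence ratings over all tokens in a year) is simple to illustrate. A language-agnostic Python sketch with a hypothetical three-word lexicon and corpus:

```python
# Illustrative only: compute a mean valence per year from an ANEW-style
# lexicon of per-word mean ratings. Lexicon and corpus are made up.
valence = {"happy": 6.5, "war": 2.1, "peace": 6.2}  # word -> mean rating

texts_by_year = {
    1990: ["war war peace", "happy peace"],
    1991: ["happy happy war"],
}

for year, texts in sorted(texts_by_year.items()):
    tokens = [w for t in texts for w in t.lower().split()]
    scored = [valence[w] for w in tokens if w in valence]
    mean = sum(scored) / len(scored) if scored else float("nan")
    print(year, round(mean, 3))
```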