sentiment-analysis

Identifying the entity in sentiment analysis using LingPipe

Submitted by 人盡茶涼 on 2019-12-06 07:28:07
I have implemented sentiment analysis using the sentiment analysis module of LingPipe. I know that it uses a Dynamic LR model for this, but it only tells me whether the test string carries positive or negative sentiment. What ideas could I use to determine the object toward which the sentiment has been expressed? If the text is classified as positive, I would like to get the object the sentiment is about; this could be a movie name, a product name, or something else.

Although this question is quite old, I would like to answer it for the benefit of others. What you want here is aspect-based (targeted) sentiment analysis: identifying the entity or aspect toward which the sentiment is directed, typically by pairing the polarity classifier with entity extraction, as sketched below.
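The asker's pipeline is LingPipe (Java), but the idea is easy to illustrate with NLTK's off-the-shelf named-entity chunker: run NER over text the classifier marked positive and treat the extracted entities as candidate sentiment targets. A minimal sketch (not LingPipe code; generic NER, so movie or product names may need a domain gazetteer or noun-phrase chunking instead):

```python
# Extract candidate sentiment targets with NLTK's stock NER chunker.
# Requires nltk downloads: punkt, averaged_perceptron_tagger,
# maxent_ne_chunker, words.
import nltk

def candidate_targets(text):
    """Return named entities in `text` as candidate objects of the sentiment."""
    tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(text)))
    return [
        " ".join(token for token, _tag in subtree.leaves())
        for subtree in tree.subtrees()
        if subtree.label() in {"PERSON", "ORGANIZATION", "GPE"}
    ]

# Run only on strings the sentiment classifier labelled positive.
print(candidate_targets("The new Nokia phone delighted reviewers at CNET."))
```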

Negation handling in sentiment analysis

Submitted by ⅰ亾dé卋堺 on 2019-12-06 00:36:45
Question: I am in need of a little help here. I need to identify negated phrases like "not good" and "not bad" and then determine the polarity (negative or positive) of the sentiment. I have done everything except handling the negations, and I just want to know how I can include negation handling. How do I go about it?

Answer 1: Negation handling is quite a broad field with numerous potential implementations. One common approach, sketched below, negates a span of text and stores the negated uni/bi/trigrams in the feature set.
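A minimal sketch of scope-based negation marking (my illustration, not the original answer's code): every token after a negation cue gets a _NEG suffix until the next clause-ending punctuation. NLTK ships a comparable ready-made helper, nltk.sentiment.util.mark_negation.

```python
# Suffix "_NEG" to every token that follows a negation cue, up to the
# next punctuation mark (treated as the end of the negation scope).
import re

NEGATION_CUES = {"not", "no", "never", "cannot"}
PUNCTUATION = re.compile(r"^[.:;!?,]$")

def mark_negation(tokens):
    """Return a copy of tokens with words in a negated span suffixed with _NEG."""
    negated = False
    out = []
    for tok in tokens:
        if PUNCTUATION.match(tok):
            negated = False          # a clause boundary ends the negation scope
            out.append(tok)
        elif tok.lower() in NEGATION_CUES or tok.lower().endswith("n't"):
            negated = True           # open a negation scope; keep the cue itself
            out.append(tok)
        else:
            out.append(tok + "_NEG" if negated else tok)
    return out

print(mark_negation("this movie is not good at all .".split()))
# ['this', 'movie', 'is', 'not', 'good_NEG', 'at_NEG', 'all_NEG', '.']
```

Bigram and trigram features can then be built from the marked tokens (e.g., with nltk.bigrams), so that "not good" yields features distinct from a bare "good".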

DocumentTermMatrix fails with a strange error only when # terms > 3000

Submitted by 岁酱吖の on 2019-12-06 00:32:34
Question: My code below works fine unless I create a DocumentTermMatrix with more than 3000 terms. These lines:

movie_dict <- findFreqTerms(movie_dtm_train, 8)
movie_dtm_hiFq_train <- DocumentTermMatrix(movie_corpus_train, list(dictionary = movie_dict))

fail with:

Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), :
  'i, j, v' different lengths
In addition: Warning messages:
1: In mclapply(unname(content(x)), termFreq, control) :
  all scheduled cores encountered

Estimating document polarity using R's qdap package without sentSplit

Submitted by 社会主义新天地 on 2019-12-05 12:33:40
I'd like to apply qdap's polarity function to a vector of documents, each of which could contain multiple sentences, and obtain the corresponding polarity for each document. For example:

library(qdap)
polarity(DATA$state)$all$polarity

# Results:
[1] -0.8165 -0.4082  0.0000 -0.8944  0.0000  0.0000  0.0000 -0.5774  0.0000
[10]  0.4082  0.0000
Warning message:
In polarity(DATA$state) : Some rows contain double punctuation.  Suggested use of `sentSplit` function.

This warning can't be ignored, as it appears to sum the polarity scores of the individual sentences in each document. This can result in document-level scores that are not comparable across documents with different numbers of sentences.
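qdap is an R package, but the underlying fix (split each document into sentences, score each sentence, then average) is easy to illustrate. A language-agnostic Python sketch, using NLTK's VADER scorer as a stand-in polarity function; requires nltk.download("punkt") and nltk.download("vader_lexicon"):

```python
# Sentence-split each document, score every sentence, then average, so
# documents with many sentences are not inflated by summed scores.
from nltk import sent_tokenize
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()

def document_polarity(doc):
    sentences = sent_tokenize(doc)
    scores = [sia.polarity_scores(s)["compound"] for s in sentences]
    return sum(scores) / len(scores) if scores else 0.0

docs = ["Great plot. Terrible acting, though.", "I loved it!"]
print([round(document_polarity(d), 3) for d in docs])
```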

NLTK convert tokenized sentence to synset format

Submitted by 試著忘記壹切 on 2019-12-04 23:04:24
Question: I'm looking to get the similarity between a single word and each word in a sentence using NLTK. NLTK can get the similarity between two specific words, as shown below. This method requires that a specific reference to the word is given, in this case 'dog.n.01', where dog is a noun and we want to use the first (01) WordNet definition.

dog = wordnet.synset('dog.n.01')
cat = wordnet.synset('cat.n.01')
print(dog.path_similarity(cat))
# 0.2

The problem is that I need to get the part of speech
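The question is truncated here, but it is essentially asking how to go from plain tokens to synsets. One way (my sketch, not an answer from the original thread): POS-tag the sentence, map the Penn Treebank tags to WordNet POS constants, take the first synset for each word, and compare it against the target synset.

```python
# Requires nltk downloads: punkt, averaged_perceptron_tagger, wordnet.
from nltk import pos_tag, word_tokenize
from nltk.corpus import wordnet as wn

def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS constant (or None)."""
    if tag.startswith("N"):
        return wn.NOUN
    if tag.startswith("V"):
        return wn.VERB
    if tag.startswith("J"):
        return wn.ADJ
    if tag.startswith("R"):
        return wn.ADV
    return None

target = wn.synset("dog.n.01")
for word, tag in pos_tag(word_tokenize("The cat chased a small mouse")):
    wn_pos = penn_to_wordnet(tag)
    synsets = wn.synsets(word, pos=wn_pos) if wn_pos else []
    if synsets:
        # Naive sense choice: first (most common) synset. Note that
        # path_similarity may return None across different POS hierarchies.
        print(word, target.path_similarity(synsets[0]))
```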

Python NLTK sentiment not calculated correctly

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-04 19:19:48
I have some positive and negative sentences, and I want a very simple way to use Python NLTK to train a NaiveBayesClassifier to investigate the sentiment of other sentences. I tried the code from this post, but my result is always positive: http://www.sjwhitworth.com/sentiment-analysis-in-python-using-nltk/ I am very new to Python, so there may be a mistake in the code I copied.

import nltk
import math
import re
import sys
import os
import codecs
from nltk.corpus import stopwords

# Note: the original also ran reload(sys) and sys.setdefaultencoding('utf-8'),
# a Python 2-only hack that is unnecessary and unavailable in Python 3.
__location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))
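The copied code is truncated here, so the exact bug can't be pinpointed, but the usual cause of "always positive" results is a mismatch between the feature extraction used at training time and at prediction time. A minimal self-contained sketch (toy data, illustrative only) that applies the same extractor in both places:

```python
# Train an NLTK NaiveBayesClassifier on bag-of-words presence features
# and classify new sentences with the SAME feature extractor.
import nltk

train = [
    ("I love this movie", "pos"),
    ("What a great film", "pos"),
    ("I hate this movie", "neg"),
    ("This film was terrible", "neg"),
]

def extract_features(sentence):
    # Word-presence features; lowercasing keeps train/test tokens consistent.
    return {word.lower(): True for word in sentence.split()}

featuresets = [(extract_features(text), label) for text, label in train]
classifier = nltk.NaiveBayesClassifier.train(featuresets)

print(classifier.classify(extract_features("I love this film")))       # pos
print(classifier.classify(extract_features("terrible hate terrible"))) # neg
```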

Word analysis and scoring from a file in Python

Submitted by 我只是一个虾纸丫 on 2019-12-04 13:48:56
Question: I am doing a word-by-word analysis of sentences such as "Hey there!! This is a excellent movie???" I have many sentences like the one above, and a huge dataset file, as shown below, in which I have to do a quick lookup to see whether a word exists. If it does, I do the analysis and store the results in a dictionary, such as the word's score from the file, the score of the last word of the sentence, the first word of the sentence, and so on.

sentence[i] => Hey there!! This is a excellent movie???
sentence[0] = Hey, sentence[1]=there!!
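The dataset file's format is not shown, so the sketch below assumes a hypothetical tab-separated file with one "word<TAB>score" pair per line (AFINN-style). Loading it into a dict makes each word lookup O(1), and stripping punctuation lets "there!!" match the entry for "there":

```python
# Load a word-score lexicon into a dict, then score a sentence word by word.
import re

def load_scores(path):
    scores = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            word, score = line.rstrip("\n").split("\t")
            scores[word] = float(score)
    return scores

def analyze(sentence, scores):
    # Strip punctuation so "there!!" matches the dictionary entry "there".
    words = [re.sub(r"\W+", "", w).lower() for w in sentence.split()]
    words = [w for w in words if w]
    return {
        "first_word_score": scores.get(words[0]),
        "last_word_score": scores.get(words[-1]),
        "word_scores": {w: scores.get(w) for w in words},
    }

scores = load_scores("word_scores.tsv")  # hypothetical file name
print(analyze("Hey there!! This is a excellent movie???", scores))
```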

How to implement TF-IDF feature weighting with Naive Bayes

Submitted by 随声附和 on 2019-12-04 13:00:20
I'm trying to implement a naive Bayes classifier for sentiment analysis, and I plan to use the TF-IDF weighting measure. I'm just a little stuck now: NB generally uses word (feature) frequencies to find the maximum likelihood, so how do I introduce the TF-IDF weighting measure into naive Bayes?

Several blog posts show in detail how to calculate TF-IDF. You use the TF-IDF weights as features/predictors in your statistical model. I suggest using either gensim [1] or scikit-learn [2] to compute the weights, which you then pass to your naive Bayes fitting procedure. The scikit-learn
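The answer is truncated here; a minimal scikit-learn sketch of the suggestion (toy data, illustrative only). Note that MultinomialNB formally expects counts, so feeding it TF-IDF weights is a common pragmatic approximation rather than a textbook-exact model:

```python
# TF-IDF features feeding a multinomial naive Bayes model via a pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great movie", "loved the plot", "terrible film", "awful acting"]
labels = ["pos", "pos", "neg", "neg"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["what a great plot", "awful acting indeed"]))  # ['pos' 'neg']
```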

Sentiment Analysis on LARGE collection of online conversation text

Submitted by 南楼画角 on 2019-12-04 08:34:21
Question: The title says it all: I have an SQL database bursting at the seams with online conversation text. I've already done most of this project in Python, so I would like to do this using Python's NLTK library (unless there's a strong reason not to). The data is organized by Thread, Username, and Post. Each thread more or less focuses on discussing one "product" of the category that I am interested in analyzing. Ultimately, when this is finished, I would like to have an estimated opinion (like
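The question is truncated here, but a common pattern for this setup is to stream posts out of the database rather than loading them all at once, score each post, and aggregate per thread. A sketch under assumptions: a SQLite file and a hypothetical posts(thread, username, post) table, with NLTK's VADER standing in for whatever classifier gets trained; requires nltk.download("vader_lexicon"):

```python
# Stream posts from SQL, score each one, and keep running per-thread totals,
# so the whole table never has to fit in memory.
import sqlite3
from collections import defaultdict
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
totals, counts = defaultdict(float), defaultdict(int)

conn = sqlite3.connect("conversations.db")  # hypothetical database file
for thread, post in conn.execute("SELECT thread, post FROM posts"):
    totals[thread] += sia.polarity_scores(post)["compound"]
    counts[thread] += 1

for thread in totals:
    print(thread, totals[thread] / counts[thread])  # mean opinion per thread
```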

Can the ANEW dictionary be used for sentiment analysis in quanteda?

Submitted by 六眼飞鱼酱① on 2019-12-03 22:07:19
I am trying to find a way to implement the Affective Norms for English Words (in Dutch) for a longitudinal sentiment analysis with quanteda. What I ultimately want is a "mean sentiment" per year, in order to show longitudinal trends. In the dataset, all words are scored on a 7-point Likert scale by 64 coders on four categories, which provides a mean for each word. What I want to do is take one of the dimensions and use it to analyse changes in emotions over time. I realise that quanteda has a function for implementing the LIWC dictionary, but I would prefer to use the open-source
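quanteda itself is R, but the core computation (a weighted mean of per-word valence ratings over all tokens in a year) is simple to illustrate. A language-agnostic Python sketch with a hypothetical three-word lexicon and corpus:

```python
# Illustrative only: compute a mean valence per year from an ANEW-style
# lexicon of per-word mean ratings. Lexicon and corpus are made up.
valence = {"happy": 6.5, "war": 2.1, "peace": 6.2}  # word -> mean rating

texts_by_year = {
    1990: ["war war peace", "happy peace"],
    1991: ["happy happy war"],
}

for year, texts in sorted(texts_by_year.items()):
    tokens = [w for t in texts for w in t.lower().split()]
    scored = [valence[w] for w in tokens if w in valence]
    mean = sum(scored) / len(scored) if scored else float("nan")
    print(year, round(mean, 3))
```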