sentiment-analysis

requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read, 512 more expected)', IncompleteRead

醉酒当歌 posted on 2019-12-24 14:16:21
Question: I wanted to write a program to fetch tweets from Twitter and then do sentiment analysis. I wrote the following code and got the error above even after importing all the necessary libraries. I'm relatively new to data science, so please help me; I could not understand the reason for this error:

class TwitterClient(object):
    def __init__(self):
        # keys and tokens from the Twitter Dev Console
        consumer_key = 'XXXXXXXXX'
        consumer_secret = 'XXXXXXXXX'
        access_token = 'XXXXXXXXX'
        access_token_secret =
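The excerpt ends here and no answer is preserved. This particular ChunkedEncodingError means the server closed the connection before the full response body arrived, so the usual mitigation is to catch it and retry with a short backoff. A minimal sketch, assuming a plain requests call; the URL and helper name are placeholders, not from the original post:

import time
import requests

def get_json_with_retry(url, params=None, retries=3, backoff=2.0):
    """Fetch JSON, retrying when the server drops the connection mid-response."""
    for attempt in range(1, retries + 1):
        try:
            resp = requests.get(url, params=params, timeout=30)
            resp.raise_for_status()
            return resp.json()
        except requests.exceptions.ChunkedEncodingError:
            # 'Connection broken: IncompleteRead' -- wait, then try again
            if attempt == retries:
                raise
            time.sleep(backoff * attempt)

# Usage (placeholder endpoint):
# tweets = get_json_with_retry("https://api.twitter.com/...", params={...})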

Remove emoticons in R using the tm package

元气小坏坏 posted on 2019-12-24 11:55:26
Question: I'm using the tm package to clean up a Twitter corpus. However, the package is unable to clean up emoticons. Here's replicated code:

July4th_clean <- tm_map(July4th_clean, content_transformer(tolower))
Error in FUN(content(x), ...) : invalid input 'RT ElleJohnson Love of country is encircling the globes ������������������ july4thweekend July4th FourthOfJuly IndependenceDay NotAvailableOnIn' in 'utf8towcs'

Can someone point me in the right direction to remove the emoticons using the tm
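The question is truncated and the answer is not preserved. The standard workaround is to strip the emoji/non-ASCII bytes before running tolower. The question concerns R's tm package, but the transformation itself is language-independent; here is the same idea sketched in Python (the tweet text is illustrative):

def strip_non_ascii(text):
    # Remove emoji and any other non-ASCII characters so later
    # text-processing steps do not fail on them
    return text.encode("ascii", "ignore").decode("ascii")

tweet = "RT ElleJohnson Love of country is encircling the globes \U0001F1FA\U0001F1F8 July4th"
print(strip_non_ascii(tweet))  # emoji removed, plain ASCII remains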

How to tackle twitter sentiment analysis?

蹲街弑〆低调 posted on 2019-12-24 01:07:04
Question: I'd like some advice on how to tackle this problem. At college I've been solving opinion-mining tasks, but with Twitter the approach is quite different. For example, I used an ensemble-learning approach to classify users' opinions about a certain hotel in Spain. Of course, I was given a training set with positive and negative opinions and then tested on the test set. But now, with Twitter, I find this kind of categorization very difficult. Do I need to have a training
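The question is cut off, but for the supervised setting it describes the usual answer is: yes, you need labeled examples, either hand-labeled tweets or an existing public sentiment corpus, and then you train exactly as with the hotel reviews. A minimal scikit-learn baseline, sketched with illustrative data:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative labels; in practice use a labeled tweet corpus
tweets = ["I love this hotel", "worst stay ever", "great service and staff", "terrible dirty room"]
labels = ["pos", "neg", "pos", "neg"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(tweets, labels)
print(model.predict(["the staff was great"]))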

Memory Error, performing sentiment analysis large size data

不打扰是莪最后的温柔 posted on 2019-12-23 05:19:26
Question: I am trying to perform sentiment analysis on a large data set from a social network. The code works fine with small inputs: anything under 20 MB computes without problems, but if the input is larger than 20 MB I get a memory error. Environment: Windows 10, Anaconda 3.x with updated packages. Code:

def captionsenti(F_name):
    print("reading from csv file")
    F1_name = "caption_senti.csv"
    df = pd.read_csv(path + F_name + ".csv")
    filename = path + F_name + "_" + F1_name
    df1
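The code is truncated above, but the usual remedy for a pandas memory error on a large input is to process the CSV in chunks rather than loading it whole. A hedged sketch; the column name and scoring function are placeholders, not taken from the original post:

import pandas as pd

def score_text(text):
    # Placeholder: substitute the real sentiment scorer
    return 0

def captionsenti_chunked(csv_path, out_path, chunk_rows=50_000):
    first = True
    # Stream the file so only chunk_rows rows are in memory at any time
    for chunk in pd.read_csv(csv_path, chunksize=chunk_rows):
        chunk["sentiment"] = chunk["caption"].map(score_text)  # 'caption' column is assumed
        chunk.to_csv(out_path, mode="w" if first else "a", header=first, index=False)
        first = False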

How to make RandomForestClassifier faster?

有些话、适合烂在心里 posted on 2019-12-23 05:09:07
Question: I am trying to implement the bag-of-words model from the Kaggle site on Twitter sentiment data with around 1M rows. I have already cleaned it, but in the last part, when I feed my feature vectors and sentiments to a Random Forest classifier, it takes a very long time. Here is my code:

from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier(n_estimators=100, verbose=3)
forest = forest.fit(train_data_features, train["Sentiment"])

train_data_features is a 1048575x5000 sparse
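Two standard speed-ups apply here, shown below as a sketch: parallelize tree building across all cores with n_jobs, and, often far more effective on a roughly 1M x 5000 sparse bag-of-words matrix, switch to a linear classifier, which handles high-dimensional sparse text features much more efficiently than a forest:

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier

# Option 1: same model, all CPU cores
forest = RandomForestClassifier(n_estimators=100, n_jobs=-1, verbose=3)

# Option 2: a linear model, usually much faster on large sparse text matrices
linear = SGDClassifier(max_iter=1000, tol=1e-3)

# forest.fit(train_data_features, train["Sentiment"])  # as in the original code
# linear.fit(train_data_features, train["Sentiment"])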

How to get Stanford CoreNLP to use a training model you created?

為{幸葍}努か posted on 2019-12-23 05:02:02
Question: I just created a training model for Stanford CoreNLP, so I have a bunch of files that look like this: Now, how do I tell CoreNLP to use the model I created and not the models that come with CoreNLP? Is it something I pass on the command line, or something in my Java code, like props.put("sentiment.model")? I noticed there's a jar file in my CoreNLP library called stanford-corenlp-3.5.1-models.jar. Does this jar file have anything to do with what I want to do? Thank you. Answer 1: In Java: props.put(
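The answer breaks off mid-call above. For context, CoreNLP's sentiment annotator looks up its model path in the sentiment.model property, and stanford-corenlp-3.5.1-models.jar is where the bundled default models live. The same property can also be set from Python through the stanza CoreNLP client; a sketch, assuming a local CoreNLP installation, with a placeholder model path:

from stanza.server import CoreNLPClient

with CoreNLPClient(
        annotators=["tokenize", "ssplit", "parse", "sentiment"],
        # Same property a Java Properties object would carry:
        properties={"sentiment.model": "path/to/model.ser.gz"},  # placeholder path
        timeout=30000, memory="4G") as client:
    ann = client.annotate("The acting was wonderful.")
    print(ann.sentence[0].sentiment)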

Lazy parsing with Stanford CoreNLP to get sentiment only of specific sentences

一笑奈何 posted on 2019-12-21 20:25:29
Question: I am looking for ways to optimize the performance of my Stanford CoreNLP sentiment pipeline. Specifically, I want to get the sentiment of sentences, but only of those which contain specific keywords given as input. I have tried two approaches. Approach 1: a StanfordCoreNLP pipeline annotating the entire text with sentiment. I defined a pipeline of annotators: tokenize, ssplit, parse, sentiment. I ran it on the entire article, then looked for keywords in each sentence and, if they were present, ran a
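The post is cut off after Approach 1, but the two-pass idea it describes can be sketched: run only the cheap tokenize and ssplit annotators over the whole article, then send just the keyword-matching sentences through the expensive parse and sentiment annotators. A sketch with the stanza CoreNLP client, assuming its annotate call accepts a per-request annotators override; the keywords and text are illustrative:

from stanza.server import CoreNLPClient

KEYWORDS = {"economy", "inflation"}  # illustrative input keywords
text = "Markets rose today. The economy shrank last quarter. It rained."

with CoreNLPClient(annotators=["tokenize", "ssplit", "parse", "sentiment"],
                   timeout=30000, memory="4G") as client:
    # Cheap pass: sentence splitting only
    light = client.annotate(text, annotators="tokenize,ssplit")
    for s in light.sentence:
        words = {t.word.lower() for t in s.token}
        if words & KEYWORDS:
            sent = " ".join(t.word for t in s.token)
            # Expensive pass, run only on matching sentences
            full = client.annotate(sent)
            print(sent, "->", full.sentence[0].sentiment)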

Using Sentiwordnet 3.0

回眸只為那壹抹淺笑 posted on 2019-12-21 02:22:22
Question: I plan on using SentiWordNet 3.0 for sentiment classification. Could someone clarify what the numbers associated with words in SentiWordNet represent? For example, what does the 5 in rank#5 mean? Also, for POS, what letter is used to represent adverbs? I'm assuming 'a' is adjectives. I could not find an explanation either on their site or on other sites. Answer 1: I found the answer. The number notation comes from WordNet; it represents the rank in which the given word is commonly used.
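To answer the second part of the question, which the excerpt cuts off: WordNet's POS letters are 'n' noun, 'v' verb, 'a' adjective, 's' adjective satellite, and 'r' adverb. SentiWordNet entries can be inspected through NLTK's corpus reader; a sketch, assuming the wordnet and sentiwordnet corpora have been fetched with nltk.download:

from nltk.corpus import sentiwordnet as swn

# 'good.a.01' = lemma 'good', POS 'a' (adjective), sense rank 01:
# the most common WordNet sense of 'good' as an adjective
s = swn.senti_synset("good.a.01")
print(s.pos_score(), s.neg_score(), s.obj_score())

# Adverbs use the POS letter 'r'
for entry in swn.senti_synsets("quickly", "r"):
    print(entry)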

Questions about creating stanford CoreNLP training models

狂风中的少年 posted on 2019-12-19 11:33:33
Question: I've been working with Stanford CoreNLP to perform sentiment analysis on some data I have, and I'm working on creating a training model. I know we can create a training model with the following command:

java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz

I know what goes in the train.txt file: you score sentences and put them there, something like this: (0 (2 Today) (0 (0 (2 is) (0 (2 a) (0 (0 bad) (2 day)
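Each line of train.txt is a binarized parse tree whose nodes carry sentiment labels from 0 (very negative) to 4 (very positive); the root label scores the whole sentence. One way to sanity-check training lines is to parse them with NLTK's Tree class. A sketch; the tree below is an illustrative complete example, not the truncated line from the question:

from nltk import Tree

# Illustrative fully-parenthesized training line
line = "(0 (2 Today) (0 (2 is) (0 (2 a) (0 (0 bad) (2 day)))))"

tree = Tree.fromstring(line)
print(tree.leaves())  # ['Today', 'is', 'a', 'bad', 'day']
print(tree.label())   # '0' -- sentiment of the whole sentence

# Every node label in a valid training tree must be one of 0-4
assert all(t.label() in "01234" for t in tree.subtrees())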

R sentiment analysis with phrases in dictionaries

帅比萌擦擦* posted on 2019-12-18 18:30:20
Question: I am performing sentiment analysis on a set of Tweets that I have, and I now want to know how to add phrases to the positive and negative dictionaries. I've read in the files of phrases I want to test, but when running the sentiment analysis it doesn't give me a result. Reading through the sentiment algorithm, I can see that it matches words against the dictionaries, but is there a way to scan for phrases as well as words? Here is the code: score.sentiment = function(sentences, pos
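The function is cut off above, but the root issue is that matching token-by-token never fires on multi-word dictionary entries. One fix is to match phrases first, longest entries first, then fall back to single words. The original is R; the idea is sketched below in Python with illustrative lexicons:

import re

pos_lexicon = ["great", "not bad", "highly recommend"]  # illustrative entries
neg_lexicon = ["bad", "waste of money"]

def score_sentiment(text, pos, neg):
    text = text.lower()
    score = 0
    # Match longer phrases first so "not bad" is not double-counted as "bad"
    for phrase in sorted(pos + neg, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits = len(re.findall(pattern, text))
        if hits:
            score += hits if phrase in pos else -hits
            text = re.sub(pattern, " ", text)  # remove matched spans
    return score

print(score_sentiment("Not bad at all, I highly recommend it", pos_lexicon, neg_lexicon))  # 2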