How to do sentiment analysis of headlines with TextBlob and Python

Submitted by 社会主义新天地 on 2019-12-13 03:01:21

Question


I want to calculate the polarity and subjectivity of some headlines that I have. My code runs without any errors, but for some rows it returns 0.00000 for both polarity and subjectivity. Do you know why?

You can download the data from here:

https://www.sendspace.com/file/e8w4tw

Am I doing something wrong? This is the code:

import pandas as pd
from textblob import TextBlob

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

# read_excel does not need an encoding argument; newer pandas versions reject it
df = pd.read_excel('coca cola news.xlsx')

df = df.dropna().reset_index(drop = True)
df = df.drop_duplicates().reset_index(drop = True)
print(df)

head_sentiment = []
head_subj = []

par_sentiment = []
par_subj = []


df['Headline Sentiment'] = df['Headline'].apply(lambda text: TextBlob(text).sentiment.polarity).round(4)
df['Headline Subjectivity'] = df['Headline'].apply(lambda text: TextBlob(text).sentiment.subjectivity).round(4)

df['Paragraph Sentiment'] = df['Paragraph'].apply(lambda text: TextBlob(text).sentiment.polarity).round(4)
df['Paragraph Subjectivity'] = df['Paragraph'].apply(lambda text: TextBlob(text).sentiment.subjectivity).round(4)

print(df)

print(df[df.columns[-4:]])

I know that 0 is a possible result, but I'm getting 0.0000 in 40%-50% of the rows. That's a lot, and the values are not even something like 0.00001, which seems strange to me.

Can you help me?


Answer 1:


This sometimes happens. Try the polarity method from polyglot (https://polyglot.readthedocs.io/en/latest/Installation.html) and compare the results. First, though, you should do some preprocessing, such as:

  • remove stopwords
  • remove numbers, HTML links, special characters, etc.
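
The preprocessing steps listed above could be sketched roughly as follows. This is only a minimal illustration using the standard library: the function name clean_headline, the regular expressions, and the tiny stopword set are all assumptions made for this example, not part of TextBlob or polyglot. A real project would probably use a fuller stopword list.

```python
import re

# Tiny illustrative stopword list (an assumption for this sketch).
# Negations such as "not" are deliberately left out, since dropping
# them could flip the meaning of a headline.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "for",
             "and", "is", "are", "with", "at", "by"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # matches web links
NON_LETTER_RE = re.compile(r"[^a-z\s]")         # matches digits and special characters


def clean_headline(text):
    """Lowercase the text, strip links, digits and special characters, drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)          # remove links first, before punctuation is stripped
    text = NON_LETTER_RE.sub(" ", text)   # remove numbers and special characters
    words = [w for w in text.split() if w not in STOPWORDS]
    return " ".join(words)


if __name__ == "__main__":
    print(clean_headline("Coca-Cola shares rise 3% in 2019: https://example.com/news"))
```

With the DataFrame from the question, the cleaned text could then be produced with df['Headline'].apply(clean_headline) before computing the sentiment scores. Whether this reduces the number of 0.0 results depends on the data, so it is worth comparing the scores before and after cleaning.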


Source: https://stackoverflow.com/questions/58920075/how-to-do-sentiment-analysis-of-headlines-with-textblob-and-python
