How to extract only texts in hashtag using tweepy?

Submitted by 大憨熊 on 2019-12-01 01:11:47

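Instead of using the DataFrame constructor, you could use the json_normalize function (pd.json_normalize in pandas 1.0 and later; older versions provide it as pd.io.json.json_normalize, which is now deprecated):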

>>> import pandas as pd
>>> d = {'Hashtags':
...      [{'indices': [53, 65], 'text': 'Predictions'},
...       {'indices': [67, 76], 'text': 'FreeTips'},
...       {'indices': [78, 89], 'text': 'SoccerTips'},
...       {'indices': [90, 103], 'text': 'FootballTips'},
...       {'indices': [104, 110], 'text': 'Goals'}]}
>>> pd.json_normalize(d, 'Hashtags')
      indices          text
0    [53, 65]   Predictions
1    [67, 76]      FreeTips
2    [78, 89]    SoccerTips
3   [90, 103]  FootballTips
4  [104, 110]         Goals

Then you could just use the 'text' column:

>>> pd.json_normalize(d, 'Hashtags')['text'].tolist()
['Predictions', 'FreeTips', 'SoccerTips', 'FootballTips', 'Goals']
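If all you need is the list of hashtag strings, pandas is not strictly required. The sketch below assumes the same d dictionary as above (a 'Hashtags' key holding a list of {'indices': ..., 'text': ...} dicts) and builds the same list with a plain list comprehension:

```python
# Same data as in the example above
d = {'Hashtags': [
    {'indices': [53, 65], 'text': 'Predictions'},
    {'indices': [67, 76], 'text': 'FreeTips'},
    {'indices': [78, 89], 'text': 'SoccerTips'},
    {'indices': [90, 103], 'text': 'FootballTips'},
    {'indices': [104, 110], 'text': 'Goals'},
]}

# Keep only the 'text' value of each hashtag dict, preserving order
texts = [hashtag['text'] for hashtag in d['Hashtags']]
print(texts)
# ['Predictions', 'FreeTips', 'SoccerTips', 'FootballTips', 'Goals']
```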

Here's the solution:

After a lot of troubleshooting and trying various methods, I finally figured out how to split the nested dictionary. It is a fairly simple loop. I noticed that the hashtag text can be accessed like this:

tweets['Hashtags'][1][1]['text']
Out[209]: u'INDvPAK'

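This was a valuable insight: I learned that I don't need to write u'text' as the key. The plain string 'text' works too, because in Python 2 an ASCII str and the matching unicode string compare and hash equal.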

Code:

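ht = []
# Outer loop: one entry per row (tweet) in the Hashtags column
for s in range(len(tweets['Hashtags'])):
    hasht = []
    # Inner loop: one dict per hashtag, e.g. {'indices': [53, 65], 'text': 'Predictions'}
    for t in range(len(tweets['Hashtags'][s])):
        hasht.append(tweets['Hashtags'][s][t]['text'])
    ht.append(hasht)
# zip() returns an iterator on Python 3, so materialize it with list()
# before assigning it as a column
tweets['HT'] = list(zip(ht))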

This is a simple nested for loop: the outer loop goes over the rows of the Hashtags column, and the inner loop goes over that row's list of hashtag dictionaries (the tweet's entities → hashtags list), each shaped like {"indices": [...], "text": ...}, collecting the "text" value of each one.
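The same nested loop can be condensed into a nested list comprehension. This is a minimal sketch, assuming each row holds a list of {'indices': ..., 'text': ...} dicts; the rows sample data and the extract_hashtag_texts name are hypothetical, chosen for illustration:

```python
from typing import Any, Dict, List

# Hypothetical sample data: one list of hashtag dicts per tweet/row,
# shaped like the entities["hashtags"] list of a tweet
rows: List[List[Dict[str, Any]]] = [
    [{'indices': [10, 18], 'text': 'INDvPAK'}],
    [{'indices': [78, 89], 'text': 'SoccerTips'},
     {'indices': [90, 103], 'text': 'FootballTips'}],
    [],  # a tweet without hashtags
]


def extract_hashtag_texts(rows: List[List[Dict[str, Any]]]) -> List[List[str]]:
    """Return the hashtag texts of each row, keeping one list per row."""
    # Outer comprehension walks the rows; inner one walks the hashtag dicts
    return [[hashtag['text'] for hashtag in row] for row in rows]


ht = extract_hashtag_texts(rows)
print(ht)
# [['INDvPAK'], ['SoccerTips', 'FootballTips'], []]
```

Since iterating a pandas Series yields its values, the function should also accept tweets['Hashtags'] directly; assigning its result to a column then stores one plain list per row rather than a 1-tuple.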

Finally, I used zip() on the per-row lists, so each row/user gets one entry holding its hashtags.

OUTPUT:

([u'SoccerTips', u'FootballTips'],)

Each cell is a 1-tuple wrapping that row's list of hashtags, so it can easily be unpacked further.
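Calling zip() with a single iterable yields 1-tuples, which is where the extra (...,) wrapper in the output comes from. A small sketch of unwrapping it, using hypothetical sample values for ht in place of real tweet data:

```python
# Hypothetical per-row hashtag lists, as the loop above would build them
ht = [['INDvPAK'], ['SoccerTips', 'FootballTips']]

# zip() with one argument wraps each element in a 1-tuple: (row_list,)
wrapped = list(zip(ht))
print(wrapped)
# [(['INDvPAK'],), (['SoccerTips', 'FootballTips'],)]

# Unpack each 1-tuple to get the plain per-row list back
unwrapped = [tags for (tags,) in wrapped]
print(unwrapped)
# [['INDvPAK'], ['SoccerTips', 'FootballTips']]

# Or flatten everything into a single list of hashtags
all_tags = [tag for (tags,) in wrapped for tag in tags]
print(all_tags)
# ['INDvPAK', 'SoccerTips', 'FootballTips']
```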
