Retrieving a list of tweets using tweet IDs in Tweepy

Submitted by 冷暖自知 on 2019-12-06 15:54:44

Question


I have a file containing a list of tweet IDs, and I want to retrieve those tweets. The file contains more than 100,000 tweet IDs, but the Twitter API only allows retrieving 100 tweets per request.

import tweepy

api = tweepy.API(auth)
good_tweet_ids = [i for i in por.TweetID[0:100]]
tweets = api.statuses_lookup(good_tweet_ids)
for tweet in tweets:
    print(tweet.text)

Is there a way to retrieve more tweets, say 1,000 or 2,000? I don't want to take a sample of the data, save the results to a file, and change the index into the tweet IDs every time. Is there a way to do that?


Answer 1:


Yes. Twitter only lets you look up 100 tweets at a time, but you can look up another 100 immediately after that. The only concern then is rate limits: you are restricted in the number of calls you can make to the API in each 15-minute window. Fortunately, Tweepy handles this gracefully if you create the API with wait_on_rate_limit=True. All we need to do, then, is split the full list of tweet IDs into batches of 100 or fewer (if you have 130, the second batch should contain only the final 30) and look up one batch at a time. Try the following:

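# Written against Tweepy 3.x. Tweepy 4.x renamed statuses_lookup to
# lookup_statuses and TweepError to TweepyException, and removed
# wait_on_rate_limit_notify.
import tweepy


def lookup_tweets(tweet_IDs, api):
    """Look up tweets in batches, since statuses_lookup accepts at most 100 IDs per call."""
    full_tweets = []
    tweet_count = len(tweet_IDs)
    try:
        # Step through the IDs 100 at a time; the slice is automatically
        # shorter for the last group if fewer than 100 IDs remain.
        for start in range(0, tweet_count, 100):
            batch = list(tweet_IDs[start:start + 100])
            full_tweets.extend(api.statuses_lookup(batch))
    except tweepy.TweepError:
        print('Something went wrong, quitting...')
    # Return whatever was retrieved, even if an error stopped the loop early.
    return full_tweets


consumer_key = 'XXX'
consumer_secret = 'XXX'
access_token = 'XXX'
access_token_secret = 'XXX'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# wait_on_rate_limit=True makes Tweepy sleep until the rate-limit window
# resets instead of raising an error when the limit is reached.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# do whatever it is to get por.TweetID - the list of all IDs to look up

results = lookup_tweets(por.TweetID, api)

for tweet in results:
    if tweet:
        print(tweet.text)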
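To check the batching logic without network access or credentials, here is a minimal sketch that swaps in a hypothetical stand-in for the Tweepy API object. `FakeAPI`, `batched` and `lookup_all` are illustrative names, not part of Tweepy; the fake `statuses_lookup` simply echoes the IDs it receives and records each batch size, so you can confirm that 130 IDs go out as one batch of 100 followed by one of 30.

```python
from typing import Iterator, List, Sequence


def batched(ids: Sequence[int], size: int = 100) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` IDs; the last may be shorter."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


class FakeAPI:
    """Hypothetical stand-in for tweepy.API that records every batch it receives."""

    def __init__(self) -> None:
        self.batch_sizes: List[int] = []

    def statuses_lookup(self, ids: List[int]) -> List[dict]:
        # The real endpoint accepts at most 100 IDs per request.
        assert len(ids) <= 100, "too many IDs in one request"
        self.batch_sizes.append(len(ids))
        # Pretend every requested tweet exists and echo it back.
        return [{"id": i} for i in ids]


def lookup_all(ids: Sequence[int], api) -> list:
    """Look up every ID, issuing one request per batch of at most 100."""
    results = []
    for batch in batched(ids):
        results.extend(api.statuses_lookup(batch))
    return results


api = FakeAPI()
tweets = lookup_all(list(range(130)), api)
print(api.batch_sizes)  # [100, 30]
print(len(tweets))      # 130
```

Stepping the range by 100 also means a list whose length is an exact multiple of 100 (say 200) produces exactly two requests, whereas a loop written as `range(tweet_count // 100 + 1)` would issue a third request with an empty ID list.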



Answer 2:


An addition to the code above: each result is returned as a Tweepy Status object. The following code converts the results into serializable JSON and then merges them with the original data on the tweet ID to get a full DataFrame.

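import io
import json

import pandas as pd

df = pd.read_csv('your.csv')
good_tweet_ids = df.TweetID.tolist()  # tweet IDs to look up
results = lookup_tweets(good_tweet_ids, api)  # apply the function from Answer 1

# Wrangle the data into one DataFrame
temp = json.dumps([status._json for status in results])  # raw JSON of each Status
# Wrapping the string in StringIO avoids the FutureWarning newer pandas
# versions emit when read_json is given a literal JSON string.
newdf = pd.read_json(io.StringIO(temp), orient='records')
full = pd.merge(df, newdf, left_on='TweetID', right_on='id', how='left').drop('id', axis=1)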
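If pandas isn't available, or you want to see what the left merge is doing, here is a minimal standard-library sketch of the same idea. It assumes each looked-up tweet is a dict shaped like `status._json`, with an integer `"id"` key; the sample rows and the `left_join_on_id` name are hypothetical. Tweets the lookup did not return (for example deleted or protected ones, which the v1.1 lookup endpoint omits from its response by default) end up with `None`, just as `how='left'` leaves NaN in the merged DataFrame.

```python
import json
from typing import Dict, List, Optional


def left_join_on_id(rows: List[dict], tweets: List[dict],
                    key: str = "TweetID") -> List[dict]:
    """Attach each tweet's JSON to the row with the matching ID (a left join).

    Every input row is kept; rows whose ID was not returned get tweet=None.
    """
    by_id: Dict[int, dict] = {t["id"]: t for t in tweets}
    joined = []
    for row in rows:
        tweet: Optional[dict] = by_id.get(row[key])
        joined.append({**row, "tweet": tweet})
    return joined


# Hypothetical input: two IDs from the CSV, only one of which came back.
rows = [{"TweetID": 1, "label": "pos"}, {"TweetID": 2, "label": "neg"}]
tweets = [{"id": 1, "text": "hello"}]

full = left_join_on_id(rows, tweets)
print(json.dumps(full))
```

Unlike the pandas version, this keeps each tweet's JSON nested under one key instead of spreading it into columns. If you read the CSV with the `csv` module, the IDs arrive as strings, so convert them with `int()` before joining or nothing will match.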


Source: https://stackoverflow.com/questions/44581647/retrieving-a-list-of-tweets-using-tweet-id-in-tweepy
