Tweepy returns inconsistent and incomplete results for realDonaldTrump

Submitted by 放荡痞女 on 2021-02-11 14:03:23

Question


import tweepy
import json
import nltk
import re



def scrub_text(string):
    nltk.download('words')
    words = set(nltk.corpus.words.words())

    string=re.sub(r'[^a-zA-Z]+', ' ', string).lower()
    string=" ".join(w for w in nltk.wordpunct_tokenize(string)
                if w.lower() in words or not w.isalpha())
    return string


def get_all_tweets():
    with open('twitter_credentials.json') as cred_data:
        info=json.load(cred_data)
        consumer_key=info['API_KEY']
        consumer_secret=info['API_SECRET']
        access_key=info['ACCESS_TOKEN']
        access_secret=info['ACCESS_SECRET']

    screen_name = input("Enter twitter Handle: ")

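    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    # Attach the access token loaded above; otherwise those two values go unused.
    auth.set_access_token(access_key, access_secret)
    api=tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True,
                   timeout=500000, retry_count=10, retry_delay=100)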

    all_the_tweets=[]

    new_tweets=api.user_timeline(screen_name=screen_name, count=200)

    # Keep paging until a request comes back empty. Checking before indexing
    # also avoids an IndexError when the very first request returns nothing.
    while len(new_tweets) > 0:
        all_the_tweets.extend(new_tweets)
        oldest_tweet=all_the_tweets[-1].id - 1

        print('...%s tweets downloaded' %len(all_the_tweets))

        new_tweets=api.user_timeline(screen_name=screen_name, count=200,
                                     max_id=oldest_tweet)

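    # Join the raw tweet text directly; calling str() on a list of encoded
    # bytes would leak b'...' prefixes and escape sequences into the output.
    outtweets=scrub_text(" ".join(tweet.text for tweet in all_the_tweets))

    # The with block closes the file, so no explicit close() is needed.
    with open('tweets.txt', 'w', encoding='utf-8') as f:
        f.write(outtweets)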

The Python code above is supposed to download all the tweets from a particular user. It seems to work for most handles, but when I run it against @realDonaldTrump I sometimes get 800 tweets and sometimes only 1. I never get anywhere close to all of them. I assume the problem is related to how active the account is, but I think there should be a way around it.


Answer 1:


The Twitter timelines API returns at most the 3,200 most recent Tweets for an account (source), and what you get back may also depend on the age of the Tweets and how far back in time you are paging. Unfortunately, this means the timeline API cannot give you every Tweet from an account that has posted more than that. To retrieve all of the Tweets from the account you would need to use the commercial Full Archive search API.

Regarding the inconsistent number of results, that sounds like a glitch, as it shouldn't vary by that much.
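
To illustrate the paging limit described above, here is a minimal sketch of a max_id loop that stops on the first empty page and never collects more than 3,200 items. It is not the asker's code and not part of Tweepy: collect_timeline, make_fake_fetch and the fetch_page callable are hypothetical names introduced for this example, and the fake fetcher stands in for the real API so the loop can be run without credentials.

```python
from types import SimpleNamespace

TIMELINE_CAP = 3200  # ceiling on what the user timeline endpoint returns


def collect_timeline(fetch_page, cap=TIMELINE_CAP):
    """Page backwards through a timeline until it is exhausted or cap is hit.

    fetch_page is a hypothetical callable: it takes max_id (int or None) and
    returns a list of objects with an integer .id attribute, newest first.
    """
    tweets = []
    max_id = None  # None means "start from the newest Tweet"
    while len(tweets) < cap:
        page = fetch_page(max_id)
        if not page:
            # An empty page means there is nothing older to fetch.
            break
        tweets.extend(page)
        # Ask only for Tweets strictly older than the last one seen.
        max_id = page[-1].id - 1
    return tweets[:cap]


def make_fake_fetch(total, page_size=200):
    """Stand-in for the API: serves ids total..1, newest first (assumption)."""
    def fetch_page(max_id):
        newest = total if max_id is None else min(max_id, total)
        oldest = max(newest - page_size, 0)
        return [SimpleNamespace(id=i) for i in range(newest, oldest, -1)]
    return fetch_page


# With Tweepy, fetch_page could wrap api.user_timeline, for example:
#
# def fetch_page(max_id):
#     kwargs = {"screen_name": screen_name, "count": 200}
#     if max_id is not None:
#         kwargs["max_id"] = max_id
#     return api.user_timeline(**kwargs)

if __name__ == "__main__":
    downloaded = collect_timeline(make_fake_fetch(5000))
    print("%d tweets collected" % len(downloaded))
```

Passing the fetcher in as a callable keeps the paging logic separate from the network call, so the same loop can be checked against fake data before it is pointed at a real account.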



Source: https://stackoverflow.com/questions/60308733/tweepy-returns-inconsistent-and-not-complete-results-for-realdonaldtrump
