tweepy Streaming API: full text

Question

I am using the tweepy Streaming API to collect tweets containing a particular hashtag. The problem is that I cannot extract the full text of a tweet from the Streaming API: only the first 140 characters are available, and the rest is truncated.

Here is the code:

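import tweepy
from datetime import datetime
from time import sleep

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)

def analyze_status(text):
    # Treat the tweet as a retweet if 'RT' appears in its first three characters
    return 'RT' in text[0:3]

class MyStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        # Skip retweets; append everything else to a file and echo it
        if not analyze_status(status.text):
            with open('fetched_tweets.txt', 'a', encoding='utf-8') as tf:
                tf.write(status.text + '\n\n')
            print(status.text)

    def on_error(self, status):
        # status is the integer HTTP status code, so convert it before concatenating
        print("Error Code : " + str(status))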

def test_rate_limit(api, wait=True, buffer=.1):
    """
    Tests whether the rate limit of the last request has been reached.
    :param api: The `tweepy` api instance.
    :param wait: A flag indicating whether to wait for the rate limit reset
             if the rate limit has been reached.
    :param buffer: A buffer time in seconds that is added on to the waiting
               time as an extra safety margin.
    :return: True if it is ok to proceed with the next request. False otherwise.
    """
    # Get the number of remaining requests
    remaining = int(api.last_response.getheader('x-rate-limit-remaining'))
    # Check if we have reached the limit
    if remaining == 0:
        limit = int(api.last_response.getheader('x-rate-limit-limit'))
        reset = int(api.last_response.getheader('x-rate-limit-reset'))
        # Convert the Unix timestamp to a local datetime
        reset = datetime.fromtimestamp(reset)
        # Let the user know we have reached the rate limit
        print("0 of {} requests remaining until {}.".format(limit, reset))

        if wait:
            # Determine the delay and sleep
            delay = (reset - datetime.now()).total_seconds() + buffer
            print("Sleeping for {}s...".format(delay))
            sleep(delay)
            # We have waited for the rate limit reset. OK to proceed.
            return True
        else:
            # We have reached the rate limit. The user needs to handle the rate limit manually.
            return False

    # We have not reached the rate limit
    return True

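myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener,
                         tweet_mode='extended')

# tweepy 3.7+ names this parameter is_async; older releases used async,
# which is a reserved word from Python 3.7 on
myStream.filter(track=['#bitcoin'], is_async=True)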

Does anyone have a solution?

Answer 1:

tweet_mode=extended will have no effect in this code, since the Streaming API does not support that parameter. If a Tweet contains longer text, it will contain an additional object in the JSON response called extended_tweet, which will in turn contain a field called full_text.

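In that case, you'll want something like print(status.extended_tweet['full_text']) to extract the longer text. tweepy exposes extended_tweet as a plain dict, so attribute access such as status.extended_tweet.full_text does not work.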
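To make the payload shape concrete, here is a minimal sketch that uses hand-written dicts in place of real streaming responses. The sample values and the helper name full_text_from_payload are invented for illustration; only the field names text, truncated, extended_tweet and full_text come from the discussion above.

```python
# Hypothetical, hand-written payloads that mimic the shape of streaming
# responses; the values are invented for illustration.
short_tweet = {
    "text": "Short tweet about #bitcoin",
    "truncated": False,
}

long_text = "A long tweet about #bitcoin " + "x" * 200
long_tweet = {
    "text": long_text[:137] + "...",          # truncated preview
    "truncated": True,
    "extended_tweet": {"full_text": long_text},  # only on longer tweets
}


def full_text_from_payload(payload):
    """Return the untruncated text from a raw tweet payload (a dict)."""
    extended = payload.get("extended_tweet")
    if extended is not None:
        return extended["full_text"]
    # No extended_tweet object: the plain text field is already complete.
    return payload["text"]
```

Calling full_text_from_payload on either dict returns the complete text, whether or not the extended_tweet object is present.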

Answer 2:

You have to enable extended tweet mode like so:

s = tweepy.Stream(auth, l, tweet_mode='extended')

Then you can print the extended tweet. Keep in mind that extended_tweet is only present on some tweets, so handle the case where it is missing; otherwise accessing it raises an error:

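from tweepy import StreamListener

class listener(StreamListener):
    def on_status(self, status):
        try:
            # Longer tweets carry the full text in the extended_tweet dict
            print(status.extended_tweet['full_text'])
        except AttributeError:
            # No extended_tweet attribute: fall back to the plain text
            print(status.text)
        return True

    def on_error(self, status_code):
        # 420 means the client is being rate limited; returning False disconnects the stream
        if status_code == 420:
            return False

l = listener()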

Worked for me.

Answer 3:

Building upon @AndyPiper's answer, you can check whether the extended tweet is present, either with a try/except:

  def get_tweet_text(tweet):
    try:
      return tweet.extended_tweet['full_text']
    except AttributeError as e:
      return tweet.text

Or check the underlying JSON:

  def get_tweet_text(tweet):
    if 'extended_tweet' in tweet._json:
      return tweet.extended_tweet['full_text']
    else:
      return tweet.text

Note that extended_tweet is a dictionary object, so "tweet.extended_tweet.full_text" doesn't actually work and will throw an error.
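The difference between the two access styles can be reproduced without a network connection. The sketch below uses types.SimpleNamespace as a stand-in for a tweepy Status object, which is an assumption made purely for illustration: top-level fields are attributes, while extended_tweet stays a plain dict.

```python
from types import SimpleNamespace

# Stand-in for a tweepy Status object (an assumption for illustration):
# top-level fields are attributes, but extended_tweet is a plain dict.
status = SimpleNamespace(
    text="Truncated preview...",
    extended_tweet={"full_text": "The complete, untruncated text"},
)

# Key access works because extended_tweet is a dict.
full_text = status.extended_tweet["full_text"]

# Attribute access on the dict raises AttributeError.
try:
    status.extended_tweet.full_text
    attribute_access_worked = True
except AttributeError:
    attribute_access_worked = False
```

Here full_text holds the complete text and attribute_access_worked ends up False, which matches the note above.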

Answer 4:

In addition to the previous answer: in my case it worked only as status.extended_tweet['full_text'], because the status.extended_tweet is nothing but a dictionary.

Answer 5:

This is what worked for me:

status = tweet
if 'extended_tweet' in status._json:
    status_json = status._json['extended_tweet']['full_text']
elif 'retweeted_status' in status._json and 'extended_tweet' in status._json['retweeted_status']:
    status_json = status._json['retweeted_status']['extended_tweet']['full_text']
elif 'retweeted_status' in status._json:
    status_json = status._json['retweeted_status']['full_text']
else:
    status_json = status._json['full_text']
print(status_json)

Adapted from https://github.com/tweepy/tweepy/issues/935. I had to change what they suggest, but the idea stays the same.
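The same lookup order can be wrapped in a function that works on the raw JSON dict. This is only a sketch: the name extract_full_text is made up, and the fallback to a plain text key when full_text is absent is my own addition, not part of the snippet above.

```python
def extract_full_text(tweet_json):
    """Return the longest available text from a raw tweet dict.

    Lookup order: the tweet's own extended_tweet, then the retweeted
    tweet's extended_tweet, then a plain text field.
    """
    retweeted = tweet_json.get("retweeted_status")

    if "extended_tweet" in tweet_json:
        return tweet_json["extended_tweet"]["full_text"]
    if retweeted is not None and "extended_tweet" in retweeted:
        return retweeted["extended_tweet"]["full_text"]

    # No extended_tweet anywhere: use the plain text, preferring the
    # original tweet over the "RT @user: ..." wrapper for retweets.
    source = retweeted if retweeted is not None else tweet_json
    # Added assumption: fall back to 'text' when 'full_text' is missing.
    return source.get("full_text", source.get("text"))


# Hypothetical payloads, invented for illustration.
retweet = {
    "text": "RT @a: hello wor...",
    "retweeted_status": {
        "text": "hello wor...",
        "extended_tweet": {"full_text": "hello world from a"},
    },
}
retweet_text = extract_full_text(retweet)
```

With a Status object from tweepy you would pass status._json, as in the snippet above.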

Answer 6:

The Twitter stream provides a Boolean, status.truncated, which is True when the message contains more than 140 characters. Only then is the extended_tweet object available:

        if not status.truncated:
            text = status.text
        else:
            text = status.extended_tweet['full_text']

This works only when you are streaming tweets. When you are collecting older tweets using the API method you can use something like this:

tweets = api.user_timeline(screen_name='whoever', count=5, tweet_mode='extended')
for tweet in tweets:
    print(tweet.full_text)

This full_text field contains the text of all tweets, truncated or not.
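If the same code has to handle both streamed tweets and tweets fetched with tweet_mode='extended', the two cases can be folded into one helper. The sketch below is an assumption-laden illustration: status_text is a made-up name, and types.SimpleNamespace objects stand in for real tweepy Status objects.

```python
from types import SimpleNamespace


def status_text(status):
    """Return the untruncated text of a Status-like object."""
    # Results fetched with tweet_mode='extended' expose full_text directly.
    full_text = getattr(status, "full_text", None)
    if full_text is not None:
        return full_text
    # Streamed tweets: only truncated ones carry the extended_tweet dict.
    if getattr(status, "truncated", False) and hasattr(status, "extended_tweet"):
        return status.extended_tweet["full_text"]
    return status.text


# Hypothetical stand-ins for tweepy Status objects, one per case.
streamed_short = SimpleNamespace(text="short", truncated=False)
streamed_long = SimpleNamespace(
    text="long tw...",
    truncated=True,
    extended_tweet={"full_text": "long tweet, complete"},
)
from_timeline = SimpleNamespace(full_text="timeline tweet, complete")

texts = [status_text(s) for s in (streamed_short, streamed_long, from_timeline)]
```

Checking full_text first keeps the helper usable for timeline results, where extended_tweet is not needed.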

Source: https://stackoverflow.com/questions/48319243/tweepy-streaming-api-full-text

Tags

twitter

text-mining

tweepy