Exclude retweets from twitter streaming api using tweepy

天涯浪子 提交于 2019-11-30 10:27:30

Just checking a tweet's text to see if it starts with 'RT' is not really a robust solution. You need to make a decision about what you will consider a retweet, since it isn't exactly clear-cut. The Twitter API docs explain that tweets with 'RT' in the tweet text aren't officially retweets.

Sometimes people type RT at the beginning of a Tweet to indicate that they are re-posting someone else's content. This isn't an official Twitter command or feature, but signifies that they are quoting another user's Tweet.

If you're going by the 'official' definition, then you want to filter tweets out if they have a True value for their retweeted attribute, like this:

if not tweet['retweeted']:
    # do something with standard tweets

And if you want to be more inclusive, including 'unofficial' re-tweets, you should check the string for the substring 'RT @' and not merely if it starts with 'RT' because that the former is cleaner, faster and eliminates more edge cases where a tweet starts with 'RT' but isn't a retweet (lots of data out there, I'm sure this is a possibility). Here's some code for that:

if not tweet['retweeted'] and 'RT @' not in tweet['text']:
    # do something with standard tweets

The latter conditional takes the subset of tweets in your collection that are regular tweets and does an intersection with the subset of tweets in your collection that do not have 'RT @' in the tweet text, leaving you with tweets that are supposedly regular tweets.

Yes there are possible ways of doing this, One of them is to check if the text of the tweet, starts with RT, For this we can easily use .startswith() method on strings and for this you need to change the code of the on_data() method in your streaming class, which can be done as:

class TwitterStreamListener(tweepy.StreamListener):
    def on_data(self, data):
        # Twitter returns data in JSON format - we need to decode it first
        decoded = json.loads(data)
        if  not decoded[`text`].startswith('RT'):
            #Do processing here 
            print decoded['text'].encode('ascii', 'ignore')
        return True
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!