Tweepy Tracking Multiple Terms

Posted by 我是研究僧i on 2019-12-02 08:12:36

Question


I am doing content analysis on tweets. I'm using tweepy to return tweets that match certain terms, then writing N tweets to a CSV file for analysis. Creating the files and getting the data is not an issue, but I would like to reduce data collection time. Currently I iterate through a list of terms from a file; once N is reached (e.g. 500 tweets), the script moves on to the next filter term.

I would like to put all my terms (fewer than 400) into a single variable and collect every result that matches any of them. This works too. What I cannot get is a return value from Twitter indicating which term matched in the status.

import datetime
import sys

import tweepy

class CustomStreamListener(tweepy.StreamListener):
    def __init__(self, output_file, api=None):
        super(CustomStreamListener, self).__init__(api)
        self.num_tweets = 0
        self.output_file = output_file

    def on_status(self, status):
        # Strip characters that would break the comma-separated output.
        cleaned = status.text.replace('\'','').replace('&','').replace('>','').replace(',','').replace("\n",'')
        # location is optional on a user profile, so fall back to an empty string.
        location = status.user.location or ''
        self.num_tweets = self.num_tweets + 1
        if self.num_tweets <= 500:
            self.output_file.write(topicName + ',' + location + ',' + cleaned + "\n")
            print("capturing tweet number " + str(self.num_tweets) + " for search term: " + topicName)
            return True
        else:
            # Returning False disconnects the stream so the loop below can move on.
            return False

    def on_error(self, status_code):
        print('Encountered error with status code:', status_code, file=sys.stderr)
        return True  # Don't kill the stream

    def on_timeout(self):
        print('Timeout...', file=sys.stderr)
        return True  # Don't kill the stream

with open('termList.txt', 'r') as f:
    topics = [line.strip() for line in f]

# auth is an already configured tweepy auth handler (its setup is not shown here).
for topicName in topics:
    stamp = datetime.datetime.now().strftime(topicName + '-%Y-%m-%d-%H%M%S')
    with open(stamp + '.csv', 'w+', encoding='utf-8') as topicFile:
        sapi = tweepy.streaming.Stream(auth, CustomStreamListener(topicFile))
        sapi.filter(track=[topicName])

Specifically, my issue is this: how do I find out which term matched if the track variable has multiple entries? I will also state that I am relatively new to Python and tweepy.

Thanks in advance for any advice and assistance!


Answer 1:


You could check the tweet text against your matching terms. Something like:

>>> a = "hello this is a tweet"
>>> terms = [ "this "]
>>> matches = []
>>> for i, term in enumerate( terms ):
...     if term in a:
...         matches.append(i)
...
>>> matches
[0]

This gives you the index of every entry in terms that the specific tweet, a, matched. In this case that is only index 0, the "this" term.
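
The same idea can be wrapped in a small helper that returns the matching terms themselves rather than their indices. This is only a sketch: matched_terms is a hypothetical name, and the matching rules are an assumption meant to approximate how track phrases behave (case-insensitive, with a multi-word phrase matching when all of its words appear). It uses plain substring checks, so "python" would also match "pythonic", and it only looks at the text you pass in.

```python
def matched_terms(text, terms):
    """Return the entries of `terms` that appear in `text`, ignoring case.

    Assumption: a multi-word term counts as a match when every one of its
    words occurs somewhere in the text, in any order.
    """
    lowered = text.lower()
    hits = []
    for term in terms:
        words = term.lower().split()
        # Skip empty terms (e.g. blank lines read from a term file).
        if words and all(word in lowered for word in words):
            hits.append(term)
    return hits


if __name__ == "__main__":
    tweet = "Hello this is a Tweet about Python"
    print(matched_terms(tweet, ["this", "python tweet", "java"]))
```

Inside on_status you could call matched_terms(status.text, topics) and write the result to the CSV in place of topicName, while passing the whole list to the stream with sapi.filter(track=topics).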



Source: https://stackoverflow.com/questions/21519351/tweepy-tracking-multiple-terms
