How do I filter tweets using location AND keyword?

Submitted by 孤街浪徒 on 2019-12-25 01:42:52

Question


I'm a new Python user and have been experimenting with tweepy. I understand that the Twitter API does not allow filtering on both location and keywords at once. To get around this, I've adapted the code from here: How to add a location filter to tweepy module. It works fine with only a few keywords, but it stops printing statuses when I increase the number of keywords. I suspect that iterating over the keyword list is not the best approach. Does anyone have suggestions on how to resolve this?

import sys
import tweepy
import json

consumer_key=" "
consumer_secret=" "
access_key = " "
access_secret = " "

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
### keywords for the public stream
keyword = "iPhone", "Samsung", "HTC", "Sony", "Blackberry"
### initialize blank list to contain tweets
tweets = []
### file name that you want to open is the second argument
f = open('today.txt', 'a')

class CustomStreamListener(tweepy.StreamListener):
    global tweets
    def on_status(self, status):
        ### info that you want to capture
        info = status.id, status.text, status.created_at, status.place, status.user, status.in_reply_to_screen_name, status.in_reply_to_status_id 
        for word in keyword:
            if word in status.text.lower():
                print status.text
                # this is for writing the tweets into the txt file
                f.write(str(info))
                try:
                    tweets.append(info)
                except:
                    pass


    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream

### filter for location
# locations should be a list of longitude/latitude pairs, with the southwest corner
# of the bounding box coming first
sapi = tweepy.streaming.Stream(auth, CustomStreamListener())    
sapi.filter(locations=[103.60998,1.25752,104.03295,1.44973])

Answer 1:


Use regular expressions to search the tweet text, as follows:

    import re

    keyword = ["iPhone", "Samsung", "HTC", "Sony", "Blackberry"]
    # Match each keyword as a whole word; re.escape guards against special characters
    patterns = [r'\b%s\b' % re.escape(s.strip()) for s in keyword]
    # Compile one alternation, case-insensitive so "iphone" and "iPhone" both match
    there = re.compile('|'.join(patterns), re.IGNORECASE)
    stream = ["i have a iPhone", "i dont like Samsung", "HTC design are awesome", "Sony camera is good", "Blackberry lost market", "Nokia soldout to windows"]
    for i in stream:
        if there.search(i):
            print("Tweet Found  %r" % (i))
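
One way to apply this to the listener from the question is to compile the pattern once and test each status with a single search, instead of looping over the keyword list. The sketch below is only an illustration under stated assumptions: `FakeStatus`, `build_matcher`, and `keep_matching` are hypothetical names introduced here, and `FakeStatus` stands in for a tweepy status object so the example runs without network access or credentials.

```python
import re
from collections import namedtuple

KEYWORDS = ["iPhone", "Samsung", "HTC", "Sony", "Blackberry"]


def build_matcher(keywords):
    """Compile one case-insensitive, whole-word pattern covering all keywords."""
    patterns = [r"\b%s\b" % re.escape(k.strip()) for k in keywords]
    return re.compile("|".join(patterns), re.IGNORECASE)


KEYWORD_RE = build_matcher(KEYWORDS)

# Hypothetical stand-in for a tweepy status; only the .text attribute is used.
FakeStatus = namedtuple("FakeStatus", ["id", "text"])


def keep_matching(statuses, matcher=KEYWORD_RE):
    """Return the statuses whose text mentions at least one keyword."""
    return [s for s in statuses if matcher.search(s.text)]


sample = [
    FakeStatus(1, "I just bought an iphone"),
    FakeStatus(2, "Sony and Samsung both make phones"),
    FakeStatus(3, "Nokia sold out to Windows"),
]
kept = keep_matching(sample)
print([s.id for s in kept])
```

Inside `on_status`, the same check would be a single `if KEYWORD_RE.search(status.text):` around the print and write calls. Because the search runs once per status, a tweet that mentions several keywords is recorded only once, and the case-insensitive flag avoids comparing mixed-case keywords such as "iPhone" against lowercased text, which can never match.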


Source: https://stackoverflow.com/questions/23531835/how-do-i-filter-tweets-using-location-and-keyword
