I've been learning Python for a couple of months through online courses and would like to further my learning through a real-world mini project.
For this project,
I just insert the raw JSON into the database. It seems a bit ugly and hacky, but it does work. A notable problem is that the creation dates of the Tweets are stored as strings. The question "How do I compare dates from Twitter data stored in MongoDB via PyMongo?" provides a way to fix that (I inserted a comment in the code to indicate where one would perform that task).
# ...
client = pymongo.MongoClient()
db = client.twitter_db
twitter_collection = db.tweets
# ...
class CustomStreamListener(tweepy.StreamListener):
    # ...
    def on_status(self, status):
        try:
            twitter_json = status._json
            # TODO: Transform created_at to Date objects before insertion
            tweet_id = twitter_collection.insert(twitter_json)
        except:
            # Catch any unicode errors while printing to console
            # and just ignore them to avoid breaking application.
            pass
# ...
stream = tweepy.Stream(auth, CustomStreamListener(), timeout=None, compression=True)
stream.sample()
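
For the TODO in `on_status`, here is a minimal sketch of how the `created_at` string could be turned into a `datetime` before insertion. It assumes the tweet uses the classic Twitter date layout (for example `Wed Aug 27 13:08:45 +0000 2008`) and that the process runs with an English/C locale, since `%a` and `%b` are locale-dependent. The helper name `with_parsed_created_at` and the constant `TWITTER_DATE_FORMAT` are made up for illustration; they are not part of tweepy or PyMongo.

```python
from datetime import datetime, timezone

# Assumed layout of Twitter's created_at field,
# e.g. "Wed Aug 27 13:08:45 +0000 2008".
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def with_parsed_created_at(tweet_json):
    """Return a shallow copy of the tweet dict with created_at as a datetime.

    Hypothetical helper: only the top-level created_at is converted;
    nested dates (e.g. in the user object) are left as strings.
    """
    doc = dict(tweet_json)
    raw = doc.get("created_at")
    if isinstance(raw, str):
        doc["created_at"] = datetime.strptime(raw, TWITTER_DATE_FORMAT)
    return doc


# Small self-contained example with a hand-written tweet dict.
example = with_parsed_created_at(
    {"id": 1, "created_at": "Wed Aug 27 13:08:45 +0000 2008"}
)
print(example["created_at"].isoformat())
```

Inside `on_status`, this would be applied to `status._json` just before the insert. PyMongo stores `datetime` values as BSON dates, so range queries with `$gte`/`$lt` should then compare real dates rather than strings.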