Question
I am using Tweepy to stream tweets and would like to record them in CSV format so I can play around with them or load them into a database later. Please keep in mind that I am a noob, but I do realize there are multiple ways of handling this (suggestions are very welcome).
Long story short, I need to convert and append multiple Python dictionaries to a CSV file. I already did my research (How do I write a Python dictionary to a csv file?) and tried doing this with DictWriter and writer methods.
However, there are a few more things that need to be accomplished:
1) Write the keys as a header only once.
2) As each new tweet is streamed, its values need to be appended without overwriting previous rows.
3) If a value is missing, record NULL.
4) Skip/fix ascii codec errors.
Here is the format of what I would like to end up with (each value is in its individual cell):
Header1_Key_1 Header2_Key_2 Header3_Key_3...
Row1_Value_1 Row1_Value_2 Row1_Value_3...
Row2_Value_1 Row2_Value_2 Row2_Value_3...
Row3_Value_1 Row3_Value_2 Row3_Value_3...
Row4_Value_1 Row4_Value_2 Row4_Value_3...
Here is my code:
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import csv
import json
consumer_key="XXXX"
consumer_secret="XXXX"
access_token="XXXX"
access_token_secret="XXXX"
class StdOutListener(StreamListener):
    def on_data(self, data):
        json_data = json.loads(data)
        data_header = json_data.keys()
        data_row = json_data.values()
        try:
            with open('csv_tweet3.csv', 'wb') as f:
                w = csv.DictWriter(f, data_header)
                w.writeheader(data_header)
                w.writerow(json_data)
        except BaseException, e:
            print 'Something is wrong', str(e)
        return True
    def on_error(self, status):
        print status
if __name__ == '__main__':
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    stream.filter(track=['world cup'])
Thank you in advance!
Answer 1:
I have done a similar thing with Facebook's Graph API (facepy module)!
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import csv
import json
consumer_key="XXXX"
consumer_secret="XXXX"
access_token="XXXX"
access_token_secret="XXXX"
class StdOutListener(StreamListener):
    _headers = None

    def __init__(self, headers, *args, **keys):
        StreamListener.__init__(self, *args, **keys)
        self._headers = headers

    def on_data(self, data):
        json_data = json.loads(data)
        #data_header = json_data.keys()
        #data_row = json_data.values()
        try:
            with open('csv_tweet3.csv', 'ab') as f: # a for append
                w = csv.writer(f)
                # write one cell per header, in header order;
                # keys missing from this tweet become empty cells
                w.writerow([self._valToStr(json_data[header])
                            if header in json_data else ''
                            for header in self._headers])
        except Exception, e:
            print 'Something is wrong', str(e)
        return True

    @staticmethod
    def _valToStr(o):
        # json returns a set number of datatypes - parse accordingly
        # https://docs.python.org/2/library/json.html#encoders-and-decoders
        if type(o)==unicode: return StdOutListener._removeNonASCII(o)
        elif type(o)==bool: return str(o)
        elif o is None: return ''
        elif ...
            ...

    @staticmethod
    def _removeNonASCII(s):
        # drop every character outside the 7-bit ASCII range
        return ''.join(i if ord(i)<128 else '' for i in s)

    def on_error(self, status):
        print status
if __name__ == '__main__':
    headers = ['look','at','twitter','api',
               'to','find','all','possible',
               'keys']
    # initialize csv file with header info
    with open('csv_tweet3.csv', 'wb') as f:
        w = csv.writer(f)
        w.writerow(headers)
    l = StdOutListener(headers)
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    stream.filter(track=['world cup'])
It's not copy&paste ready, but it should be clear enough for you to finish it.
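For comparison, here is a minimal sketch of how the four requirements from the question could be covered with csv.DictWriter. It assumes Python 3 (the code above is Python 2), and the column names in FIELDNAMES as well as the helper names to_cell and append_tweet are made up for illustration; the real keys depend on what the Twitter API returns. Treat it as one possible approach rather than a finished solution.

```python
import csv
import json
import os

# Hypothetical column list; the real keys depend on the tweet payload.
FIELDNAMES = ['id', 'text', 'lang']


def to_cell(value):
    """Turn a decoded JSON value into a CSV cell; None becomes 'NULL'."""
    if value is None:
        return 'NULL'
    if isinstance(value, (dict, list)):
        # Nested structures are stored as JSON text in a single cell.
        return json.dumps(value)
    return value


def append_tweet(path, raw_json, fieldnames=FIELDNAMES):
    """Append one tweet to `path`, writing the header only for a new file."""
    tweet = json.loads(raw_json)
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    # Mode 'a' appends instead of overwriting; utf-8 sidesteps ASCII codec errors.
    with open(path, 'a', newline='', encoding='utf-8') as f:
        # restval fills keys missing from the tweet,
        # extrasaction='ignore' drops keys that are not in fieldnames.
        writer = csv.DictWriter(f, fieldnames, restval='NULL',
                                extrasaction='ignore')
        if is_new:
            writer.writeheader()
        writer.writerow({k: to_cell(v) for k, v in tweet.items()})
```

In a listener, on_data would then only need to call append_tweet('csv_tweet3.csv', data). Writing the file as UTF-8 keeps non-ASCII characters instead of stripping them, which may or may not be what you want for a later database import.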
For performance, you may want to look into opening the file once, writing several records, and then closing it. This way you're not constantly opening the file, initializing the csv writer, appending, and closing the file again for every tweet. I'm not familiar with the tweepy API, so I'm not sure exactly how this would work - but it's worth looking into.
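A rough sketch of that batching idea, independent of tweepy, could look like the following. It assumes Python 3, and the class name BufferedCsvAppender and the batch_size parameter are hypothetical, not part of any library.

```python
import csv


class BufferedCsvAppender(object):
    """Collect rows in memory and append them to the file in batches."""

    def __init__(self, path, batch_size=100):
        self.path = path
        self.batch_size = batch_size
        self._rows = []

    def add(self, row):
        """Queue one row; write to disk once the batch is full."""
        self._rows.append(row)
        if len(self._rows) >= self.batch_size:
            self.flush()

    def flush(self):
        """Append all queued rows in a single open/write/close cycle."""
        if not self._rows:
            return
        with open(self.path, 'a', newline='', encoding='utf-8') as f:
            csv.writer(f).writerows(self._rows)
        self._rows = []
```

The listener would call add() from on_data instead of opening the file itself. The trade-off is that rows still sitting in the buffer are lost if the process dies, so flush() should be called when the stream stops or errors out.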
If you run into any trouble, I'll be happy to help - enjoy!
Source: https://stackoverflow.com/questions/24155319/writing-multiple-json-to-csv-in-python-dictionary-to-csv