Question
I am trying to scrape Twitter to get the follower/friend counts of certain users. I have a large list of users to check. I want to collect the output into a dictionary and then write it to a CSV file. I tried both the pandas route (dict -> dataframe -> csv) and the direct route (dict -> CSV), but the write keeps failing.
My code is below:
# Writing directly from Dictionary to CSV
auth = tweepy.OAuthHandler(api_key, api_secret_key)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True,
                 wait_on_rate_limit_notify=True)

# *Just a sample of the large user list I want to check out*
z = ['Schwarzenegger', 'drdrew', 'NPR', 'billboard', 'SenJohnMcCain', 'LaurenJauregui', 'MarkRuffalo']
for i in z:
    user_dict = {}
    follower_count = api.get_user(i).followers_count
    friend_count = api.get_user(i).friends_count
    # print(i, follower_count, friend_count)
    # create a dictionary to hold values
    user_dict[i] = follower_count, friend_count
    # Write dictionary into csv file
    cols = ["username", "followers_count"]
    try:
        with open('details.csv', 'w', newline='', encoding='utf8') as f:
            writer = csv.DictWriter(f, fieldnames=cols)
            writer.writeheader()
            for data, val in user_dict.items():
                writer.writerows([{"username": data, "followers_count": val}])
    except IOError:
        print("I/O error")
# Notify me when operation is completed
print("file write completed")
Output >>> File contains only the last entry:
MarkRuffalo,"(6674117, 1852)"
The Dict -> DF -> csv route also produced a file that has only headings and no contents:
df = pd.DataFrame(user_dict, columns = ["follower_count","friend_count"])
print(df)
df.to_csv('user_files.csv', header=True)
What can I do to ensure all the dictionary entries are written to the file? Thank you. P.S.: I am new to all of this, so my writing may be awkward.
Answer 1:
- Place "cols" inside the "with" block, after the open() statement
- Put the for loop (for i in z:) inside your "try" after the writeheader() statement
- Remove this line: "for data,val in user_dict.items():"
- Use the API variables (from the for loop) in your writerow variables ("writerow" is not plural - remove the "s" at the end)
These resources will help you:
Iterating Through a Dictionary in Python: https://realpython.com/iterate-through-dictionary-python/
Reading & Writing CSV Files: https://realpython.com/python-csv/
I tried it on my end and it worked. I apologize if the indentation is off.
# Write dictionary into csv file
try:
    # mode='w' opens the file once, before the loop; newline='' is what the csv module expects
    with open('details.csv', mode='w', newline='') as f:
        cols = ["username", "followers_count", "friends_count"]
        writer = csv.DictWriter(f, fieldnames=cols)
        writer.writeheader()
        for i in z:
            user_dict = {}
            follower_count = api.get_user(i).followers_count
            friend_count = api.get_user(i).friends_count
            # print(i, follower_count, friend_count)
            # assign values
            user_dict[i] = follower_count, friend_count
            # write one row per user
            writer.writerow({cols[0]: i, cols[1]: follower_count, cols[2]: friend_count})
except IOError:
    print("I/O error")
# Notify me when operation is completed
print("file write completed")
For the pandas DataFrame: I got it to work using the code below, but there are no headers, and each dictionary key and its value(s) are displayed in a separate column:
df = pd.DataFrame(data=user_dict)
print(df)
df.to_csv('user_files.csv', header=True)
A third example, now using transpose to display each dictionary key and its value(s) on a separate row:
df = pd.DataFrame(data = user_dict)
df = df.T
print(df)
df.to_csv('user_files2.csv', header=True)
You will have to play around with the column headers on these two.
My Resources: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
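For the column headers in the pandas route, one possible approach is sketched below. pd.DataFrame treats the keys of a dictionary as column labels, so the names passed in columns= in the question do not match any key. DataFrame.from_dict with orient="index" treats the keys as row labels instead, which leaves columns= free to name the two counts. The sample values are copied from the output shown in Answer 2, and the header names are an assumption chosen to match the CSV examples.

```python
import pandas as pd

# Sample data in the same shape as user_dict: username -> (followers, friends).
user_dict = {
    "Schwarzenegger": (4642078, 375),
    "drdrew": (2753348, 1009),
    "NPR": (8230126, 69947),
}

# orient="index" makes each dictionary key a row label;
# columns= then names the two values stored in each tuple.
df = pd.DataFrame.from_dict(
    user_dict,
    orient="index",
    columns=["followers_count", "friends_count"],
)

# index_label gives the username column a header in the CSV output.
# Calling to_csv() without a path returns the CSV text instead of writing a file.
csv_text = df.to_csv(index_label="username")
print(csv_text)
```

Passing a filename such as 'user_files.csv' as the first argument to to_csv writes the same content to disk.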
Answer 2:
Collect all the data, then write to the file once
- Instantiate user_dict outside of the for-loop
- Collect data for all names
- Write the file, after all the names data is collected
- The current implementation is using 'w' to write, instead of 'a' to append.
  - Each time you iterated a name, the file was opened, written over, and closed.
- Your user_dict has a name as the key, but the value is actually a tuple consisting of follower_count and friend_count. .writerows has been updated to separate the values into two columns.
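The difference between 'w' and 'a' described above can be reproduced without the Twitter API. The following is a minimal sketch, assuming a few sample rows and temporary file names invented for illustration; it reopens the file once per row, as the loop in the question did.

```python
import csv
import os
import tempfile

# Sample rows for illustration only (values copied from the answer's output).
rows = [("Schwarzenegger", 4642078), ("drdrew", 2753348), ("NPR", 8230126)]


def write_each_row(path, mode):
    """Reopen the file once per row, as the loop in the question did."""
    for name, followers in rows:
        with open(path, mode, newline="", encoding="utf8") as f:
            csv.writer(f).writerow([name, followers])


def read_rows(path):
    """Read the file back as a list of rows."""
    with open(path, newline="", encoding="utf8") as f:
        return list(csv.reader(f))


with tempfile.TemporaryDirectory() as tmp:
    w_path = os.path.join(tmp, "w_mode.csv")
    a_path = os.path.join(tmp, "a_mode.csv")
    write_each_row(w_path, "w")  # each open truncates the file
    write_each_row(a_path, "a")  # each open adds to the end
    w_rows = read_rows(w_path)
    a_rows = read_rows(a_path)

print(w_rows)       # only the last row survives in 'w' mode
print(len(a_rows))  # all three rows are kept in 'a' mode
```

Appending keeps every row, but a header written inside the loop would then be repeated for each name, which is one reason to open the file once and write everything in a single pass.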
import tweepy
import csv

# new implementation
# (api is the authenticated tweepy.API object created in the question)
z = ['Schwarzenegger', 'drdrew', 'NPR', 'billboard', 'SenJohnMcCain', 'LaurenJauregui', 'MarkRuffalo']
user_dict = {}
for i in z:
    follower_count = api.get_user(i).followers_count
    friend_count = api.get_user(i).friends_count
    # add data to user_dict
    user_dict[i] = follower_count, friend_count

# output of user_dict
print(user_dict)
{'LaurenJauregui': (4278575, 12242),
 'MarkRuffalo': (6674056, 1852),
 'NPR': (8230126, 69947),
 'Schwarzenegger': (4642078, 375),
 'SenJohnMcCain': (3043105, 377),
 'billboard': (8949035, 3199),
 'drdrew': (2753348, 1009)}

# Write dictionary into csv file
cols = ["username", "followers_count", "friend_count"]
try:
    with open('details.csv', 'w', newline='', encoding='utf8') as f:
        writer = csv.DictWriter(f, fieldnames=cols)
        writer.writeheader()
        for data, val in user_dict.items():
            writer.writerows([{"username": data, "followers_count": val[0], "friend_count": val[1]}])
except IOError:
    print("I/O error")
# csv file
username,followers_count,friend_count
Schwarzenegger,4642078,375
drdrew,2753348,1009
NPR,8230126,69947
billboard,8949035,3199
SenJohnMcCain,3043105,377
LaurenJauregui,4278575,12242
MarkRuffalo,6674056,1852
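Both answers call api.get_user(i) twice per name, which doubles the number of requests counted against the rate limit. A possible variant, sketched here, looks each user up once and hands all rows to a single writerows call. FakeUser and fake_get_user are hypothetical stand-ins for the object returned by api.get_user, so the example runs without credentials; collect_counts and write_counts are likewise names invented for this sketch.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for the user object returned by api.get_user().
FakeUser = namedtuple("FakeUser", ["followers_count", "friends_count"])
SAMPLE_USERS = {
    "NPR": FakeUser(8230126, 69947),
    "drdrew": FakeUser(2753348, 1009),
}


def fake_get_user(name):
    """Hypothetical replacement for api.get_user(name)."""
    return SAMPLE_USERS[name]


def collect_counts(names, get_user):
    """Build one row dict per name, looking each user up only once."""
    rows = []
    for name in names:
        user = get_user(name)
        rows.append({
            "username": name,
            "followers_count": user.followers_count,
            "friends_count": user.friends_count,
        })
    return rows


def write_counts(rows, f):
    """Write the header and every row to an already open file object."""
    cols = ["username", "followers_count", "friends_count"]
    writer = csv.DictWriter(f, fieldnames=cols)
    writer.writeheader()
    writer.writerows(rows)


# An in-memory buffer stands in for open('details.csv', 'w', newline='').
buffer = io.StringIO(newline="")
write_counts(collect_counts(["NPR", "drdrew"], fake_get_user), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

With a real client, passing api.get_user in place of fake_get_user and a file opened with newline='' in place of the buffer should give the same layout.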
Source: https://stackoverflow.com/questions/62378798/how-to-resolve-csv-dictwriter-overwriting-data-in-the-csv