How to resolve csv.DictWriter overwriting data in the csv?

Submitted by 孤者浪人 on 2020-06-27 05:27:12

Question


I am trying to scrape Twitter to get the follower/friend counts of certain users. I have a large list of users to check. I want to collect the output into a dictionary and then write it to a CSV file. I tried both the pandas (dict -> DataFrame -> CSV) and the direct (dict -> CSV) routes, but the file is never written correctly.

My code is below:

# Writing directly from Dictionary to CSV  

auth = tweepy.OAuthHandler(api_key, api_secret_key)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True,
    wait_on_rate_limit_notify=True)

# *Just a sample of the large user list I want to check out*
z =['Schwarzenegger', 'drdrew', 'NPR', 'billboard', 'SenJohnMcCain', 'LaurenJauregui', 'MarkRuffalo']

for i in z:
    user_dict = {}
    follower_count = api.get_user(i).followers_count
    friend_count = api.get_user(i).friends_count
    # print(i, follower_count, friend_count)

    # create a dictionary to hold values
    user_dict[i] = follower_count, friend_count

    # Write dictionary into csv file
    cols = ["username", "followers_count"]
    try:
        with open('details.csv', 'w', newline='', encoding='utf8') as f:
            writer = csv.DictWriter(f, fieldnames=cols)
            writer.writeheader()
            for data,val in user_dict.items():
                writer.writerows([{"username": data, "followers_count": val}])
    except IOError:
        print("I/O error")

#Notify me when operation is completed
print("file write completed")

Output >>> File contains only the last entry:

MarkRuffalo,"(6674117, 1852)"

The dict -> DataFrame -> CSV route also produced a file that has only headings and no contents:

df = pd.DataFrame(user_dict, columns = ["follower_count","friend_count"])
print(df)
df.to_csv('user_files.csv', header=True)

What can I do to ensure all the dictionary entries are written to the file? Thank you. P.S.: I am new to all of this, so my writing may be awkward.


Answer 1:


  1. Place "cols" inside the with block, right after the open() statement
  2. Put the for loop (for i in z:) inside your "try", after the writeheader() statement
  3. Remove this line: "for data,val in user_dict.items():"
  4. Use the API variables (from the for loop) in your writerow call (use "writerow", singular, without the "s" at the end)

These resources will help you:

Iterating Through a Dictionary in Python: https://realpython.com/iterate-through-dictionary-python/

Reading & Writing CSV Files: https://realpython.com/python-csv/

I tried it on my end and it worked. I apologize if the indentation is off.

# Write dictionary into csv file

try:
    with open('details.csv', mode='w', newline='', encoding='utf8') as f:
        cols = ["username", "followers_count", "friends_count"]
        writer = csv.DictWriter(f, fieldnames=cols)

        writer.writeheader()
        # created before the loop so that entries accumulate
        user_dict = {}
        for i in z:
            # one API call per user instead of two
            user = api.get_user(i)
            follower_count = user.followers_count
            friend_count = user.friends_count
            # print(i, follower_count, friend_count)

            # assign values
            user_dict[i] = follower_count, friend_count

            # write one row per user
            writer.writerow({cols[0]: i, cols[1]: follower_count, cols[2]: friend_count})

except IOError:
    print("I/O error")

#Notify me when operation is completed
print("file write completed")
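
The difference between writerow and writerows in step 4 can be sketched without touching the Twitter API. This is only an illustration, assuming the column names from the code above; the sample counts are copied from output posted elsewhere in this thread, and the in-memory buffer is a stand-in for an open file.

```python
import csv
import io

cols = ["username", "followers_count", "friends_count"]

# Sample counts copied from output posted in this thread (no API call needed)
rows = [
    {"username": "NPR", "followers_count": 8230126, "friends_count": 69947},
    {"username": "drdrew", "followers_count": 2753348, "friends_count": 1009},
]

# In-memory text buffer standing in for the file opened with open(...)
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=cols)
writer.writeheader()

writer.writerow(rows[0])    # takes ONE dict and writes one row
writer.writerows(rows[1:])  # takes an ITERABLE of dicts, one row per dict

csv_text = buffer.getvalue()
print(csv_text)
```

Either method works inside the loop; writerow simply avoids wrapping each dictionary in a one-element list.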

For the pandas DataFrame: I got it to work using the code below, but there are no meaningful headers, and each dictionary key and its values are displayed in a separate column

df = pd.DataFrame(data=user_dict)
print(df)
df.to_csv('user_files.csv', header=True)

A third example, now using transpose (.T) to display each dictionary key and its values on a separate row

df = pd.DataFrame(data = user_dict)
df = df.T
print(df)
df.to_csv('user_files2.csv', header=True)

You will have to adjust the column headers yourself in these two examples
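
One possible way to get named columns, sketched here as an assumption rather than as part of the original answer, is DataFrame.from_dict with orient="index", which turns each dictionary key into a row label and lets you name the value columns. The column names and the sample counts below are illustrative (the counts are copied from output posted in this thread).

```python
import pandas as pd

# Sample data in the same shape as user_dict: name -> (followers, friends)
user_dict = {
    "Schwarzenegger": (4642078, 375),
    "NPR": (8230126, 69947),
}

# orient="index" makes each key a row; columns names the tuple positions
df = pd.DataFrame.from_dict(
    user_dict, orient="index", columns=["followers_count", "friends_count"]
)

# With no path, to_csv returns the CSV text; index_label names the key column
csv_text = df.to_csv(index_label="username")
print(csv_text)

# To write a file instead:
# df.to_csv("user_files.csv", index_label="username")
```

This probably also explains the empty file in the question: passing columns=[...] together with a dict to pd.DataFrame selects dictionary keys by those names, and none of the usernames match them.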

My Resources: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html




Answer 2:


Collect all the data, then write to the file once

  • Instantiate user_dict outside of the for-loop
  • Collect the data for all names
  • Write the file after the data for all names has been collected
  • The current implementation uses 'w' to write, instead of 'a' to append.
    • Each time the loop reached a new name, the file was opened, overwritten, and closed.
  • Your user_dict has a name as the key, but the value is actually a tuple consisting of follower_count and friend_count.
  • .writerows has been updated to separate the values into two columns.

import tweepy
import csv

# `api` is the authenticated tweepy.API object created in the question

# new implementation
z = ['Schwarzenegger', 'drdrew', 'NPR', 'billboard', 'SenJohnMcCain', 'LaurenJauregui', 'MarkRuffalo']
user_dict = {}
for i in z:
    follower_count = api.get_user(i).followers_count
    friend_count = api.get_user(i).friends_count

    # add data to user_dict
    user_dict[i] = follower_count, friend_count

# output of user_dict
print(user_dict)
{'LaurenJauregui': (4278575, 12242),
 'MarkRuffalo': (6674056, 1852),
 'NPR': (8230126, 69947),
 'Schwarzenegger': (4642078, 375),
 'SenJohnMcCain': (3043105, 377),
 'billboard': (8949035, 3199),
 'drdrew': (2753348, 1009)}

    
# Write dictionary into csv file
cols = ["username", "followers_count", "friend_count"]  
try:
    with open('details.csv', 'w', newline='', encoding='utf8') as f:
        writer = csv.DictWriter(f, fieldnames=cols)
        writer.writeheader()
        for data, val in user_dict.items():
            writer.writerows([{"username": data, "followers_count": val[0], "friend_count": val[1]}])
except IOError:
    print("I/O error")


# csv file
username,followers_count,friend_count
Schwarzenegger,4642078,375
drdrew,2753348,1009
NPR,8230126,69947
billboard,8949035,3199
SenJohnMcCain,3043105,377
LaurenJauregui,4278575,12242
MarkRuffalo,6674056,1852
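
The effect of 'w' versus 'a' can be reproduced without the Twitter API. The sketch below is a hedged illustration, not part of the original answer: write_each_time and read_lines are hypothetical helpers, the file names are made up, and the counts are two of the values shown above. It re-opens the file once per user, as the question's loop does, in each mode.

```python
import csv
import os
import tempfile

cols = ["username", "followers_count", "friends_count"]
sample = {"NPR": (8230126, 69947), "drdrew": (2753348, 1009)}


def read_lines(path):
    """Hypothetical helper: return the file's lines without line endings."""
    with open(path, newline="", encoding="utf8") as f:
        return f.read().splitlines()


def write_each_time(path, mode):
    """Hypothetical helper: re-open the file for every user, like the question's loop."""
    for name, (followers, friends) in sample.items():
        # Checked before opening: in append mode the header is needed only once
        is_new = not os.path.exists(path) or os.path.getsize(path) == 0
        with open(path, mode, newline="", encoding="utf8") as f:
            writer = csv.DictWriter(f, fieldnames=cols)
            if mode == "w" or is_new:
                writer.writeheader()
            writer.writerow(
                {"username": name, "followers_count": followers, "friends_count": friends}
            )


with tempfile.TemporaryDirectory() as tmp:
    w_path = os.path.join(tmp, "w_mode.csv")
    a_path = os.path.join(tmp, "a_mode.csv")
    write_each_time(w_path, "w")  # each open truncates the previous contents
    write_each_time(a_path, "a")  # each open adds to the end of the file
    w_lines = read_lines(w_path)
    a_lines = read_lines(a_path)

print(w_lines)  # header plus the last user only
print(a_lines)  # header plus every user
```

Append mode keeps every row, but running the script again would add the same users a second time, which is one reason to prefer collecting the data first and opening the file once in 'w' mode.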


Source: https://stackoverflow.com/questions/62378798/how-to-resolve-csv-dictwriter-overwriting-data-in-the-csv
