From Nested Dictionary to CSV File

[亡魂溺海] submitted on 2020-08-27 12:00:41

Question


I have a nested dictionary (with more than 70,000 entries):

users_item = {
    "sessionId1": {
        "12345645647": 1.0, 
        "9798654": 5.0 

    },         
    "sessionId2":{
        "3445657657": 1.0

    },
    "sessionId3": {
        "87967976": 5.0, 
        "35325626436": 1.0, 
        "126789435": 1.0, 
        "72139856": 5.0      
    },
    "sessionId4": {
        "4582317": 1.0         
    }
......
}

I want to create a CSV file from my nested dictionary. The result should look like:

sessionId1 item rating
sessionId1 item rating
sessionId2 item rating
sessionId3 item rating
sessionId3 item rating
sessionId3 item rating
sessionId3 item rating
.......

I found this post: Convert Nested Dictionary to CSV Table

It's similar to my question, but none of the answers there work for me: the pandas library runs out of memory.

How can I create a CSV file from my data?


Answer 1:


Just loop through the dictionary and use the Python csv writer to write to the csv file.

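import csv

# newline='' lets the csv module control line endings itself (Python 3)
with open('output.csv', 'w', newline='') as csv_file:
    csvwriter = csv.writer(csv_file, delimiter='\t')
    # one row per (session, item, rating) triple
    for session in users_item:
        for item in users_item[session]:
            csvwriter.writerow([session, item, users_item[session][item]])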



Answer 2:


for session, ratings in users_item.items():
    for item, rating in ratings.items():
        print("{} {}".format(session, rating))

Output:

sessionId3 5.0
sessionId3 1.0
sessionId3 5.0
sessionId3 1.0
sessionId1 5.0
sessionId1 1.0
sessionId4 1.0
sessionId2 1.0

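Note that a dict (users_item) had no guaranteed order before Python 3.7. On older versions, unless you specify the order of rows some other way (for example by sorting the keys), the output will be in the order the dict uses internally. From Python 3.7 on, a dict iterates in insertion order.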

Edit: This approach has no problems with a file containing 70k entries.

Edit: If you want to write to a CSV file, use the csv module or just pipe the output to a file.
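
As a minimal sketch of the csv-module route, the rows can be produced lazily by a generator and handed to the writer, so only one row is built at a time. The names iter_rows and write_long_csv and the header labels session, item and rating are illustrative assumptions, not part of the question; io.StringIO stands in for a real file.

```python
import csv
import io

# Sample data in the same shape as the question's dictionary.
users_item = {
    "sessionId1": {"12345645647": 1.0, "9798654": 5.0},
    "sessionId2": {"3445657657": 1.0},
}


def iter_rows(nested):
    """Yield one (session, item, rating) tuple per rating, lazily."""
    for session, ratings in nested.items():
        for item, rating in ratings.items():
            yield session, item, rating


def write_long_csv(nested, fileobj):
    """Write a header plus one row per rating to an open text file object."""
    writer = csv.writer(fileobj)
    # Header labels are an assumption made for this example.
    writer.writerow(["session", "item", "rating"])
    writer.writerows(iter_rows(nested))


# io.StringIO stands in for a real file; with a real file, use
# open("ratings.csv", "w", newline="") instead (hypothetical file name).
buffer = io.StringIO()
write_long_csv(users_item, buffer)
print(buffer.getvalue())
```

Because writerows consumes the generator one row at a time, memory use should stay roughly constant regardless of how many sessions the dictionary holds.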




Answer 3:


Assuming you want each session as a row, the number of columns in every row will be the total number of unique keys across all session dicts. Based on the data you've given, I'm guessing the number of unique keys is astronomical.

That is why you're running into memory issues with the solution given in this discussion. It's simply too much data to hold in memory at one time.
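
To make the size argument concrete, here is a rough sketch that compares the number of cells in a wide table (one row per session, one column per unique item) with the number of rows in a long table (one row per rating). It uses the four sample sessions from the question; the variable names are illustrative.

```python
# Sample data taken from the question.
users_item = {
    "sessionId1": {"12345645647": 1.0, "9798654": 5.0},
    "sessionId2": {"3445657657": 1.0},
    "sessionId3": {
        "87967976": 5.0,
        "35325626436": 1.0,
        "126789435": 1.0,
        "72139856": 5.0,
    },
    "sessionId4": {"4582317": 1.0},
}

# Collect every distinct item key across all sessions.
all_items = set()
for ratings in users_item.values():
    all_items.update(ratings)  # updating from a dict adds its keys

# Wide layout: every session gets a cell for every unique item.
wide_cells = len(users_item) * len(all_items)

# Long layout: one row per rating that actually exists.
long_rows = sum(len(ratings) for ratings in users_item.values())

print("unique items:", len(all_items))
print("wide table cells:", wide_cells)
print("long table rows:", long_rows)
```

Even in this tiny sample the wide layout needs four times as many cells as there are ratings, and the gap grows with the number of sessions and distinct items.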

If my assumptions are correct, your only option is to divide and conquer: break the data into smaller chunks, write each chunk to a file in CSV format, and then merge the CSV files at the end.
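
The chunk-and-merge idea could be sketched roughly as follows, using the long session/item/rating layout from the question. The helper names chunked and write_in_chunks, the part file naming and the chunk size are all assumptions made for illustration; for this layout, streaming rows directly to a single file is usually enough, so treat this as one possible shape of the approach rather than a required step.

```python
import csv
import itertools
import os
import shutil
import tempfile

# Sample data taken from the question.
users_item = {
    "sessionId1": {"12345645647": 1.0, "9798654": 5.0},
    "sessionId2": {"3445657657": 1.0},
    "sessionId3": {
        "87967976": 5.0,
        "35325626436": 1.0,
        "126789435": 1.0,
        "72139856": 5.0,
    },
    "sessionId4": {"4582317": 1.0},
}


def chunked(iterable, size):
    """Yield successive lists of at most `size` elements."""
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return
        yield chunk


def write_in_chunks(nested, out_path, chunk_size, work_dir):
    """Write each chunk of sessions to its own file, then merge the parts."""
    part_paths = []
    for index, sessions in enumerate(chunked(sorted(nested), chunk_size)):
        part_path = os.path.join(work_dir, "part_{:05d}.csv".format(index))
        with open(part_path, "w", newline="") as part:
            writer = csv.writer(part, delimiter="\t")
            for session in sessions:
                for item, rating in nested[session].items():
                    writer.writerow([session, item, rating])
        part_paths.append(part_path)

    # Merge step: copy the part files into the final file, in order.
    with open(out_path, "w", newline="") as merged:
        for part_path in part_paths:
            with open(part_path, newline="") as part:
                shutil.copyfileobj(part, merged)
    return part_paths


with tempfile.TemporaryDirectory() as work_dir:
    out_path = os.path.join(work_dir, "merged.csv")
    parts = write_in_chunks(users_item, out_path, chunk_size=2, work_dir=work_dir)
    with open(out_path, newline="") as merged:
        merged_rows = list(csv.reader(merged, delimiter="\t"))

print(len(parts), "part files,", len(merged_rows), "rows")
```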




Answer 4:


If you write the file iteratively, row by row, there should be no memory issues:

import csv

users_item = {
    "sessionId1": {
        "12345645647": 1.0,
        "9798654": 5.0

    },
    "sessionId2":{
        "3445657657": 1.0

    },
    "sessionId3": {
        "87967976": 5.0,
        "35325626436": 1.0,
        "126789435": 1.0,
        "72139856": 5.0
    },
    "sessionId4": {
        "4582317": 1.0
    }
}

with open('nested_dict.csv', 'w', newline='') as output:
    writer = csv.writer(output, delimiter='\t')
    for sessionId in sorted(users_item):
        ratings = users_item[sessionId]
        for item in ratings:
            writer.writerow([sessionId, item, ratings[item]])

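Resulting contents of the output file (where » represents a tab character; the order of items within a session follows the dict's iteration order, so it may differ from what is shown):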

sessionId1»  12345645647»  1.0
sessionId1»  9798654»      5.0
sessionId2»  3445657657»   1.0
sessionId3»  126789435»    1.0
sessionId3»  87967976»     5.0
sessionId3»  35325626436»  1.0
sessionId3»  72139856»     5.0
sessionId4»  4582317»      1.0
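
One way to check such a file is to read it back and rebuild the nested dictionary. The sketch below does this from an in-memory string that mimics the tab-separated output; the function name read_nested is an illustrative assumption, and a real file opened with newline='' could be passed in the same way.

```python
import csv
import io
from collections import defaultdict

# A few tab-separated rows in the same layout as the output file.
tsv_text = (
    "sessionId1\t12345645647\t1.0\n"
    "sessionId1\t9798654\t5.0\n"
    "sessionId2\t3445657657\t1.0\n"
)


def read_nested(fileobj):
    """Rebuild {session: {item: rating}} from tab-separated rows."""
    nested = defaultdict(dict)
    for session, item, rating in csv.reader(fileobj, delimiter="\t"):
        nested[session][item] = float(rating)
    return dict(nested)


restored = read_nested(io.StringIO(tsv_text))
print(restored)
```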


Source: https://stackoverflow.com/questions/38454203/from-nested-dictionary-to-csv-file
