Python Killed: 9 when running a code using dictionaries created from 2 csv files

余生分开走 2021-01-17 17:41

I am running code that has always worked for me. This time I ran it on two .csv files: "data" (24 MB) and "data1" (475 MB). "data" has 3 columns of about 680,000 elements

3 Answers
  •  南方客 (OP)
     2021-01-17 18:04

    How much memory does your computer have?

    You can add a couple of optimizations that will save some memory, and if that's not enough, you can trade off some CPU and I/O for better memory efficiency.

    If you're only comparing the keys and don't actually use the values, you can extract just the keys:

    # set comprehension: no intermediate list is built
    d1 = {row[0] for row in my_data1}


    Then instead of OrderedDict, you can try using an ordered set, either from this answer, "Does Python have an ordered set?", or the ordered-set module from PyPI.
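
A minimal sketch of one ordered-set stand-in without extra dependencies: since Python 3.7, plain dicts preserve insertion order, so `dict.fromkeys` works as an ordered set (the key lists here are hypothetical):

```python
keys1 = ["b", "a", "c", "a"]      # hypothetical key lists for illustration
keys2 = ["c", "b", "d"]

ordered1 = dict.fromkeys(keys1)   # duplicates dropped, insertion order kept
ordered2 = dict.fromkeys(keys2)

# Ordered intersection: keeps keys1's order, membership tests are O(1).
common = [k for k in ordered1 if k in ordered2]
print(common)  # ['b', 'c']
```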

    Once you have all the intersecting keys, you can write a second program that looks up the matching values from the source CSV.
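
A sketch of that second pass, using an in-memory stand-in for the big CSV (the key set, column layout, and data are assumed for illustration): stream the file once and keep only rows whose key intersects, so the values never all sit in memory at the same time:

```python
import csv
import io

# Hypothetical stand-in for the 475 MB "data1" file.
big_csv = "k1,10\nk2,20\nk3,30\n"
common_keys = {"k1", "k3"}            # assumed result of the key intersection

matched = []
for row in csv.reader(io.StringIO(big_csv)):   # in real use: open("data1.csv")
    if row and row[0] in common_keys:          # key assumed to be in column 0
        matched.append(row)
print(matched)  # [['k1', '10'], ['k3', '30']]
```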

    If these optimizations aren't enough, you can extract all the keys from the bigger set, save them to a file, and then load the keys one by one with a generator, so the program keeps only one set of keys plus a single key in memory instead of two full sets.
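
The one-key-at-a-time idea can be sketched like this (the file contents and layout are invented for the demo): only the smaller key set stays in memory, while the bigger file's keys are streamed from disk by a generator:

```python
import csv
import os
import tempfile

def keys_from_csv(path):
    """Yield the first column of each row, one key at a time."""
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if row:
                yield row[0]

# Hypothetical demo: the bigger file's keys were saved to disk earlier.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("k1\nk2\nk3\nk4\n")
    path = tmp.name

small_keys = {"k1", "k3"}  # the smaller set is kept fully in memory
common = [k for k in keys_from_csv(path) if k in small_keys]
os.remove(path)
print(common)  # ['k1', 'k3']
```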

    Also, I'd suggest using Python's pickle module for storing intermediate results.
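
A minimal sketch of checkpointing an intermediate result with pickle (file name and data are hypothetical), so a killed run can resume from the saved key set instead of recomputing it:

```python
import os
import pickle
import tempfile

common_keys = {"k1", "k3"}  # hypothetical intermediate result

path = os.path.join(tempfile.gettempdir(), "common_keys.pkl")
with open(path, "wb") as f:
    pickle.dump(common_keys, f)      # checkpoint to disk

with open(path, "rb") as f:
    restored = pickle.load(f)        # resume from the checkpoint
print(restored == common_keys)  # True
```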
