MemoryError on large merges with pandas in Python

日久生厌 2021-01-17 16:01

I'm using pandas to do an outer merge on a set of roughly 1000-2000 CSV files. Each CSV file has an identifier column id which is shared between all the files, but the merge runs out of memory with a MemoryError.
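Presumably the merge is being done pairwise in a loop, something like the sketch below (the filenames list and the reduce-style loop are assumptions for illustration, not the asker's actual code):

    import functools
    import glob

    import pandas as pd

    # Hypothetical file list; adjust the pattern to your layout.
    filenames = glob.glob('data/*.csv')

    # Pairwise outer merges on 'id' (assuming each file contributes its own
    # distinct columns): every step materializes a new, ever-wider
    # intermediate DataFrame, which is what exhausts memory.
    merged = functools.reduce(
        lambda left, right: left.merge(right, on='id', how='outer'),
        (pd.read_csv(f) for f in filenames),
    )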

3 Answers
刺人心 (OP) 2021-01-17 16:29

    I think you'll get better performance using a concat (which acts like an outer join):

    import pandas as pd

    # filenames: an iterable of CSV paths; one concat replaces per-file merges
    dfs = (pd.read_csv(filename).set_index('id') for filename in filenames)
    merged_df = pd.concat(dfs, axis=1)
    

    This way you perform a single concatenation rather than a separate merge for each file, so you never build up the chain of ever-larger intermediate DataFrames that exhausts memory.
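
    To see why concat with axis=1 behaves like an outer join on id, here is a small self-contained check with two toy frames standing in for the CSV files (the data is made up for illustration):

    import pandas as pd

    a = pd.DataFrame({'id': [1, 2], 'x': [10, 20]}).set_index('id')
    b = pd.DataFrame({'id': [2, 3], 'y': [5, 6]}).set_index('id')

    # join='outer' is the default: ids absent from a frame become NaN
    print(pd.concat([a, b], axis=1))
    #        x    y
    # id
    # 1   10.0  NaN
    # 2   20.0  5.0
    # 3    NaN  6.0

    Call .reset_index() on the result if you want id back as an ordinary column.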
