MemoryError on large merges with pandas in Python

日久生厌 2021-01-17 16:01

I'm using pandas to do an outer merge on a set of roughly 1000-2000 CSV files. Each CSV file has an identifier column id which is shared between all the files, but the merge runs out of memory with a MemoryError.
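Presumably the merge is being done pairwise in a loop, something like the sketch below (the filenames list and the reduce-style loop are assumptions for illustration, not the asker's actual code):

    import functools
    import glob

    import pandas as pd

    # Hypothetical file list; adjust the pattern to your layout.
    filenames = glob.glob('data/*.csv')

    # Pairwise outer merges on 'id' (assuming each file contributes its own
    # distinct columns): every step materializes a new, ever-wider
    # intermediate DataFrame, which is what exhausts memory.
    merged = functools.reduce(
        lambda left, right: left.merge(right, on='id', how='outer'),
        (pd.read_csv(f) for f in filenames),
    )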

3 Answers
刺人心 (OP) 2021-01-17 16:29

    I think you'll get better performance using a concat (which acts like an outer join):

    import pandas as pd

    # filenames: an iterable of CSV paths; one concat replaces per-file merges
    dfs = (pd.read_csv(filename).set_index('id') for filename in filenames)
    merged_df = pd.concat(dfs, axis=1)
    

    This way you perform a single concatenation rather than a separate merge for each file, so you never build up the chain of ever-larger intermediate DataFrames that exhausts memory.
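
    To see why concat with axis=1 behaves like an outer join on id, here is a small self-contained check with two toy frames standing in for the CSV files (the data is made up for illustration):

    import pandas as pd

    a = pd.DataFrame({'id': [1, 2], 'x': [10, 20]}).set_index('id')
    b = pd.DataFrame({'id': [2, 3], 'y': [5, 6]}).set_index('id')

    # join='outer' is the default: ids absent from a frame become NaN
    print(pd.concat([a, b], axis=1))
    #        x    y
    # id
    # 1   10.0  NaN
    # 2   20.0  5.0
    # 3    NaN  6.0

    Call .reset_index() on the result if you want id back as an ordinary column.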
