I'm using pandas to do an outer merge on a set of about 1000-2000 CSV files. Each CSV file has an identifier column id which is shared between all of the files.
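Roughly, the per-file merge loop looks like this (the glob pattern and file paths are placeholders, not my real layout):

import functools
import glob
import pandas as pd

filenames = glob.glob('data/*.csv')  # placeholder pattern

# One outer merge per file, chained left to right on the shared 'id' column.
merged = functools.reduce(
    lambda left, right: pd.merge(left, right, on='id', how='outer'),
    (pd.read_csv(f) for f in filenames),
)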
I think you'll get better performance using a concat (which acts like an outer join):
import pandas as pd

dfs = (pd.read_csv(filename).set_index('id') for filename in filenames)
merged_df = pd.concat(dfs, axis=1)  # aligns on the index, like an outer join
This way you do a single concatenation rather than one merge per file.
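For completeness, a minimal end-to-end sketch; the glob pattern is an assumption about where the files live, and the final reset_index is optional:

import glob
import pandas as pd

filenames = glob.glob('data/*.csv')  # placeholder pattern
dfs = (pd.read_csv(f).set_index('id') for f in filenames)
merged_df = pd.concat(dfs, axis=1)
merged_df = merged_df.reset_index()  # make 'id' a regular column again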