Merge based on multiple columns of all excel files from a directory in Python

巧了我就是萌 submitted on 2020-04-30 06:38:29

Question


Say I have a DataFrame df and a directory ./ that contains the following Excel files:

import os

path = './'
for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith(('.xls', '.xlsx')):
            print(os.path.join(root, file))
            # dfs.append(read_dfs(os.path.join(root, file)))
# df = reduce(lambda left, right: pd.concat([left, right], axis = 0), dfs)
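The directory walk above can be wrapped in a small helper that returns the matching paths instead of printing them, so they can be read and merged later. This is a minimal sketch using only the standard library; the name find_excel_files is hypothetical, and sorting the result is an added assumption so the merge order is reproducible (os.walk does not promise any particular order).

```python
import os


def find_excel_files(path):
    """Return sorted paths of all .xls/.xlsx files under path, searching recursively."""
    matches = []
    for root, dirs, files in os.walk(path):
        for file in files:
            # str.endswith accepts a tuple of suffixes to test against
            if file.endswith(('.xls', '.xlsx')):
                matches.append(os.path.join(root, file))
    # os.walk gives no ordering guarantee; sort so later merges run in a fixed order
    return sorted(matches)


# Hypothetical follow-up, not run here: read each file with pandas
# dfs = [pd.read_excel(p) for p in find_excel_files('./')]
```

The read_dfs function in the question is not shown; pd.read_excel is one plausible stand-in, though reading .xls and .xlsx files requires the matching Excel engine package to be installed.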

Out:

df1.xlsx
df2.xlsx
df3.xls
...

I want to merge df with all the files in path on their common columns date and city. The code below works, but it is not concise:

How can this be written more concisely? Thank you.

df = pd.merge(df, df1, on = ['date', 'city'], how='left')
df = pd.merge(df, df2, on = ['date', 'city'], how='left')
df = pd.merge(df, df3, on = ['date', 'city'], how='left')
...
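Since every line repeats the same keys and join type, the chain can be collapsed into a loop over a list of frames. A minimal sketch, assuming the frames read from the files are already collected in a list; the names merge_left_on_keys, base and others, and the small example frames, are hypothetical stand-ins for df, df1 and df2.

```python
import pandas as pd


def merge_left_on_keys(base, others, keys=('date', 'city')):
    """Left-merge each frame in others onto base using the shared key columns."""
    result = base
    for other in others:
        # Same call as each line in the question, applied once per frame
        result = pd.merge(result, other, on=list(keys), how='left')
    return result


# Small in-memory frames standing in for df, df1 and df2
df = pd.DataFrame({'date': ['2020-01-01', '2020-01-01', '2020-01-02'],
                   'city': ['A', 'B', 'A']})
df1 = pd.DataFrame({'date': ['2020-01-01', '2020-01-02'],
                    'city': ['A', 'A'], 'temp': [10, 12]})
df2 = pd.DataFrame({'date': ['2020-01-01'], 'city': ['B'], 'rain': [3.5]})

merged = merge_left_on_keys(df, [df1, df2])
```

Because every step uses how='left', merged keeps all rows of df; key pairs missing from a file get NaN in that file's columns.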

Reference:

pandas three-way joining multiple dataframes on columns


Answer 1:


The following code may work:

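from functools import reduce
import pandas as pd

# df0 is the original frame; df1 ... dfN are the frames read from the Excel files
dfs = [df0, df1, df2, dfN]
# how='left' keeps every row of df0, matching the chained merges in the question
df_final = reduce(lambda left, right: pd.merge(left, right, on=['date', 'city'], how='left'), dfs)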
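One detail worth checking: pd.merge defaults to how='inner', so whether how='left' is passed inside the reduce call matters. An inner join drops rows of df0 whose date/city pair is missing from any file, whereas the chained merges in the question use how='left'. The sketch below compares the two; the frames are made-up stand-ins for df0, df1 and df2.

```python
from functools import reduce

import pandas as pd

keys = ['date', 'city']
df0 = pd.DataFrame({'date': ['2020-01-01', '2020-01-01', '2020-01-02'],
                    'city': ['A', 'B', 'A']})
df1 = pd.DataFrame({'date': ['2020-01-01', '2020-01-02'],
                    'city': ['A', 'A'], 'temp': [10, 12]})
df2 = pd.DataFrame({'date': ['2020-01-01', '2020-01-02'],
                    'city': ['A', 'A'], 'rain': [3.5, 0.0]})
dfs = [df0, df1, df2]

# Default join type (inner): keeps only key pairs present in every frame
inner = reduce(lambda acc, nxt: pd.merge(acc, nxt, on=keys), dfs)
# how='left': keeps every row of df0, like the chained merges in the question
left_joined = reduce(lambda acc, nxt: pd.merge(acc, nxt, on=keys, how='left'), dfs)

print(len(inner), len(left_joined))  # 2 3
```

If the files share non-key column names, pandas appends suffixes (_x and _y by default) to tell them apart, so renaming those columns before merging keeps the result readable.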


Source: https://stackoverflow.com/questions/61314450/merge-based-on-multiple-columns-of-all-excel-files-from-a-directory-in-python
