pandas three-way joining multiple dataframes on columns

醉梦人生 2020-11-22 08:35

I have 3 CSV files. Each has the first column as the (string) names of people, while all the other columns in each dataframe are attributes of that person.

How can

11 answers
  •  一向 (OP)
    2020-11-22 09:11

    This is an ideal situation for the join method

    The join method is built for exactly this kind of situation: it can join any number of DataFrames in one call. When you pass it a list of DataFrames, they are all aligned on their indexes, so you must first put the joining column in the index of each one.

    The code would look something like this:

    import pandas as pd

    filenames = ['fn1', 'fn2', 'fn3', 'fn4']  # ... and so on
    dfs = [pd.read_csv(filename, index_col=0) for filename in filenames]  # names are in the first column
    dfs[0].join(dfs[1:])
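
    To try the CSV route without files on disk, here is a minimal sketch that feeds read_csv from in-memory strings. The CSV contents and the names csv_texts and joined are made up for illustration; index_col=0 assumes the names sit in the first column, as the question describes.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for three files on disk.
csv_texts = [
    "name,attr11,attr12\na,5,9\nb,4,61\nc,24,9\n",
    "name,attr21,attr22\na,5,19\nb,14,16\nc,4,9\n",
    "name,attr31,attr32\na,15,49\nb,4,36\nc,14,9\n",
]

# index_col=0 makes the first column (the names) the index of each frame.
dfs = [pd.read_csv(io.StringIO(text), index_col=0) for text in csv_texts]

# join aligns the remaining frames on that shared index.
joined = dfs[0].join(dfs[1:])
print(joined)
```

    Because the values come through read_csv here, the attribute columns are parsed as numbers rather than kept as strings.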
    

    With @zero's data, you could do this:

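    import numpy as np
    import pandas as pd

    # Note: np.array turns these mixed rows into strings, so the attr columns hold strings
    df1 = pd.DataFrame(np.array([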
        ['a', 5, 9],
        ['b', 4, 61],
        ['c', 24, 9]]),
        columns=['name', 'attr11', 'attr12'])
    df2 = pd.DataFrame(np.array([
        ['a', 5, 19],
        ['b', 14, 16],
        ['c', 4, 9]]),
        columns=['name', 'attr21', 'attr22'])
    df3 = pd.DataFrame(np.array([
        ['a', 15, 49],
        ['b', 4, 36],
        ['c', 14, 9]]),
        columns=['name', 'attr31', 'attr32'])
    
    dfs = [df1, df2, df3]
    dfs = [df.set_index('name') for df in dfs]
    dfs[0].join(dfs[1:])
    
         attr11 attr12 attr21 attr22 attr31 attr32
    name                                          
    a         5      9      5     19     15     49
    b         4     61     14     16      4     36
    c        24      9      4      9     14      9
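
    One thing to watch: join defaults to how='left', so only the names present in the calling DataFrame survive. If the files might not all list the same people, passing how='outer' keeps every name. The sketch below uses small hypothetical frames (left, right, extra) to show the difference; it is an illustration under those assumptions, not part of the original data.

```python
import pandas as pd

# Hypothetical frames: 'c' is missing from the second one, 'd' from the others.
left = pd.DataFrame({"attr11": [5, 4, 24]},
                    index=pd.Index(["a", "b", "c"], name="name"))
right = pd.DataFrame({"attr21": [5, 14, 7]},
                     index=pd.Index(["a", "b", "d"], name="name"))
extra = pd.DataFrame({"attr31": [15, 4, 14]},
                     index=pd.Index(["a", "b", "c"], name="name"))

# Default (how='left'): only the names in the calling frame are kept.
left_joined = left.join([right, extra])

# how='outer': the union of names across all frames is kept; gaps become NaN.
outer_joined = left.join([right, extra], how="outer")

print(left_joined)
print(outer_joined)
```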
    
