Merge multiple DataFrames Pandas

無奈伤痛 2020-12-14 09:56

This might be considered a duplicate of a thorough explanation of various approaches; however, I can't seem to find a solution to my problem there due to a higher number

5 Answers
  • 2020-12-14 10:29

    You can also use:

    dfs = [df1, df2, df3]
    df = dfs[0]
    # Merge each remaining frame into the running result on the shared keys
    for d in dfs[1:]:
        df = pd.merge(df, d, on=['depth', 'profile'], how='outer')
    
       depth       VAR1    profile     VAR2    VAR3
    0    0.5  38.196202  profile_1      NaN     NaN
    1    0.6  38.198002  profile_1  0.20440     NaN
    2    1.3  38.200001  profile_1      NaN  15.182
    3    1.1        NaN  profile_1  0.20442     NaN
    4    1.2        NaN  profile_1  0.20446  15.188
    5    1.4        NaN  profile_1      NaN  15.182
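
    The input frames are not shown in this thread, so the following is only a sketch for anyone who wants to run the loop. The contents of df1, df2 and df3 are an assumption, reconstructed from the outputs posted in the answers. The final sort is added because the row order of an outer merge can differ between pandas versions.

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown in this thread.
df1 = pd.DataFrame({"depth": [0.5, 0.6, 1.3],
                    "VAR1": [38.196202, 38.198002, 38.200001],
                    "profile": ["profile_1"] * 3})
df2 = pd.DataFrame({"depth": [0.6, 1.1, 1.2],
                    "VAR2": [0.20440, 0.20442, 0.20446],
                    "profile": ["profile_1"] * 3})
df3 = pd.DataFrame({"depth": [1.2, 1.3, 1.4],
                    "VAR3": [15.188, 15.182, 15.182],
                    "profile": ["profile_1"] * 3})

dfs = [df1, df2, df3]
merged = dfs[0]
# Fold every remaining frame into the running result.
for d in dfs[1:]:
    merged = pd.merge(merged, d, on=["depth", "profile"], how="outer")

# Sort explicitly so the row order does not depend on the pandas version.
merged = merged.sort_values("depth").reset_index(drop=True)
print(merged)
```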
    
  • 2020-12-14 10:31

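    I would use append (on pandas 2.0 and later, where DataFrame.append has been removed, pd.concat([df1, df2, df3]) does the same stacking).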

    >>> df1.append(df2).append(df3).sort_values('depth')
    
            VAR1     VAR2    VAR3  depth    profile
    0  38.196202      NaN     NaN    0.5  profile_1
    1  38.198002      NaN     NaN    0.6  profile_1
    0        NaN  0.20440     NaN    0.6  profile_1
    1        NaN  0.20442     NaN    1.1  profile_1
    2        NaN  0.20446     NaN    1.2  profile_1
    0        NaN      NaN  15.188    1.2  profile_1
    2  38.200001      NaN     NaN    1.3  profile_1
    1        NaN      NaN  15.182    1.3  profile_1
    2        NaN      NaN  15.182    1.4  profile_1
    

    If you have many DataFrames, put them in a list and stack them in a loop.
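
    As a rough sketch of the list-based version: pd.concat accepts the whole list at once, so no explicit loop is needed. The sample frames are an assumption reconstructed from the output above. Note that stacking leaves one row per source row (nine here). The groupby step is an addition, not part of the answer, for collapsing rows that share the same keys.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output shown above.
frames = [
    pd.DataFrame({"depth": [0.5, 0.6, 1.3],
                  "VAR1": [38.196202, 38.198002, 38.200001],
                  "profile": "profile_1"}),
    pd.DataFrame({"depth": [0.6, 1.1, 1.2],
                  "VAR2": [0.20440, 0.20442, 0.20446],
                  "profile": "profile_1"}),
    pd.DataFrame({"depth": [1.2, 1.3, 1.4],
                  "VAR3": [15.188, 15.182, 15.182],
                  "profile": "profile_1"}),
]

# Stack all frames vertically; columns missing from a frame become NaN.
stacked = pd.concat(frames).sort_values("depth", kind="mergesort")

# Optional: collapse rows with the same keys, keeping the first
# non-null value found in each column.
collapsed = stacked.groupby(["profile", "depth"], as_index=False).first()
print(collapsed)
```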

  • 2020-12-14 10:31

    Why not concatenate all the DataFrames, melt, then reshape them using your ids? There might be a more efficient way to do this, but this works.

    df=pd.melt(pd.concat([df1,df2,df3]),id_vars=['profile','depth'])
    df_pivot=df.pivot_table(index=['profile','depth'],columns='variable',values='value')
    

    Where df_pivot will be

    variable              VAR1     VAR2    VAR3
    profile   depth                            
    profile_1 0.5    38.196202      NaN     NaN
              0.6    38.198002  0.20440     NaN
              1.1          NaN  0.20442     NaN
              1.2          NaN  0.20446  15.188
              1.3    38.200001      NaN  15.182
              1.4          NaN      NaN  15.182
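
    One thing to watch with this approach: pivot_table aggregates, and its default aggfunc is the mean. If two frames report the same variable for the same profile and depth, the values are averaged silently. The sketch below uses made-up values (1.0 and 3.0, purely hypothetical) to show the effect, and shows aggfunc="first" as one possible alternative.

```python
import pandas as pd

# Hypothetical frames that both report VAR1 at the same profile and depth.
a = pd.DataFrame({"profile": ["profile_1"], "depth": [0.5], "VAR1": [1.0]})
b = pd.DataFrame({"profile": ["profile_1"], "depth": [0.5], "VAR1": [3.0]})

long = pd.melt(pd.concat([a, b]), id_vars=["profile", "depth"])

# Default aggfunc is the mean, so the two readings are averaged.
mean_pivot = long.pivot_table(index=["profile", "depth"],
                              columns="variable", values="value")

# Keeping the first reading instead of averaging.
first_pivot = long.pivot_table(index=["profile", "depth"],
                               columns="variable", values="value",
                               aggfunc="first")

print(mean_pivot)
print(first_pivot)
```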
    
  • 2020-12-14 10:32

    Consider setting the index on each DataFrame and then running a horizontal merge with pd.concat:

    dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]
    
    print(pd.concat(dfs, axis=1).reset_index())
    #      profile  depth       VAR1     VAR2    VAR3
    # 0  profile_1    0.5  38.196202      NaN     NaN
    # 1  profile_1    0.6  38.198002  0.20440     NaN
    # 2  profile_1    1.1        NaN  0.20442     NaN
    # 3  profile_1    1.2        NaN  0.20446  15.188
    # 4  profile_1    1.3  38.200001      NaN  15.182
    # 5  profile_1    1.4        NaN      NaN  15.182
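
    Since concat with axis=1 aligns rows by index label, this approach relies on each frame having unique (profile, depth) pairs. Duplicated keys make the alignment ambiguous and, as far as I know, cause pandas to raise an error. A minimal sketch with an explicit check follows. The sample frames are an assumption reconstructed from the output above, and the check itself is an addition, not part of the answer.

```python
import pandas as pd

keys = ["profile", "depth"]

# Assumed sample data, reconstructed from the output shown above.
df1 = pd.DataFrame({"profile": ["profile_1"] * 3, "depth": [0.5, 0.6, 1.3],
                    "VAR1": [38.196202, 38.198002, 38.200001]})
df2 = pd.DataFrame({"profile": ["profile_1"] * 3, "depth": [0.6, 1.1, 1.2],
                    "VAR2": [0.20440, 0.20442, 0.20446]})
df3 = pd.DataFrame({"profile": ["profile_1"] * 3, "depth": [1.2, 1.3, 1.4],
                    "VAR3": [15.188, 15.182, 15.182]})

dfs = [df.set_index(keys) for df in (df1, df2, df3)]

# Fail early with a clear message if any frame repeats a key pair.
for position, frame in enumerate(dfs):
    if not frame.index.is_unique:
        raise ValueError(f"frame {position} has duplicated {keys} pairs")

# Align on the shared index, then restore the keys as ordinary columns.
wide = pd.concat(dfs, axis=1).sort_index().reset_index()
print(wide)
```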
    
  • 2020-12-14 10:34

    A simple way is with a combination of functools.partial/reduce.

    First, partial lets you "freeze" some of a function's arguments and/or keywords, resulting in a new object with a simplified signature. Then reduce cumulatively applies the new partial object to the items of an iterable (here, the list of DataFrames):

    from functools import partial, reduce
    
    dfs = [df1, df2, df3]
    merge = partial(pd.merge, on=['depth', 'profile'], how='outer')
    reduce(merge, dfs)
    
       depth       VAR1    profile     VAR2    VAR3
    0    0.5  38.196202  profile_1      NaN     NaN
    1    0.6  38.198002  profile_1  0.20440     NaN
    2    1.3  38.200001  profile_1      NaN  15.182
    3    1.1        NaN  profile_1  0.20442     NaN
    4    1.2        NaN  profile_1  0.20446  15.188
    5    1.4        NaN  profile_1      NaN  15.182
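
    To make the "cumulative" part concrete: for a three-item list, reduce(merge, dfs) performs the same calls as merge(merge(df1, df2), df3). The sketch below checks that equivalence. The sample frames are an assumption reconstructed from the output above.

```python
from functools import partial, reduce

import pandas as pd

# Assumed sample data, reconstructed from the output shown above.
df1 = pd.DataFrame({"depth": [0.5, 0.6, 1.3],
                    "VAR1": [38.196202, 38.198002, 38.200001],
                    "profile": ["profile_1"] * 3})
df2 = pd.DataFrame({"depth": [0.6, 1.1, 1.2],
                    "VAR2": [0.20440, 0.20442, 0.20446],
                    "profile": ["profile_1"] * 3})
df3 = pd.DataFrame({"depth": [1.2, 1.3, 1.4],
                    "VAR3": [15.188, 15.182, 15.182],
                    "profile": ["profile_1"] * 3})

# pd.merge with the join keys and join type fixed in advance.
merge = partial(pd.merge, on=["depth", "profile"], how="outer")

via_reduce = reduce(merge, [df1, df2, df3])
via_nested_calls = merge(merge(df1, df2), df3)

# Both routes perform the same sequence of merges.
same = via_reduce.equals(via_nested_calls)
print(same)
```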
    