Creating a pandas DataFrame from columns of other DataFrames with similar indexes

自闭症患者 2020-12-28 12:02

I have two DataFrames, df1 and df2, with the same column names ['a','b','c'], both indexed by dates. The two date indexes can share values. I would like to create a DataFra

3 Answers
  • 2020-12-28 12:45

    You can use concat:

    In [11]: pd.concat([df1['c'], df2['c']], axis=1, keys=['df1', 'df2'])
    Out[11]: 
                     df1       df2
    2014-01-01       NaN -0.978535
    2014-01-02 -0.106510 -0.519239
    2014-01-03 -0.846100 -0.313153
    2014-01-04 -0.014253 -1.040702
    2014-01-05  0.315156 -0.329967
    2014-01-06 -0.510577 -0.940901
    2014-01-07       NaN -0.024608
    2014-01-08       NaN -1.791899
    
    [8 rows x 2 columns]
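
    The call above relies on df1 and df2 already existing. As a minimal self-contained sketch, the version below builds two hypothetical frames (the values and date ranges are assumptions chosen only for illustration) and also shows the join argument of pd.concat, which restricts the result to dates present in both frames:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: df1 covers 5 days, df2 covers 8 days.
df1 = pd.DataFrame(np.arange(15.0).reshape(5, 3),
                   index=pd.date_range('2014-01-02', periods=5, freq='D'),
                   columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.arange(24.0).reshape(8, 3),
                   index=pd.date_range('2014-01-01', periods=8, freq='D'),
                   columns=['a', 'b', 'c'])

# Default join='outer': every date from either frame, NaN where a frame has no row.
outer = pd.concat([df1['c'], df2['c']], axis=1, keys=['df1', 'df2'])

# join='inner': only the dates that appear in both frames.
inner = pd.concat([df1['c'], df2['c']], axis=1, keys=['df1', 'df2'], join='inner')

print(outer)
print(inner)
```

    The keys argument is what names the two resulting columns; without it both columns would be called 'c'.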
    

    The axis argument determines how the DataFrames are stacked: axis=0 appends rows, while axis=1 places the frames side by side, aligned on the index:

    df1 = pd.DataFrame([1, 2, 3])
    df2 = pd.DataFrame(['a', 'b', 'c'])
    
    pd.concat([df1, df2], axis=0)
       0
    0  1
    1  2
    2  3
    0  a
    1  b
    2  c
    
    pd.concat([df1, df2], axis=1)
    
       0  0
    0  1  a
    1  2  b
    2  3  c
    
  • 2020-12-28 12:45

    What you are asking for is a join operation. The how argument controls how index values that appear in only one of the two frames are handled. In the example below, I left out cosmetic steps (such as renaming columns) for simplicity.

    Code

    import numpy as np
    import pandas as pd
    df1 = pd.DataFrame(np.random.randn(5,3), index=pd.date_range('01/02/2014',periods=5,freq='D'), columns=['a','b','c'] )
    df2 = pd.DataFrame(np.random.randn(8,3), index=pd.date_range('01/01/2014',periods=8,freq='D'), columns=['a','b','c'] )
    
    df3 = df1.join(df2, how='outer', lsuffix='_df1', rsuffix='_df2')
    print(df3)
    

    Output

                   a_df1     b_df1     c_df1     a_df2     b_df2     c_df2
    2014-01-01       NaN       NaN       NaN  0.109898  1.107033 -1.045376
    2014-01-02  0.573754  0.169476 -0.580504 -0.664921 -0.364891 -1.215334
    2014-01-03 -0.766361 -0.739894 -1.096252  0.962381 -0.860382 -0.703269
    2014-01-04  0.083959 -0.123795 -1.405974  1.825832 -0.580343  0.923202
    2014-01-05  1.019080 -0.086650  0.126950 -0.021402 -1.686640  0.870779
    2014-01-06 -1.036227 -1.103963 -0.821523 -0.943848 -0.905348  0.430739
    2014-01-07       NaN       NaN       NaN  0.312005  0.586585  1.531492
    2014-01-08       NaN       NaN       NaN -0.077951 -1.189960  0.995123
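
    To make the effect of how more concrete, here is a small sketch that runs the same join with each option and counts the resulting rows. The sample frames are hypothetical (deterministic values instead of random ones), and the last step selects only column 'c' before joining, which is one possible way to get just the two columns the question asks about:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with overlapping date indexes.
df1 = pd.DataFrame(np.arange(15.0).reshape(5, 3),
                   index=pd.date_range('2014-01-02', periods=5, freq='D'),
                   columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.arange(24.0).reshape(8, 3),
                   index=pd.date_range('2014-01-01', periods=8, freq='D'),
                   columns=['a', 'b', 'c'])

# Number of rows produced by each join type.
row_counts = {
    how: len(df1.join(df2, how=how, lsuffix='_df1', rsuffix='_df2'))
    for how in ['left', 'right', 'inner', 'outer']
}
print(row_counts)

# Join only column 'c' from each frame; the suffixes keep the names distinct.
c_only = df1[['c']].join(df2[['c']], how='outer', lsuffix='_df1', rsuffix='_df2')
print(c_only)
```

    With these frames, 'left' and 'inner' keep the 5 dates of df1, while 'right' and 'outer' keep all 8 dates of df2, because df1's dates are a subset of df2's.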
    
  • 2020-12-28 12:59

    Well, I'm not sure that merge would be the way to go. Personally I would build a new data frame by creating an index of the dates and then constructing the columns using list comprehensions. Possibly not the most pythonic way, but it seems to work for me!

    import pandas as pd
    import numpy as np
    
    df1 = pd.DataFrame(np.random.randn(5,3), index=pd.date_range('01/02/2014',periods=5,freq='D'), columns=['a','b','c'] )
    df2 = pd.DataFrame(np.random.randn(8,3), index=pd.date_range('01/01/2014',periods=8,freq='D'), columns=['a','b','c'] )
    
    # Build a sorted list of every date that appears in either data frame
    index = sorted(set(df1.index) | set(df2.index))
    
    # For each date, take column 'c' from a frame if it has that date, else NaN
    df3 = pd.DataFrame(
        {'df1': [df1.loc[date, 'c'] if date in df1.index else np.nan for date in index],
         'df2': [df2.loc[date, 'c'] if date in df2.index else np.nan for date in index]},
        index=index)
    
    df3
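
    The same result can probably be reached without the per-date lookups. The sketch below is an alternative rather than the approach above: it swaps the list comprehensions for Index.union and Series.reindex, and it uses hypothetical deterministic sample data so the values are easy to check:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with overlapping date indexes.
df1 = pd.DataFrame(np.arange(15.0).reshape(5, 3),
                   index=pd.date_range('2014-01-02', periods=5, freq='D'),
                   columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.arange(24.0).reshape(8, 3),
                   index=pd.date_range('2014-01-01', periods=8, freq='D'),
                   columns=['a', 'b', 'c'])

# Union of both date indexes (sorted for these monotonic date ranges).
index = df1.index.union(df2.index)

# reindex fills dates missing from a frame with NaN.
df3 = pd.DataFrame({'df1': df1['c'].reindex(index),
                    'df2': df2['c'].reindex(index)})
print(df3)
```

    This keeps the idea of building the combined index first, but lets pandas do the alignment in one vectorized step per column.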
    