Pandas: Merge data frames on datetime index

天命终不由人 2020-12-31 01:11

I have the following two dataframes, for which I set the date column as a DatetimeIndex with df.set_index(pd.to_datetime(df['date']), inplace=True), and I would like to merge or join them on that index…
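A minimal sketch of the setup described in the question, using small hypothetical frames (only the date column mirrors the question; amount and bills are illustrative). It shows one detail worth knowing: passing a Series to set_index makes it the index but leaves the original date column in place, so merging on the index later produces suffixed date_x/date_y columns. Converting the column first and calling set_index("date") drops it instead.

```python
import pandas as pd

# Hypothetical sample frames; only the 'date' column mirrors the question
df = pd.DataFrame({"date": ["1997-12-31", "2007-12-31"], "amount": [2000, 36]})
d = pd.DataFrame({"date": ["1997-12-31", "2008-12-31"], "bills": [1.0, 2.0]})

# Passing a Series (not a column name) sets it as the index but keeps the 'date' column
df.set_index(pd.to_datetime(df["date"]), inplace=True)
d.set_index(pd.to_datetime(d["date"]), inplace=True)
print(list(df.columns))  # ['date', 'amount']

# Merging on the index then suffixes the overlapping 'date' columns
merged = df.merge(d, left_index=True, right_index=True, how="inner")
print(list(merged.columns))  # ['date_x', 'amount', 'date_y', 'bills']

# Alternative: convert the column first, then set_index("date") moves it into the index
df2 = pd.DataFrame({"date": ["1997-12-31", "2007-12-31"], "amount": [2000, 36]})
df2["date"] = pd.to_datetime(df2["date"])
df2 = df2.set_index("date")
print(list(df2.columns))  # ['amount']
```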

3 Answers
  • 2020-12-31 01:27

    I ran into a similar problem. Most likely your date column contains many NaT (missing datetime) values.
    I removed all rows whose date was NaT, then performed the join, and it worked:

          # keep only rows with a valid (non-NaT) date, then index by that column
          df = df[df['date'].notnull()].set_index('date')
          d = d[d['date'].notnull()].set_index('date')
          result = df.join(d, how='right')
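An equivalent sketch using dropna, assuming the dates start as strings and that unparseable or missing values are coerced to NaT with errors="coerce". The frames and their values are hypothetical.

```python
import pandas as pd

# Hypothetical data: "not a date" and None both become NaT after conversion
df = pd.DataFrame({"date": ["1997-12-31", "not a date", "2007-12-31"], "amount": [2000, 500, 36]})
d = pd.DataFrame({"date": ["1997-12-31", None, "2007-12-31"], "bills": [1.0, 3.0, 2.0]})

df["date"] = pd.to_datetime(df["date"], errors="coerce")
d["date"] = pd.to_datetime(d["date"], errors="coerce")

# Drop rows whose date is NaT, then use the cleaned column as the index
df = df.dropna(subset=["date"]).set_index("date")
d = d.dropna(subset=["date"]).set_index("date")

# Right join keeps every (non-NaT) date from d
joined = df.join(d, how="right")
print(joined)
```

dropna(subset=["date"]) keeps the same rows as filtering with notnull(); it is just a shorter spelling.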
    
  • 2020-12-31 01:28

    If you need to merge by the indexes, add the parameters left_index=True and right_index=True to merge:

    merged = pd.merge(df, d, how='inner', left_index=True, right_index=True)
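For an index-on-index merge, DataFrame.join is a shorter alternative, because join aligns on the index by default. A minimal sketch with hypothetical frames, checking that both calls give the same result:

```python
import pandas as pd

# Hypothetical frames indexed by date
df = pd.DataFrame({"amount": [2000, 36]}, index=pd.to_datetime(["1997-12-31", "2007-12-31"]))
d = pd.DataFrame({"bills": [1.0, 2.0]}, index=pd.to_datetime(["1997-12-31", "2008-12-31"]))

via_merge = pd.merge(df, d, how="inner", left_index=True, right_index=True)
# DataFrame.join uses the index of both frames unless `on` is given
via_join = df.join(d, how="inner")

print(via_merge)
print(via_merge.equals(via_join))  # True
```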
    

    Sample (I changed the first index value in d so that one date matches):

    print(df)
               catcode_amt type feccandid_amt  amount
    date                                             
    1915-12-31       A5000  24K     H6TX08100    1000
    1916-12-31       T6100  24K     H8CA52052     500
    1954-12-31       H3100  24K     S8AK00090    1000
    1985-12-31       J7120  24E     H8OH18088      36
    1997-12-31       z9600  24K     S6ND00058    2000
    
    print(d)
               catcode_disp disposition            feccandid_disp  bills
    date                                                                
    1997-12-31        A0000     support                 S4HI00011    1.0
    2007-12-31        A1000      oppose  S4IA00020', 'P20000741 1    NaN
    2007-12-31        A1000     support                 S8MT00010    1.0
    2007-12-31        A1500     support                 S6WI00061    2.0
    2007-12-31        A1600     support  S4IA00020', 'P20000741 3    NaN
    
    merged = pd.merge(df, d, how='inner', left_index=True, right_index=True)
    print(merged)
               catcode_amt type feccandid_amt  amount catcode_disp disposition  \
    date                                                                         
    1997-12-31       z9600  24K     S6ND00058    2000        A0000     support   
    
               feccandid_disp  bills  
    date                              
    1997-12-31      S4HI00011    1.0  
    

    Or you can use concat:

    print(pd.concat([df, d], join='inner', axis=1))
               catcode_amt type feccandid_amt  amount catcode_disp disposition  \
    date                                                                         
    1997-12-31       z9600  24K     S6ND00058    2000        A0000     support   
    
               feccandid_disp  bills  
    date                              
    1997-12-31      S4HI00011    1.0  
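One reason to prefer concat is that it accepts a list of any number of frames. A minimal sketch, assuming three hypothetical frames whose dates are unique within each frame; with join="inner" only the dates present in every frame survive:

```python
import pandas as pd

# Hypothetical frames sharing a DatetimeIndex (dates unique within each frame)
a = pd.DataFrame({"amount": [2000, 36]}, index=pd.to_datetime(["1997-12-31", "2007-12-31"]))
b = pd.DataFrame({"bills": [1.0, 2.0]}, index=pd.to_datetime(["1997-12-31", "2008-12-31"]))
c = pd.DataFrame({"votes": [10, 20]}, index=pd.to_datetime(["1997-12-31", "2007-12-31"]))

# axis=1 places the frames side by side; join="inner" intersects their indexes
combined = pd.concat([a, b, c], axis=1, join="inner")
print(combined)
```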
    

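    EDIT: EdChum is right: if the index contains duplicate dates, every matching pair of rows is joined, so the result can be much larger than either input.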

    I added duplicate dates to DataFrame df (the last two index values):

    print(df)
               catcode_amt type feccandid_amt  amount
    date                                             
    1915-12-31       A5000  24K     H6TX08100    1000
    1916-12-31       T6100  24K     H8CA52052     500
    1954-12-31       H3100  24K     S8AK00090    1000
    2007-12-31       J7120  24E     H8OH18088      36
    2007-12-31       z9600  24K     S6ND00058    2000
    
    print(d)
               catcode_disp disposition            feccandid_disp  bills
    date                                                                
    1997-12-31        A0000     support                 S4HI00011    1.0
    2007-12-31        A1000      oppose  S4IA00020', 'P20000741 1    NaN
    2007-12-31        A1000     support                 S8MT00010    1.0
    2007-12-31        A1500     support                 S6WI00061    2.0
    2007-12-31        A1600     support  S4IA00020', 'P20000741 3    NaN
    
    merged = pd.merge(df, d, how='inner', left_index=True, right_index=True)
    
    print(merged)
               catcode_amt type feccandid_amt  amount catcode_disp disposition  \
    date                                                                         
    2007-12-31       J7120  24E     H8OH18088      36        A1000      oppose   
    2007-12-31       J7120  24E     H8OH18088      36        A1000     support   
    2007-12-31       J7120  24E     H8OH18088      36        A1500     support   
    2007-12-31       J7120  24E     H8OH18088      36        A1600     support   
    2007-12-31       z9600  24K     S6ND00058    2000        A1000      oppose   
    2007-12-31       z9600  24K     S6ND00058    2000        A1000     support   
    2007-12-31       z9600  24K     S6ND00058    2000        A1500     support   
    2007-12-31       z9600  24K     S6ND00058    2000        A1600     support   
    
                          feccandid_disp  bills  
    date                                         
    2007-12-31  S4IA00020', 'P20000741 1    NaN  
    2007-12-31                 S8MT00010    1.0  
    2007-12-31                 S6WI00061    2.0  
    2007-12-31  S4IA00020', 'P20000741 3    NaN  
    2007-12-31  S4IA00020', 'P20000741 1    NaN  
    2007-12-31                 S8MT00010    1.0  
    2007-12-31                 S6WI00061    2.0  
    2007-12-31  S4IA00020', 'P20000741 3    NaN  
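The 8 rows above are 2 matching rows in df times 4 matching rows in d. A minimal sketch with hypothetical frames that reproduces this, checks index uniqueness, and uses merge's validate parameter to raise an error instead of silently multiplying rows. Keeping only the first row per date is shown as one possible fix; whether dropping rows is acceptable depends on your data.

```python
import pandas as pd

# Hypothetical frames: 2 rows and 4 rows, all on the same date
df = pd.DataFrame({"amount": [36, 2000]}, index=pd.to_datetime(["2007-12-31", "2007-12-31"]))
d = pd.DataFrame({"bills": [1.0, 2.0, 3.0, 4.0]}, index=pd.to_datetime(["2007-12-31"] * 4))

merged = pd.merge(df, d, how="inner", left_index=True, right_index=True)
print(len(merged))  # 8 = 2 * 4

print(df.index.is_unique, d.index.is_unique)  # False False

# validate="one_to_one" makes merge raise if either index has duplicates
raised = False
try:
    pd.merge(df, d, how="inner", left_index=True, right_index=True, validate="one_to_one")
except pd.errors.MergeError as exc:
    raised = True
    print("MergeError:", exc)

# One possible fix (discards data): keep only the first row per date before merging
df_first = df[~df.index.duplicated(keep="first")]
d_first = d[~d.index.duplicated(keep="first")]
dedup = pd.merge(df_first, d_first, how="inner", left_index=True, right_index=True)
print(len(dedup))  # 1
```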
    
  • 2020-12-31 01:30

    It looks like your dates are your indices, in which case you want to merge on the index, not on a column. If you have two dataframes, df_1 and df_2:

    df_1.merge(df_2, left_index=True, right_index=True, how='inner')
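The how argument decides which dates survive. A minimal sketch with hypothetical df_1 and df_2: how="inner" keeps only dates present in both frames, while how="outer" keeps every date and fills the missing side with NaN.

```python
import pandas as pd

# Hypothetical frames indexed by date; only 1997-12-31 appears in both
df_1 = pd.DataFrame({"amount": [2000, 36]}, index=pd.to_datetime(["1997-12-31", "2007-12-31"]))
df_2 = pd.DataFrame({"bills": [1.0, 2.0]}, index=pd.to_datetime(["1997-12-31", "2008-12-31"]))

inner = df_1.merge(df_2, left_index=True, right_index=True, how="inner")
outer = df_1.merge(df_2, left_index=True, right_index=True, how="outer")

print(inner)  # one row: 1997-12-31
print(outer)  # three rows, NaN where a date is missing on one side
```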
