Pandas DataFrame concat vs append

盖世英雄少女心 2020-11-28 03:36

I have a list of 4 pandas DataFrames containing a day of tick data that I want to merge into a single DataFrame. I cannot understand the behavior of concat on my timestamps.
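For reference, the merge described in the question can be sketched as a single `pd.concat` over the list. This is a minimal sketch under assumptions: the hypothetical `make_ticks` helper and its `price` column stand in for real tick data, and every chunk is assumed to be indexed by a DatetimeIndex. When all pieces are indexed by timestamps, the stacked result keeps a DatetimeIndex.

```python
import numpy as np
import pandas as pd


def make_ticks(start: str, n: int) -> pd.DataFrame:
    """Hypothetical helper: n one-second ticks starting at `start`."""
    index = pd.date_range(start, periods=n, freq="s")
    # Deterministic fake prices so the example is reproducible.
    return pd.DataFrame({"price": np.arange(n, dtype=float)}, index=index)


# Four hypothetical chunks of one day of tick data, consecutive in time.
frames = [make_ticks(f"2013-01-01 0{h}:00:00", 100) for h in range(4)]

# Stack the chunks row-wise; each piece's DatetimeIndex is carried over.
merged = pd.concat(frames)

print(type(merged.index).__name__, len(merged))
print(merged.index.min(), merged.index.max())
```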

4 Answers
  •  忘掉有多难
    2020-11-28 04:07

    So what you are doing with append and concat is almost equivalent. The difference is the empty DataFrame: for some reason it causes a big slowdown. I'm not sure exactly why and will have to look into it at some point. Below is a recreation of basically what you did.

    I almost always use concat (though in this case they are equivalent, except for the empty frame); if you don't use the empty frame they will be the same speed.
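A minimal sketch of the equivalence described above, assuming three hypothetical frames shaped like `df1` with consecutive one-second indexes (the `make_frame` helper is illustrative, not from the answer). Because `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, the append-style loop is mimicked here with repeated two-frame `pd.concat` calls, seeded with the first frame instead of an empty `pd.DataFrame()`.

```python
from functools import reduce

import pandas as pd


def make_frame(i: int, n: int = 1000) -> pd.DataFrame:
    """Hypothetical frame shaped like df1; frame i continues where frame i - 1 ends."""
    start = pd.Timestamp("2013-01-01") + pd.Timedelta(seconds=n * i)
    index = pd.date_range(start, periods=n, freq="s")
    return pd.DataFrame({"A": range(n * i, n * (i + 1))}, index=index)


frames = [make_frame(i) for i in range(3)]

# Append-style accumulation, seeded with the first frame rather than pd.DataFrame().
accumulated = reduce(lambda acc, piece: pd.concat([acc, piece]), frames[1:], frames[0])

# One concat over the whole list, the answer's preferred approach.
combined = pd.concat(frames)

# Same rows, index and dtypes; the index freq metadata is ignored in the comparison.
pd.testing.assert_frame_equal(accumulated, combined, check_freq=False)
```

In practice the usual idiom is to collect the pieces in a Python list inside the loop and call `pd.concat` once at the end, which avoids both the empty seed and repeatedly copying a growing frame.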

    In [17]: df1 = pd.DataFrame(dict(A = range(10000)),index=pd.date_range('20130101',periods=10000,freq='s'))
    
    In [18]: df1
    Out[18]: 
    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 10000 entries, 2013-01-01 00:00:00 to 2013-01-01 02:46:39
    Freq: S
    Data columns (total 1 columns):
    A    10000  non-null values
    dtypes: int64(1)
    
    In [19]: df4 = pd.DataFrame()
    
    The concat (df2 and df3 are not shown; presumably they are built like df1):
    
    In [20]: %timeit pd.concat([df1,df2,df3])
    1000 loops, best of 3: 270 us per loop
    
    This is the equivalent of your append:
    
    In [21]: %timeit pd.concat([df4,df1,df2,df3])
    10 loops, best of 3: 56.8 ms per loop
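The session above relies on IPython's `%timeit` and on `df2`/`df3`, which are not shown. A self-contained sketch of the same comparison using the standard `timeit` module follows; the hypothetical `make_frame` helper and the way `df2` and `df3` are built are assumptions. Absolute numbers depend on the machine and the pandas version, and the slowdown reported above was measured on an old pandas release, so it may not reproduce on current versions.

```python
import timeit
import warnings

import pandas as pd


def make_frame(offset_seconds: int, n: int = 10000) -> pd.DataFrame:
    """Hypothetical stand-in for one input frame: n rows at one-second spacing."""
    start = pd.Timestamp("2013-01-01") + pd.Timedelta(seconds=offset_seconds)
    index = pd.date_range(start, periods=n, freq="s")
    return pd.DataFrame({"A": range(n)}, index=index)


# Assumption: df2 and df3 continue df1's timeline; the original session does not show them.
df1, df2, df3 = (make_frame(10000 * i) for i in range(3))
df4 = pd.DataFrame()  # the empty frame an append loop starts from


def concat_without_empty() -> pd.DataFrame:
    return pd.concat([df1, df2, df3])


def concat_with_empty() -> pd.DataFrame:
    # Newer pandas versions may emit a FutureWarning about empty entries in concat;
    # it is silenced here so that only the timing is reported.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", FutureWarning)
        return pd.concat([df4, df1, df2, df3])


timings = {}
for label, func in [
    ("without empty frame", concat_without_empty),
    ("with empty frame", concat_with_empty),
]:
    # Best of 3 repeats of 10 calls each, reported per call (like %timeit's "best of 3").
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    timings[label] = best
    print(f"{label}: {best * 1e3:.3f} ms per call")
```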
    
