How to create a lagged data structure using a pandas DataFrame

耶瑟儿~ 2020-12-04 15:24

Example

import pandas as pd

s = pd.Series([5, 4, 3, 2, 1], index=[1, 2, 3, 4, 5])
print(s)
1    5
2    4
3    3
4    2
5    1

Is there an efficient way to create a serie

8 Answers
  • 2020-12-04 16:20

    Very simple solution using pandas DataFrame:

    import pandas as pd

    number_lags = 3
    df = pd.DataFrame(data={'vals':[5,4,3,2,1]})
    for lag in xrange(1, number_lags + 1):  # Python 2 only; see below for Python 3
        df['lag_' + str(lag)] = df.vals.shift(lag)
    
    # If you want a NumPy array with no null values:
    arr = df.dropna().values
    

    For Python 3.x, change xrange to range:

    number_lags = 3
    df = pd.DataFrame(data={'vals':[5,4,3,2,1]})
    for lag in range(1, number_lags + 1):
        df['lag_' + str(lag)] = df.vals.shift(lag)
    
    print(df)
    
       vals  lag_1  lag_2  lag_3
    0     5    NaN    NaN    NaN
    1     4    5.0    NaN    NaN
    2     3    4.0    5.0    NaN
    3     2    3.0    4.0    5.0
    4     1    2.0    3.0    4.0
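
The loop above can be wrapped in a small helper so it can be reused for other columns and lag counts. This is only a sketch: the function name add_lags and its parameters are hypothetical, not part of pandas, and it assumes a pandas version that provides DataFrame.to_numpy (0.24 or later).

```python
import pandas as pd


def add_lags(df, column, number_lags):
    """Return a copy of df with lag_1 ... lag_N columns built from `column`.

    Hypothetical helper for illustration; not a pandas function.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    for lag in range(1, number_lags + 1):
        out["lag_" + str(lag)] = out[column].shift(lag)
    return out


df = pd.DataFrame(data={"vals": [5, 4, 3, 2, 1]})
lagged = add_lags(df, "vals", 3)

# Drop the rows that still contain NaN, then convert to a NumPy array.
complete_rows = lagged.dropna().to_numpy()
print(lagged)
print(complete_rows)
```

Returning a copy keeps the original DataFrame unchanged, which may or may not be what you want when memory is tight.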
    
  • 2020-12-04 16:21

    It could be worth looking into the rolling functions (the rolling_* functions in older pandas, the Series.rolling method in current versions), which avoid keeping as many copies of the data around.
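
If the lagged columns are only an intermediate step toward a windowed statistic, a rolling window may compute it directly. The sketch below assumes the end goal is a sum over the current value and the two preceding ones. That goal is an assumption made for illustration and is not stated in the question. It uses Series.rolling, which replaces the older rolling_* functions.

```python
import pandas as pd

s = pd.Series([5, 4, 3, 2, 1], index=[1, 2, 3, 4, 5])

# Sum over a window of 3 values: the current one and the two before it.
# The first two positions are NaN because their window is incomplete.
window_sum = s.rolling(window=3).sum()
print(window_sum)
```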

    One solution is to concat shifted Series together to make a DataFrame:

    In [11]: pd.concat([s, s.shift(), s.shift(2)], axis=1)
    Out[11]: 
       0   1   2
    1  5 NaN NaN
    2  4   5 NaN
    3  3   4   5
    4  2   3   4
    5  1   2   3
    
    In [12]: pd.concat([s, s.shift(), s.shift(2)], axis=1).dropna()
    Out[12]: 
       0  1  2
    3  3  4  5
    4  2  3  4
    5  1  2  3
    

    Doing work on this will be more efficient than on lists...
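
The concat approach can be extended to an arbitrary number of lags with a list comprehension. This is a sketch under assumptions: the variable number_lags and the lag_N column labels are illustrative choices, passed through the keys argument of pd.concat so that the columns get readable names instead of 0, 1, 2.

```python
import pandas as pd

s = pd.Series([5, 4, 3, 2, 1], index=[1, 2, 3, 4, 5])
number_lags = 2  # illustrative value

lags = range(number_lags + 1)  # lag 0 is the original series
lagged = pd.concat(
    [s.shift(lag) for lag in lags],
    axis=1,
    keys=["lag_" + str(lag) for lag in lags],
)

# Keep only the rows where every lag is available.
complete = lagged.dropna()
print(complete)
```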
