Extrapolate Pandas DataFrame

后端 未结 1 1614
深忆病人
深忆病人 2020-12-14 13:04

It is easy to interpolate values in a Pandas.DataFrame using Series.interpolate, how can extrapolation be done?

For example, given a DataFr

相关标签:
1条回答
  • 2020-12-14 13:53

    Extrapolating a DataFrame with a DatetimeIndex index

    This can be done with two steps:

    1. Extend the DatetimeIndex
    2. Extrapolate the data

    Extend the Index

    Overwrite df with a new DataFrame where the data is resampled onto a new extended index based on original index's start, period and frequency. This allows the original df to come from anywhere, as in the csv example case. With this the columns get conveniently filled with NaNs!

    # Fake DataFrame for example (could come from anywhere)
    X1 = range(10)
    X2 = map(lambda x: x**2, X1)
    df = pd.DataFrame({'x1': X1, 'x2': X2},  index=pd.date_range('20130101',periods=10,freq='M'))
    
    # Number of months to extend
    extend = 5
    
    # Extrapolate the index first based on original index
    df = pd.DataFrame(
        data=df,
        index=pd.date_range(
            start=df.index[0],
            periods=len(df.index) + extend,
            freq=df.index.freq
        )
    )
    
    # Display
    print df
    

        x1  x2
    2013-01-31   0   0
    2013-02-28   1   1
    2013-03-31   2   4
    2013-04-30   3   9
    2013-05-31   4  16
    2013-06-30   5  25
    2013-07-31   6  36
    2013-08-31   7  49
    2013-09-30   8  64
    2013-10-31   9  81
    2013-11-30 NaN NaN
    2013-12-31 NaN NaN
    2014-01-31 NaN NaN
    2014-02-28 NaN NaN
    2014-03-31 NaN NaN
    

    Extrapolate the data

    Most extrapolators will require the inputs to be numeric instead of dates. This can be done with

    # Temporarily remove dates and make index numeric
    di = df.index
    df = df.reset_index().drop('index', 1)
    

    See this answer for how to extrapolate the values of each column of a DataFrame with a 3rd order polynomial.

    Snippet from answer

    # Curve fit each column
    for col in fit_df.columns:
        # Get x & y
        x = fit_df.index.astype(float).values
        y = fit_df[col].values
        # Curve fit column and get curve parameters
        params = curve_fit(func, x, y, guess)
        # Store optimized parameters
        col_params[col] = params[0]
    
    # Extrapolate each column
    for col in df.columns:
        # Get the index values for NaNs in the column
        x = df[pd.isnull(df[col])].index.astype(float).values
        # Extrapolate those points with the fitted function
        df[col][x] = func(x, *col_params[col])
    

    Once the columns are extrapolated, put the dates back

    # Put date index back
    df.index = di
    
    # Display
    print df
    

    x1   x2
    2013-01-31   0    0
    2013-02-28   1    1
    2013-03-31   2    4
    2013-04-30   3    9
    2013-05-31   4   16
    2013-06-30   5   25
    2013-07-31   6   36
    2013-08-31   7   49
    2013-09-30   8   64
    2013-10-31   9   81
    2013-11-30  10  100
    2013-12-31  11  121
    2014-01-31  12  144
    2014-02-28  13  169
    2014-03-31  14  196
    
    0 讨论(0)
提交回复
热议问题