ExponentialSmoothing - What prediction method to use for this date plot?

Submitted by 孤街浪徒 on 2021-02-10 05:54:30

Question


I currently have these data points of date vs cumulative sum. I want to predict the cumulative sum for future dates using python. What prediction method should I use?

My dates series are in this format: ['2020-01-20', '2020-01-24', '2020-01-26', '2020-01-27', '2020-01-30', '2020-01-31'] dtype='datetime64[ns]'

  • I tried a spline, but it seems that splines can't handle date-time series.
  • I tried Exponential Smoothing for time series forecasting, but the result is incorrect. I don't understand what predict(3) means and why it returns the predicted sum for dates I already have. I copied this code from an example. Here's my code for exponential smoothing:

    fit1 = ExponentialSmoothing(date_cumsum_df).fit(smoothing_level=0.3,optimized=False)
    
    fcast1 = fit1.predict(3)
    
    fcast1
    
    
    
    2020-01-27       1.810000
    2020-01-30       2.467000
    2020-01-31       3.826900
    2020-02-01       5.978830
    2020-02-02       7.785181
    2020-02-04       9.949627
    2020-02-05      11.764739
    2020-02-06      14.535317
    2020-02-09      17.374722
    2020-02-10      20.262305
    2020-02-16      22.583614
    2020-02-18      24.808530
    2020-02-19      29.065971
    2020-02-20      39.846180
    2020-02-21      58.792326
    2020-02-22     102.054628
    2020-02-23     201.038240
    2020-02-24     321.026768
    2020-02-25     474.318737
    2020-02-26     624.523116
    2020-02-27     815.166181
    2020-02-28    1100.116327
    2020-02-29    1470.881429
    2020-03-01    1974.317000
    2020-03-02    2645.321900
    2020-03-03    3295.025330
    2020-03-04    3904.617731
    

What method is best suited for predicting the sum values, which seem to be increasing exponentially? Also, I'm pretty new to data science with Python, so go easy on me. Thanks.


Answer 1:


Exponential Smoothing only works on a time series without gaps, i.e. with a value for every time step. I'll show a forecast of your data 5 days into the future with each of the three methods you mentioned:

  • Exponential Fit (your guess "seems to be exponentially increasing")
  • Spline interpolation
  • Exponential Smoothing

Note: I got your data by extracting it from your plot (data thieving) and saved the dates to dates and the data values to values.

import pandas as pd
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from scipy.optimize import curve_fit
from scipy.interpolate import splrep, splev

df = pd.DataFrame()
# mdates.date2num allows functions like curve_fit and spline to digest time series data
df['dates'] = mdates.date2num(dates)
df['values'] = values 

# Exponential fit function
def exponential_func(x, a, b, c, d):
    return a*np.exp(b*(x-c))+d

# Spline interpolation
def spline_interp(x, y, x_new):
    tck = splrep(x, y)
    return splev(x_new, tck)

# define forecast timerange (forecasting 5 days into future)
dates_forecast = np.linspace(df['dates'].min(), df['dates'].max() + 5, 100)
dd = mdates.num2date(dates_forecast)

# Doing exponential fit
popt, pcov = curve_fit(exponential_func, df['dates'], df['values'], 
                       p0=(1, 1e-2, df['dates'][0], 1))

# Doing spline interpolation
yy = spline_interp(df['dates'], df['values'], dates_forecast)
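
The role of mdates.date2num in the code above can be seen in isolation. The following is a minimal sketch, assuming only the first four dates from the question; the names sample_dates, sample_nums, gaps_in_days and round_trip are made up for illustration. It shows that the dates become floats counted in days, which is what curve_fit and splrep need.

```python
import numpy as np
import matplotlib.dates as mdates

# First four dates from the question, as a datetime64 array
sample_dates = np.array(
    ['2020-01-20', '2020-01-24', '2020-01-26', '2020-01-27'],
    dtype='datetime64[D]',
)

# date2num returns floats counted in days, so date gaps become plain numbers
sample_nums = mdates.date2num(sample_dates)
gaps_in_days = np.diff(sample_nums)  # gaps of 4, 2 and 1 days

# num2date converts the floats back to datetime objects (UTC by default)
round_trip = [d.date().isoformat() for d in mdates.num2date(sample_nums)]
print(gaps_in_days, round_trip)
```

The absolute numbers depend on the epoch matplotlib uses, so only differences between them should be relied on.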

So far this is straightforward (apart from the mdates.date2num function). Since your data has missing dates, you have to use spline interpolation on your actual data to fill the missing time slots with interpolated values:

# Interpolating data for exponential smoothing (no missing data in time series allowed)
df_interp = pd.DataFrame()
df_interp['dates'] = np.arange(dates[0], dates[-1] + 1, dtype='datetime64[D]')
df_interp['values'] = spline_interp(df['dates'], df['values'], 
                                    mdates.date2num(df_interp['dates']))
series_interp = pd.Series(df_interp['values'].values, 
                          pd.date_range(start='2020-01-19', end='2020-03-04', freq='D'))

# Now the exponential smoothing works fine, provide the `trend` argument given your data 
# has a clear (kind of exponential) trend
fit1 = ExponentialSmoothing(series_interp, trend='mul').fit(optimized=True)
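
The gap filling can also be done with pandas alone. The sketch below is an alternative to the spline used above, not the method of this answer: asfreq('D') inserts the missing calendar days as NaN, and interpolate(method='time') fills them linearly in time. The sample values are made up for illustration.

```python
import pandas as pd

# Made-up cumulative sums on irregularly spaced dates (not the question's data)
s = pd.Series(
    [1.0, 2.0, 4.0, 5.0],
    index=pd.to_datetime(['2020-01-20', '2020-01-24', '2020-01-26', '2020-01-27']),
)

# Reindex to one row per calendar day; missing days become NaN
daily = s.asfreq('D')

# Fill the NaN values by linear interpolation along the time axis
filled = daily.interpolate(method='time')
print(filled)
```

Linear interpolation is a cruder choice than a spline for strongly curved data, but it never overshoots between two observations.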

You can plot the three methods and compare their predictions for the upcoming five days:

# Plot data
plt.plot(mdates.num2date(df['dates']), df['values'], 'o')
# Plot exponential function fit
plt.plot(dd, exponential_func(dates_forecast, *popt))
# Plot interpolated values
plt.plot(dd, yy)
# Plot Exponential smoothing prediction using function `forecast`
fcast = fit1.forecast(5)  # compute the 5-day forecast once and reuse it
plt.plot(np.concatenate([series_interp.index.values, fcast.index.values]),
         np.concatenate([series_interp.values, fcast.values]))

Comparing all three methods shows that you were right to choose exponential smoothing: its forecast for the next five days looks much better than those of the other two methods.


Regarding your other question

I don't understand what predict(3) means and why it returns the predicted sum for dates I already have.

ExponentialSmoothing.fit() returns a statsmodels.tsa.holtwinters.HoltWintersResults object, which has two methods you can use for prediction/forecasting: predict and forecast.

predict takes a start and an end observation and applies the fitted ExponentialSmoothing model to the corresponding dates. To predict values in the future, you have to specify an end parameter that lies in the future:

>> fit1.predict(start=np.datetime64('2020-03-01'), end=np.datetime64('2020-03-09'))
2020-03-01    4240.649526
2020-03-02    5631.207307
2020-03-03    5508.614325
2020-03-04    5898.717779
2020-03-05    6249.810230
2020-03-06    6767.659081
2020-03-07    7328.416024
2020-03-08    7935.636353
2020-03-09    8593.169945
Freq: D, dtype: float64

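In your example, predict(3) (which equals predict(start=3)) returns the model's values for dates you already have, starting at the observation with index 3 (counting from zero, so the fourth date, 2020-01-27). It does no forecasting because no end in the future is given.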
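
The in-sample values in the question's output can be reproduced by hand, which may make clearer what they are. This is a minimal sketch of the simple exponential smoothing recurrence with the question's smoothing_level of 0.3. Note the assumption: the observations 4, 7 and 11 are not given in the question; they are back-solved from the printed output.

```python
alpha = 0.3  # smoothing_level from the question's code

# Assumption: observations back-solved from the printed output,
# for 2020-01-27, 2020-01-30 and 2020-01-31
observations = [4.0, 7.0, 11.0]

# Printed value for 2020-01-27 in the question's output
fitted = [1.81]

for y in observations:
    # Value for the next date: weighted mix of the current observation
    # and the current smoothed value
    fitted.append(alpha * y + (1 - alpha) * fitted[-1])

rounded = [round(v, 6) for v in fitted]
print(rounded)  # -> [1.81, 2.467, 3.8269, 5.97883]
```

The results match the first four printed rows, so each in-sample value is a one-step-ahead estimate built from earlier observations, not a copy of the data.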

forecast() only forecasts. You simply pass the number of observations you want to forecast into the future.

>> fit1.forecast(5)
2020-03-05    6249.810230
2020-03-06    6767.659081
2020-03-07    7328.416024
2020-03-08    7935.636353
2020-03-09    8593.169945
Freq: D, dtype: float64

Since both methods use the same fitted ExponentialSmoothing model, they return equal values for equal dates.



Source: https://stackoverflow.com/questions/60556547/exponentialsmoothing-what-prediction-method-to-use-for-this-date-plot
