Predicting from previous date:value data

Submitted by 对着背影说爱祢 on 2019-12-05 05:25:51

In your case, the data changes quickly and new observations arrive immediately. A quick prediction can be implemented with Holt's linear exponential smoothing (double exponential smoothing, the trend-only variant of the Holt-Winters method).

The update equations:

m_t is the data you have, e.g., the number of people at each time t. v_t is the first derivative, i.e., the trend of m. alpha and beta are two decay parameters. A variable with a tilde on top denotes a predicted value. See the Wikipedia page for the details of the algorithm.
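Written out in the notation above, the two updates can be reconstructed from the holt_alg function shown further down. This is a reading of that code rather than the original figure, and it assumes h (a symbol not named in the text) is the time step between consecutive observations:

```latex
\begin{aligned}
\tilde{m}_{t+h} &= \alpha\, m_t + (1-\alpha)\,\bigl(\tilde{m}_t + h\,\tilde{v}_t\bigr) \\
\tilde{v}_{t+h} &= \beta\,\frac{\tilde{m}_{t+h} - \tilde{m}_t}{h} + (1-\beta)\,\tilde{v}_t
\end{aligned}
```

With alpha close to 1 the prediction follows the latest observation m_t; with alpha close to 0 it follows the previous prediction advanced by the trend.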

Since you use Python, here is some example code to help you with the data. I use the synthetic data below:

data_t = list(range(15))  # a list, because smoothing() appends to it
data_y = [5,6,15,20,21,22,26,42,45,60,55,58,55,50,49]

Above, data_t is a sequence of consecutive time points starting at 0, and data_y is the observed number of people at each presentation.

I tried to keep this synthetic data close to yours.

The code for the algorithm is straightforward.

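def holt_alg(h, y_last, y_pred, T_pred, alpha, beta):
    # level: blend the latest observation with the previous prediction advanced by the trend
    pred_y_new = alpha * y_last + (1 - alpha) * (y_pred + T_pred * h)
    # trend: blend the newly implied slope with the previous trend estimate
    pred_T_new = beta * (pred_y_new - y_pred) / h + (1 - beta) * T_pred
    return (pred_y_new, pred_T_new)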

def smoothing(t, y, alpha, beta):
    # initialization using the first two observations
    pred_y = y[1]
    pred_T = (y[1] - y[0]) / (t[1] - t[0])
    y_hat = [y[0], y[1]]
    # add the next unit time point; note that this extends the caller's list t in place
    t.append(t[-1] + 1)
    for i in range(2, len(t)):
        h = t[i] - t[i-1]
        # predict the value at t[i] from the observation at t[i-1]
        pred_y, pred_T = holt_alg(h, y[i-1], pred_y, pred_T, alpha, beta)
        y_hat.append(pred_y)
    return y_hat

Ok, now let's call our predictor and plot the predicted result against the observations:

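import matplotlib.pyplot as plt
# observations (successive plot calls draw on the same axes by default)
plt.plot(data_t, data_y, 'x-')

# smoothing() appends t=15 to data_t, so pred_y has one more point than data_y
pred_y = smoothing(data_t, data_y, alpha=.8, beta=.5)
plt.plot(data_t[:len(pred_y)], pred_y, 'rx-')
plt.show()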

The red line shows the prediction at each time point. I set alpha to 0.8 so that the most recent observation strongly affects the next prediction. If you want to give historical data more weight, lower alpha (and experiment with beta). Also note that the right-most point on the red line, at t=15, is the final prediction, for which we do not have an observation yet.
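Rather than choosing alpha and beta by eye, one option is to try a grid of values and keep the pair with the smallest one-step-ahead error. The sketch below is only an illustration: the helper names holt_one_step and mean_squared_error and the 0.1 to 0.9 grid are assumptions of this example, and the update rule is copied from holt_alg above.

```python
def holt_one_step(t, y, alpha, beta):
    """One-step-ahead predictions for y[2:], using the same update as holt_alg."""
    level = y[1]
    trend = (y[1] - y[0]) / (t[1] - t[0])
    preds = []
    for i in range(2, len(t)):
        h = t[i] - t[i - 1]
        new_level = alpha * y[i - 1] + (1 - alpha) * (level + trend * h)
        trend = beta * (new_level - level) / h + (1 - beta) * trend
        level = new_level
        preds.append(level)
    return preds


def mean_squared_error(t, y, alpha, beta):
    """Average squared gap between each prediction and the value observed later."""
    preds = holt_one_step(t, y, alpha, beta)
    errors = [(p - obs) ** 2 for p, obs in zip(preds, y[2:])]
    return sum(errors) / len(errors)


t = list(range(15))
y = [5, 6, 15, 20, 21, 22, 26, 42, 45, 60, 55, 58, 55, 50, 49]

grid = [i / 10 for i in range(1, 10)]  # candidate values 0.1 ... 0.9 (an assumption)
best_alpha, best_beta = min(
    ((a, b) for a in grid for b in grid),
    key=lambda ab: mean_squared_error(t, y, ab[0], ab[1]),
)
best_mse = mean_squared_error(t, y, best_alpha, best_beta)
print(best_alpha, best_beta, best_mse)
```

Because the parameters are tuned and scored on the same 15 points, the reported error is optimistic; with more data you would tune on an earlier stretch and check on a later one.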

This is far from a perfect predictor; it is just something you can get started with quickly. One drawback of this approach is that it needs a steady stream of observations; without them, the predictions drift further and further off (this is probably true of all real-time prediction). Hope it helps.

Prediction is hard. You might want to try polynomial extrapolation, but the estimation error grows drastically as you move farther from the "known" region.
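As a rough illustration of that growth, the sketch below passes an interpolating polynomial (Lagrange form) through the last four points of a synthetic series and evaluates it further and further out. The function name lagrange_extrapolate, the sample numbers, and the choice of four points are assumptions of this example; a least-squares polynomial fit would be an equally reasonable variant.

```python
def lagrange_extrapolate(ts, ys, t_new):
    """Evaluate the polynomial through the points (ts, ys) at t_new."""
    total = 0.0
    for i, (ti, yi) in enumerate(zip(ts, ys)):
        weight = 1.0
        for j, tj in enumerate(ts):
            if j != i:
                weight *= (t_new - tj) / (ti - tj)
        total += yi * weight
    return total


# Last four observations of a made-up series (time, value).
recent_t = [11, 12, 13, 14]
recent_y = [58, 55, 50, 49]

# The cubic reproduces the known points but moves quickly once outside them.
for t_new in (15, 18, 25):
    print(t_new, lagrange_extrapolate(recent_t, recent_y, t_new))
```

Comparing the printed values for the three horizons shows how fast a higher-degree polynomial leaves the range of the observed data.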

Another possible solution is to use machine learning algorithms, but that requires gathering a lot of data.

Extract features from your data (for example, the number of entries in a single day), then train the algorithm: for example, give it data from the further past as features and the present value as the field to predict.
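A minimal sketch of that idea in Python: slide a window over the series so that each row of features holds a few past values and the target is the value that followed. The helper names, the window of three values, and the tiny nearest-neighbour predictor are all assumptions made for illustration; any regression algorithm could be trained on the same pairs.

```python
def make_lagged_dataset(series, n_lags):
    """Build (features, target) pairs: n_lags past values -> the next value."""
    features, targets = [], []
    for i in range(n_lags, len(series)):
        features.append(series[i - n_lags:i])
        targets.append(series[i])
    return features, targets


def knn_predict(features, targets, query, k=3):
    """Average the targets of the k training rows closest to `query`."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(row, query))

    nearest = sorted(range(len(features)), key=lambda i: sq_dist(features[i]))[:k]
    return sum(targets[i] for i in nearest) / len(nearest)


counts = [5, 6, 15, 20, 21, 22, 26, 42, 45, 60, 55, 58, 55, 50, 49]
X, y = make_lagged_dataset(counts, n_lags=3)

# Predict the value after the series ends from its last three entries.
next_value = knn_predict(X, y, query=counts[-3:], k=3)
print(next_value)
```

With only 15 points this is a toy; the approach becomes meaningful once many more observations have been collected.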

I do not know about Python, but for Java there is an open-source library called Weka that implements most of the functionality and algorithms used for machine learning.

You can estimate how accurate this method is using cross-validation later on.
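For time-ordered data, a common variant is walk-forward validation: each test point is predicted using only the observations before it, so the model never sees the future. The sketch below assumes this variant; the function names and the two placeholder models are illustrative, and any predictor that takes the history and returns the next value could be plugged in.

```python
def walk_forward_errors(series, predict, min_train=5):
    """Absolute one-step-ahead errors, using only data before each test point."""
    errors = []
    for split in range(min_train, len(series)):
        history = series[:split]
        errors.append(abs(predict(history) - series[split]))
    return errors


def last_value(history):
    """Placeholder model: the next value equals the latest one."""
    return history[-1]


def linear_trend(history):
    """Placeholder model: repeat the most recent step."""
    return history[-1] + (history[-1] - history[-2])


counts = [5, 6, 15, 20, 21, 22, 26, 42, 45, 60, 55, 58, 55, 50, 49]
for model in (last_value, linear_trend):
    errs = walk_forward_errors(counts, model)
    print(model.__name__, sum(errs) / len(errs))
```

Comparing the mean errors of several candidate models on the same splits gives a fairer picture than a single train/test cut.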


That said, this problem is usually referred to as trend detection and is currently an active research area, so there is no silver bullet.
