Predict next event occurrence, based on past occurrences

不想你离开。 Submitted on 2019-11-28 18:51:12

I think some topics that might be worth looking into include numerical analysis, specifically interpolation, extrapolation, and regression.
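As one rough illustration of the regression/extrapolation idea: if the occurrences are roughly evenly spaced, you can fit a least-squares line through (occurrence index, timestamp) and extend it by one step. The function name and the sample timestamps below are made up for the example, and a straight line is only an assumption about how the events behave.

```python
def predict_next_time(times):
    """Fit t = a + b*i by ordinary least squares and extrapolate to i = len(times)."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two past occurrences")
    mean_i = (n - 1) / 2
    mean_t = sum(times) / n
    # slope = covariance(i, t) / variance(i)
    cov = sum((i - mean_i) * (t - mean_t) for i, t in enumerate(times))
    var = sum((i - mean_i) ** 2 for i in range(n))
    slope = cov / var
    intercept = mean_t - slope * mean_i
    return intercept + slope * n

# Hypothetical event times, e.g. days since some start date.
past = [0.0, 7.0, 14.0, 21.0]
print(predict_next_time(past))
```

If the gaps drift or cluster, a straight line will fit poorly, and one of the other approaches in this thread is probably a better starting point.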

This could be overkill, but Markov chains can lead to some pretty cool pattern recognition stuff. It's better suited to, well, chains of events: the idea is, based on the last N steps in a chain of events, what will happen next?

This is well suited to text: process a large sample of Shakespeare, and you can generate paragraphs full of Shakespeare-like nonsense! Unfortunately, it takes a good deal more data to figure out sparsely-populated events. (Detecting patterns with a period of a month or more would require you to track a chain of at least a full month of data.)

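In Python, here's a rough sketch of a Markov chain builder/prediction script:

from collections import defaultdict
import random

N = 2  # how big a chain you want: the number of past events to condition on

def build_map(event_chain, n=N):
    """Map each run of n consecutive events to the events seen right after it."""
    followers = defaultdict(list)
    for i in range(len(event_chain) - n):
        key = tuple(event_chain[i:i + n])  # tuples are hashable, lists are not
        followers[key].append(event_chain[i + n])
    return followers

def predict_next_event(history, followers, n=N):
    """Pick a next event at random, weighted by how often it followed the last n events."""
    candidates = followers.get(tuple(history[-n:]))
    # None means this exact run of n events never appeared in the training chain.
    return random.choice(candidates) if candidates else None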

There is no single 'best' canned solution; it depends on what you need. For instance, you might want to average the values as you say, but use a weighted average in which old values contribute less to the result than new ones. Or you might try some smoothing. Or you might check whether the distribution of events fits a well-known distribution (such as normal, Poisson, or uniform).
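A minimal sketch of the weighted-average idea, assuming the events are given as sorted timestamps and that what you want to smooth is the gap between consecutive events. The function name and the smoothing factor alpha are illustrative choices, not something from the question.

```python
def ewma_next_time(times, alpha=0.5):
    """Predict the next occurrence from an exponentially weighted mean of the gaps.

    alpha is an assumed smoothing factor in (0, 1]; higher values weight
    recent gaps more heavily.
    """
    if len(times) < 2:
        raise ValueError("need at least two past occurrences")
    gaps = [b - a for a, b in zip(times, times[1:])]
    estimate = gaps[0]
    for gap in gaps[1:]:
        # Each update shrinks the influence of older gaps by a factor (1 - alpha).
        estimate = alpha * gap + (1 - alpha) * estimate
    return times[-1] + estimate

past = [0, 10, 18, 24]  # hypothetical timestamps; gaps are 10, 8, 6
print(ewma_next_time(past))
```

With alpha = 1 this reduces to "repeat the last gap", and as alpha gets close to 0 the prediction leans more and more on the oldest gaps, so it is worth trying a few values against your own data.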

If you have a model in mind (for example, that the events occur at regular intervals), then applying a Kalman filter to the parameters of that model is a common technique.
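To make that concrete, here is a minimal scalar Kalman filter, assuming the only model parameter is the mean gap between events and that it drifts slowly as a random walk. The function name and both noise variances are assumptions chosen for illustration; in practice you would tune them to your data.

```python
def kalman_interval(gaps, process_var=0.01, measurement_var=1.0):
    """Scalar Kalman filter for a slowly drifting mean gap between events.

    Model (an assumption): the true gap follows a random walk with variance
    process_var, and each observed gap is the true gap plus noise with
    variance measurement_var.
    """
    # Initialise the state from the first observation.
    estimate, variance = gaps[0], measurement_var
    for z in gaps[1:]:
        # Predict: the state carries over unchanged, but uncertainty grows.
        variance += process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= (1 - gain)
    return estimate, variance

gaps = [7.2, 6.9, 7.1, 7.0, 6.8]  # hypothetical observed gaps, in days
estimate, variance = kalman_interval(gaps)
print(estimate, variance)
```

The predicted next occurrence is then the last event time plus the estimate, and the returned variance gives a rough idea of how much to trust it.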

The only technique I've worked with for trying to do something like that would be training a neural network to predict the next step in the series. That implies interpreting the issue as a problem in pattern classification, which doesn't seem like that great a fit; I have to suspect there are less fuzzy ways of dealing with it.

If you merely want to find the probability of an event occurring after n days given prior data on its frequency, you'll want to fit an appropriate probability distribution, which generally requires knowing something about the source of the event (maybe it should be Poisson distributed, maybe Gaussian). If you want to find the probability of an event happening given that prior events happened, you'll want to look at Bayesian statistics and how to build a Markov chain from that.
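For the first case, a small sketch assuming the events follow a homogeneous Poisson process (a constant average rate, with events independent of each other). Under that assumption the chance of at least one event within n days is 1 - exp(-rate * n). The function name and the sample numbers are hypothetical.

```python
import math

def prob_event_within(days, event_count, observed_days):
    """P(at least one event in the next `days`) under a homogeneous Poisson process."""
    # Maximum-likelihood estimate of the rate, in events per day.
    rate = event_count / observed_days
    return 1.0 - math.exp(-rate * days)

# Hypothetical data: 12 events seen over 60 days, asking about the next 5 days.
print(prob_event_within(5, 12, 60))
```

If the events are bursty or periodic, the constant-rate assumption does not hold, and this estimate will be off.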

The task is very similar to language modelling, where, given a sequence of previous words, the model tries to predict a probability distribution over the vocabulary for the next word.

There is open-source software, such as SRILM and NLTK, that can take your sequences as input sentences (each event_id is a word) and do the job.
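To show the idea without depending on either toolkit, here is a plain-Python bigram model that treats each event ID as a word and returns a probability distribution over the next event. This is not the SRILM or NLTK API, just a hand-rolled sketch; the function names and event IDs are invented, and there is no smoothing for unseen events.

```python
from collections import Counter, defaultdict

def train_bigram(sequences):
    """Count how often each event follows each other event."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def next_event_distribution(counts, prev):
    """Return {event: probability} for the event following `prev` (empty if unseen)."""
    total = sum(counts[prev].values()) if prev in counts else 0
    if total == 0:
        return {}
    return {event: c / total for event, c in counts[prev].items()}

# Hypothetical event IDs; each inner list plays the role of a sentence.
logs = [["login", "view", "buy"], ["login", "view", "view", "logout"]]
model = train_bigram(logs)
print(next_event_distribution(model, "view"))
```

A real language-modelling toolkit adds longer histories, smoothing, and back-off, which matter a lot once the data gets sparse.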

You should google genetic programming algorithms.

They (sort of like the neural networks mentioned by Chaos) will enable you to generate solutions programmatically, then have the program modify itself based on a fitness criterion and create new solutions that are hopefully closer to accurate.

Neural Networks would have to be trained by you, but with genetic programming, the program will do all the work.

Although it is a hell of a lot of work to get them running in the first place!
