Simple algorithm for online outlier detection of a generic time series

甜味超标 2021-01-31 06:09

I am working with a large number of time series. They are network measurements that arrive every 10 minutes, and some of them are periodic (e.g. the bandwidth).

2 answers

忘掉有多难 (original poster)

2021-01-31 06:49
I suggest the scheme below, which should be implementable in a day or so:

Training
- Collect as many samples as you can hold in memory
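- Remove obvious outliers attribute by attribute, for example samples that lie more than a few standard deviations from that attribute's mean
- Calculate and store the correlation matrix, plus the mean and standard deviation of each attribute (the latter two are needed to standardize new samples before the correlation matrix is applied)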
- Calculate and store the Mahalanobis distances of all your samples
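
The training steps above can be sketched with NumPy roughly as follows. This is only an illustration under assumptions that are not in the answer: the function name `fit_outlier_model`, the cut-off `z_cut=3.0`, the use of a pseudo-inverse, and the requirement that there are at least two attributes and none of them is constant.

```python
import numpy as np

def fit_outlier_model(X, z_cut=3.0):
    """Fit training statistics on X, shaped (n_samples, n_attributes).

    Hypothetical helper: assumes >= 2 attributes, none of them constant.
    """
    X = np.asarray(X, dtype=float)

    # Remove obvious outliers: drop rows where any attribute lies more than
    # z_cut standard deviations from its mean (3.0 is an assumed threshold).
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    X = X[(np.abs(z) <= z_cut).all(axis=1)]

    # Statistics of the cleaned data.
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    corr = np.corrcoef(X, rowvar=False)
    # Pseudo-inverse, so a near-singular correlation matrix does not raise.
    corr_inv = np.linalg.pinv(corr)

    # Mahalanobis distance of every training sample on standardized values,
    # stored sorted so that percentile lookups are cheap later.
    zc = (X - mean) / std
    d2 = np.einsum("ij,jk,ik->i", zc, corr_inv, zc)
    train_d = np.sort(np.sqrt(np.maximum(d2, 0.0)))

    return {"mean": mean, "std": std, "corr_inv": corr_inv, "train_d": train_d}
```

Keeping the training distances sorted is a design choice rather than a requirement; it just turns the later percentile lookup into a binary search.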

Calculating "outlierness"

For a single sample whose "outlierness" you want to know:
- Retrieve the means, correlation matrix and Mahalanobis distances from training
- Calculate the Mahalanobis distance "d" for your sample
- Return the percentile in which "d" falls (using the Mahalanobis distances from training)
That will be your outlier score: 100% is an extreme outlier.
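
A minimal sketch of this scoring step, again with NumPy. The function name `outlier_score` and its argument names are hypothetical, and the random data below is only a toy stand-in for whatever was stored during training. It assumes the training distances are sorted in ascending order.

```python
import numpy as np

def outlier_score(x, mean, std, corr_inv, train_d):
    """Percentile (0 to 100) of x's Mahalanobis distance among training distances."""
    # Standardize the sample, then apply the inverse correlation matrix.
    z = (np.asarray(x, dtype=float) - mean) / std
    d = np.sqrt(max(float(z @ corr_inv @ z), 0.0))
    # train_d is sorted ascending, so searchsorted counts the training
    # distances that are <= d.
    rank = np.searchsorted(train_d, d, side="right")
    return 100.0 * rank / len(train_d)

# Toy stand-in for the values stored during training (assumed data).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
corr_inv = np.linalg.inv(np.corrcoef(X, rowvar=False))
Z = (X - mean) / std
train_d = np.sort(np.sqrt(np.einsum("ij,jk,ik->i", Z, corr_inv, Z)))

typical = outlier_score(mean, mean, std, corr_inv, train_d)
extreme = outlier_score(mean + 10 * std, mean, std, corr_inv, train_d)
print(typical, extreme)
```

A sample sitting exactly at the training mean has distance zero and lands at the bottom of the scale, while one that is ten standard deviations out on every attribute lands at the top.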

PS. When calculating the Mahalanobis distance, use the correlation matrix, not the covariance matrix. This is more robust if the sample measurements differ in unit and magnitude.
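
One way to read this PS, written out as a formula. The symbols are notation assumed here, not taken from the answer: $\mu_j$ and $\sigma_j$ are the training mean and standard deviation of attribute $j$, $R$ is the correlation matrix, $\Sigma$ the covariance matrix, and $D = \operatorname{diag}(\sigma_1, \dots, \sigma_p)$.

```latex
z_j = \frac{x_j - \mu_j}{\sigma_j}, \qquad
d(x) = \sqrt{z^{\top} R^{-1} z}

% Because \Sigma = D R D, we have \Sigma^{-1} = D^{-1} R^{-1} D^{-1}, hence
(x - \mu)^{\top} \Sigma^{-1} (x - \mu)
  = \left(D^{-1}(x - \mu)\right)^{\top} R^{-1} \left(D^{-1}(x - \mu)\right)
  = z^{\top} R^{-1} z
```

So the correlation matrix has to be applied to standardized values $z$, not to the raw sample. In exact arithmetic the two forms then give the same distance; any gain from the correlation form presumably comes from better numerical conditioning when the attributes live on very different scales.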