问题
I have a large set of time series (> 500), I'd like to select only the ones that are periodic. I did a bit of literature research and I found out that I should look for autocorrelation. Using numpy
I calculate the autocorrelation as:
def autocorr(x):
norm = x - np.mean(x)
result = np.correlate(norm, norm, mode='full')
acorr = result[result.size/2:]
acorr /= ( x.var() * np.arange(x.size, 0, -1) )
return acorr
This returns a set of coefficients (r?) that when plot should tell me if the time series is periodic or not.
I generated two toy examples:
#random signal
s1 = np.random.randint(5, size=80)
#periodic signal
s2 = np.array([5,2,3,1] * 20)
When I generate the autocorrelation plots I obtain:
The second autocorrelation vector clearly indicates some periodicity:
Autocorr1 = [1, 0.28, -0.06, 0.19, -0.22, -0.13, 0.07 ..]
Autocorr2 = [1, -0.50, -0.49, 1, -0.50, -0.49, 1 ..]
My question is, how can I automatically determine, from the autocorrelation vector, if a time series is periodic? Is there a way to summarise the values into a single coefficient, e.g. if = 1 perfect periodicity, if = 0 no periodicity at all. I tried to calculate the mean but it is not meaningful. Should I look at the number of 1?
回答1:
I would use mode='same' instead of mode='full' because with mode='full' we get covariances for extreme shifts, where just 1 array element overlaps self, the rest being zeros. Those are not going to be interesting. With mode='same' at least half of the shifted array overlaps the original one.
Also, to have the true correlation coefficient (r) you need to divide by the size of the overlap, not by the size of the original x. (in my code these are np.arange(n-1, n//2, -1)
). Then each of the outputs will be between -1 and 1.
A glance at Durbin–Watson statistic, which is similar to 2(1-r), suggests that people consider its values below 1 to be a significant indication of autocorrelation, which corresponds to r > 0.5. So this is what I use below. For a statistically sound treatment of the significance of autocorrelation refer to statistics literature; a starting point would be to have a model for your time series.
def autocorr(x):
n = x.size
norm = (x - np.mean(x))
result = np.correlate(norm, norm, mode='same')
acorr = result[n//2 + 1:] / (x.var() * np.arange(n-1, n//2, -1))
lag = np.abs(acorr).argmax() + 1
r = acorr[lag-1]
if np.abs(r) > 0.5:
print('Appears to be autocorrelated with r = {}, lag = {}'. format(r, lag))
else:
print('Appears to be not autocorrelated')
return r, lag
Output for your two toy examples:
Appears to be not autocorrelated
Appears to be autocorrelated with r = 1.0, lag = 4
来源:https://stackoverflow.com/questions/47351483/autocorrelation-to-estimate-periodicity-with-numpy