Question
I am using the following code to compute the cross-correlation of data_1 and data_2:
result = numpy.correlate(data_1, data_2, mode='full')
The result is also a time series. I also normalized result to result1:
result1 = StandardScaler().fit_transform(result.astype('float32').reshape(-1, 1))
Here is the plot: data_1 is black, data_2 is red, result1 is green.
I know there is a lag between data_1 and data_2, so I am wondering what's the best way to find the lag? Thanks!
Answer 1:
numpy.correlate does not center the data, so one should do it prior to calling the method:
corr = np.correlate(data_1 - np.mean(data_1),
                    data_2 - np.mean(data_2),
                    mode='full')
Centering removes the contribution of the means, which otherwise adds an offset to corr that grows with the overlap between the two shifted arrays. After centering, uncorrelated shifts show up as values near 0.
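As a rough illustration of why centering matters, here is a minimal sketch on made-up data (not the asker's): the same bump placed at samples 35 and 55, with a constant offset of 10 added to both series. The helper name bump and all the numbers are assumptions chosen for the demonstration.

```python
import numpy as np

t = np.arange(100)

def bump(center):
    """Gaussian bump with a width of 4 samples (illustrative shape only)."""
    return np.exp(-0.5 * ((t - center) / 4.0) ** 2)

# Hypothetical data: same bump at samples 35 and 55, plus an offset of 10.
a = bump(35) + 10.0
v = bump(55) + 10.0

raw = np.correlate(a, v, mode='full')
centered = np.correlate(a - a.mean(), v - v.mean(), mode='full')

# Both arrays have the same length, so index len(v) - 1 is zero shift.
lag_raw = int(raw.argmax()) - (len(v) - 1)
lag_centered = int(centered.argmax()) - (len(v) - 1)
print(lag_raw, lag_centered)
```

Without centering, the offset term dominates and the maximum sits at zero shift, where the overlap is largest. With centering, the maximum moves to the shift that lines up the two bumps.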
Second, your chart with all three things on one horizontal scale doesn't seem helpful: with mode='full' the correlation array has length len(data_1) + len(data_2) - 1, about twice the length of the original ones.
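One way to give the correlation its own horizontal scale is to build an explicit array of shifts, one per entry of the output. The sketch below uses small made-up arrays and assumes NumPy's documented convention c_k = sum_n a[n+k] * conj(v[n]); the variable name lags is just for illustration.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
v = np.array([0.0, 1.0, 0.5])

corr = np.correlate(a, v, mode='full')

# One shift value per entry of corr, assuming NumPy's convention
# c_k = sum_n a[n + k] * conj(v[n]); index 0 is k = -(len(v) - 1).
lags = np.arange(-(len(v) - 1), len(a))

print(len(corr), len(a) + len(v) - 1)  # the two lengths agree
print(lags)
```

Plotting with plt.plot(lags, corr) then puts the correlation on a shift axis instead of a raw index axis. Recent SciPy versions also provide scipy.signal.correlation_lags, which should build the same array for mode='full'.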
Picking the maximum of corr with corr.argmax() is a reasonable thing to do. One just has to be aware of how the index works here. numpy.correlate(a, v) computes c_k = sum_n a[n+k] * conj(v[n]). With mode='full', index 0 of corr corresponds to the shift k = 1 - len(v), meaning a is moved so far to the right that only one element overlaps (a[0] against v[-1]). So, subtracting len(v) - 1 from the index gives the actual shift k of a with respect to v; a positive k means a has to be moved k elements to the left to line up with v. When both arrays have the same length, subtracting len(a) - 1 gives the same number.
A made-up example:
import numpy as np
import matplotlib.pyplot as plt

data_1 = np.sin(np.linspace(0, 10, 100))
data_1 += np.random.uniform(size=data_1.shape)  # noise
data_2 = np.cos(np.linspace(0, 7, 70))
data_2 += np.random.uniform(size=data_2.shape)  # noise

# center both series before correlating
corr = np.correlate(data_1 - np.mean(data_1),
                    data_2 - np.mean(data_2),
                    mode='full')
plt.plot(corr)
plt.show()

# index len(data_2) - 1 of corr corresponds to zero shift
lag = corr.argmax() - (len(data_2) - 1)
print(lag)

plt.plot(data_1, 'r*')  # data_1 in red
plt.plot(data_2, 'b*')  # data_2 in blue
plt.show()
Here the lag is printed as 15 or 16 (depending on random noise), which on this scale means about 1.5 or 1.6. This is reasonable, as sin is trailing cos by pi/2, or about 1.57. In other words, moving the red dots to the left by 15-16 elements maximizes the match with the blue dots.
[Figure: data (data_1 in red, data_2 in blue)]
[Figure: correlation]
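To sanity-check the index arithmetic on arrays of different lengths, it can help to use a case where the shift is known in advance. The following is a minimal sketch, not part of the original answer: best_lag is a hypothetical helper name, and the test signal is a made-up bump where v is simply a slice of a.

```python
import numpy as np

def best_lag(a, v):
    """Return the k that maximizes sum_n a[n + k] * v[n] after centering.

    Assumes real-valued 1-D inputs and NumPy's mode='full' index layout,
    where index len(v) - 1 corresponds to zero shift.
    """
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    corr = np.correlate(a - a.mean(), v - v.mean(), mode='full')
    return int(corr.argmax()) - (len(v) - 1)

t = np.arange(100)
a = np.exp(-0.5 * ((t - 35) / 4.0) ** 2)  # bump centered at sample 35
v = a[20:90]                              # same bump, so v[n] == a[n + 20]

print(best_lag(a, v))
print(best_lag(v, a))
```

Since v[n] equals a[n + 20] by construction, the first call should recover a shift of 20, and swapping the arguments should flip the sign.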
Source: https://stackoverflow.com/questions/49372282/find-the-best-lag-from-the-numpy-correlate-output