Why do NumPy correlate and corrcoef return different values, and how do I “normalize” a correlate in “full” mode?

遥遥无期 2020-12-08 15:38

I'm trying to do some time series analysis in Python, using NumPy.

I have two somewhat medium-sized series, with 20k values each and I want to check the sliding co

2 answers
  • 2020-12-08 15:56

    You are looking for normalized cross-correlation. This option isn't available yet in NumPy, but a patch that does just what you want is waiting for review. It shouldn't be too hard to apply, I would think. Most of the patch is docstring changes. The only lines of code that it adds are:

    if normalize:
        a = (a - mean(a)) / (std(a) * len(a))
        v = (v - mean(v)) / std(v)
    

    where a and v are the input NumPy arrays whose cross-correlation you are computing. It shouldn't be hard either to add these lines to your own distribution of NumPy or to make a copy of the correlate function and add them there. I would do the latter personally if I chose to go this route.

    Another, quite possibly better, alternative is to normalize the input vectors yourself before you pass them to correlate. It's up to you which way you would like to do it.
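
    A minimal sketch of that second option, assuming 1-D real-valued inputs. The helper name `normalized_correlate` and the synthetic data are made up for illustration; the scaling simply repeats the patch lines quoted above, with `np.std` left at its default `ddof=0`.

```python
import numpy as np

def normalized_correlate(a, v, mode="full"):
    """Hypothetical helper: z-score both inputs, then call np.correlate."""
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    # Same scaling as the patch lines quoted above (np.std defaults to ddof=0).
    a = (a - np.mean(a)) / (np.std(a) * len(a))
    v = (v - np.mean(v)) / np.std(v)
    return np.correlate(a, v, mode=mode)

# Synthetic example data (an assumption, not from the question).
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = 0.5 * x + rng.standard_normal(200)

cc = normalized_correlate(x, y, mode="full")
lags = np.arange(-(len(y) - 1), len(x))  # lag axis for "full" mode
zero_lag = cc[len(y) - 1]                # entry where the two series are aligned

# The zero-lag entry should agree with np.corrcoef up to rounding.
print(zero_lag, np.corrcoef(x, y)[0, 1])
```

    Only the zero-lag entry is a Pearson coefficient in the corrcoef sense. The other entries are scaled by statistics of the full series rather than of the overlapping samples, so they shrink toward zero as the overlap gets shorter.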

    By the way, this does appear to be the correct normalization as per the Wikipedia page on cross-correlation, except for dividing by len(a) rather than (len(a)-1). I feel that the discrepancy is akin to the population standard deviation vs. the sample standard deviation, and it really won't make much of a difference in my opinion.
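
    To put a number on that discrepancy, here is a small check on synthetic data (the arrays and variable names are illustrative assumptions). Dividing by len(a) with `ddof=0` standard deviations and dividing by len(a)-1 with `ddof=1` standard deviations give the same coefficient; only mixing the two conventions introduces a factor of n/(n-1).

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(50)
v = rng.standard_normal(50)
n = len(a)

# Mean-centered copies of both series.
da, dv = a - a.mean(), v - v.mean()

# Population convention: ddof=0 standard deviations, divide by n.
r_pop = np.sum(da * dv) / (np.std(a) * np.std(v) * n)
# Sample convention: ddof=1 standard deviations, divide by n - 1.
r_samp = np.sum(da * dv) / (np.std(a, ddof=1) * np.std(v, ddof=1) * (n - 1))
# Mixed convention: ddof=0 standard deviations, but divide by n - 1.
r_mixed = np.sum(da * dv) / (np.std(a) * np.std(v) * (n - 1))

print(r_pop, r_samp, np.corrcoef(a, v)[0, 1])  # all three agree up to rounding
print(r_mixed / r_pop, n / (n - 1))            # the mixed version is off by n / (n - 1)
```

    For series of 20k values that factor is about 1.00005, which supports the point that the choice hardly matters at this size.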

  • 2020-12-08 16:08

    According to these slides, I would suggest doing it this way:

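    import numpy as np

    def cross_correlation(a1, a2):
        # Assumes a1 and a2 are 1-D sequences of equal length.
        a1 = np.asarray(a1)
        a2 = np.asarray(a2)
        lags = range(-len(a1) + 1, len(a2))
        cs = []
        for lag in lags:
            # Slice out the parts of a1 and a2 that overlap at this lag.
            idx_lower_a1 = max(lag, 0)
            idx_lower_a2 = max(-lag, 0)
            idx_upper_a1 = min(len(a1), len(a1) + lag)
            idx_upper_a2 = min(len(a2), len(a2) - lag)
            b1 = a1[idx_lower_a1:idx_upper_a1]
            b2 = a2[idx_lower_a2:idx_upper_a2]
            # Dot product of the overlap, scaled by the energy of each overlapping slice.
            c = np.correlate(b1, b2)[0]
            c = c / np.sqrt((b1**2).sum() * (b2**2).sum())
            cs.append(c)
        return cs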
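
    Note that cross_correlation does not subtract the mean, so its zero-lag value generally differs from np.corrcoef. If a Pearson-style coefficient per lag is what you want, one possible adaptation (not what the slides describe, as far as I can tell) is to call np.corrcoef on the overlapping samples at each lag. The function name `lagged_pearson` and the `min_overlap` parameter are hypothetical, and the sketch assumes equal-length 1-D inputs.

```python
import numpy as np

def lagged_pearson(a1, a2, min_overlap=2):
    """Hypothetical variant: Pearson r of the overlapping samples at each lag."""
    a1 = np.asarray(a1, dtype=float)
    a2 = np.asarray(a2, dtype=float)
    if len(a1) != len(a2):
        raise ValueError("this sketch assumes equal-length inputs")
    n = len(a1)
    lags = np.arange(-(n - 1), n)
    out = np.full(len(lags), np.nan)
    for i, lag in enumerate(lags):
        # Same overlap slicing as cross_correlation above.
        b1 = a1[max(lag, 0):min(n, n + lag)]
        b2 = a2[max(-lag, 0):min(n, n - lag)]
        if len(b1) < min_overlap or b1.std() == 0 or b2.std() == 0:
            continue  # r is undefined for this overlap; leave NaN
        out[i] = np.corrcoef(b1, b2)[0, 1]
    return lags, out

# Synthetic example data (an assumption, not from the question).
rng = np.random.default_rng(2)
x = rng.standard_normal(100)
y = x + 0.1 * rng.standard_normal(100)

lags, r = lagged_pearson(x, y)
zero = len(x) - 1  # index of lag 0
print(r[zero], np.corrcoef(x, y)[0, 1])
```

    Values at extreme lags rest on very few samples (an overlap of two points always gives +1 or -1), so in practice you would probably raise `min_overlap` or restrict the lag range before looking for a peak.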
    