Weighted correlation coefficient with pandas

醉酒成梦 2020-12-16 16:25

Is there any way to compute a weighted correlation coefficient with pandas? I saw that R has such a method. Also, I'd like to get the p-value of the correlation. This I did n

2 answers
  • 2020-12-16 17:12

    I don't know of any Python packages that implement this, but it should be fairly straightforward to roll your own implementation. Using the naming conventions of the Wikipedia article:

    import numpy as np

    def m(x, w):
        """Weighted mean of x with weights w."""
        return np.sum(x * w) / np.sum(w)
    
    def cov(x, y, w):
        """Weighted covariance of x and y, normalized by the sum of the weights."""
        return np.sum(w * (x - m(x, w)) * (y - m(y, w))) / np.sum(w)
    
    def corr(x, y, w):
        """Weighted Pearson correlation of x and y."""
        return cov(x, y, w) / np.sqrt(cov(x, x, w) * cov(y, y, w))
    

    I tried to make the functions above match the formulas in the Wikipedia article as closely as possible, but there are some potential simplifications and performance improvements. For example, as pointed out by @Alberto Garcia-Raboso, m(x, w) is really just np.average(x, weights=w), so there's no need to actually write a function for it.
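    As one possible simplification along those lines, the sketch below leans on NumPy built-ins: np.cov with its aweights argument yields the weighted covariance matrix directly. The name weighted_corr is made up for this illustration, and ddof=0 is chosen so that the covariance is divided by the sum of the weights, as in the hand-written formula (the normalization cancels in the correlation anyway).

```python
import numpy as np

def weighted_corr(x, y, w):
    """Weighted Pearson correlation via np.cov (hypothetical helper name)."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    # aweights treats w as per-observation weights; ddof=0 divides by sum(w).
    c = np.cov(x, y, aweights=w, ddof=0)
    return c[0, 1] / np.sqrt(c[0, 0] * c[1, 1])

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
w = [1.0, 2.0, 3.0, 4.0]
print(weighted_corr(x, y, w))
```

    Because the inputs are converted with np.asarray, this version also accepts plain lists.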

    The functions are pretty bare-bones, just doing the calculations. You may want to convert the inputs to arrays before doing the calculations, e.g. x = np.asarray(x), as these functions will not work if lists are passed. Additional checks to verify that all inputs have equal length, contain no null values, etc. could also be implemented.
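    A minimal sketch of such checks might look like the following. The helper name check_inputs is hypothetical, and the rule that weights must be non-negative and not all zero is an added assumption rather than something stated above.

```python
import numpy as np

def check_inputs(x, y, w):
    """Convert inputs to float arrays and run basic sanity checks (illustrative)."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    if x.ndim != 1 or not (x.shape == y.shape == w.shape):
        raise ValueError("x, y and w must be 1-D and of equal length")
    if np.isnan(x).any() or np.isnan(y).any() or np.isnan(w).any():
        raise ValueError("inputs must not contain NaN")
    # Assumption: weights are non-negative and at least one is positive.
    if (w < 0).any() or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    return x, y, w

x, y, w = check_inputs([1, 2, 3], [2, 4, 7], [1, 1, 2])
print(x * w)  # element-wise product works even though lists were passed in
```

    Calling this at the top of a correlation function keeps the arithmetic itself unchanged while failing early on malformed input.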

    Example usage:

    import numpy as np
    import pandas as pd

    # Initialize a DataFrame with random x, y, and weight columns.
    np.random.seed([3,1415])
    n = 10**6
    df = pd.DataFrame({
        'x': np.random.choice(3, size=n),
        'y': np.random.choice(4, size=n),
        'w': np.random.random(size=n)
        })
    
    # Compute the weighted correlation of x and y using the weights in w.
    r = corr(df['x'], df['y'], df['w'])
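
    One way to sanity-check a weighted correlation function is to compare it against cases with a known answer. The sketch below restates corr compactly so the block runs on its own, then checks two properties that should hold for this formula: equal weights reproduce the ordinary Pearson correlation, and integer weights behave like repeating each observation that many times. The synthetic data and the seed are arbitrary choices for illustration.

```python
import numpy as np

def corr(x, y, w):
    """Weighted Pearson correlation, restated so this block is self-contained."""
    mx, my = np.average(x, weights=w), np.average(y, weights=w)
    cxy = np.sum(w * (x - mx) * (y - my))
    return cxy / np.sqrt(np.sum(w * (x - mx) ** 2) * np.sum(w * (y - my) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)

# Equal weights: should agree with the unweighted Pearson correlation.
r_equal = corr(x, y, np.ones_like(x))
print(np.isclose(r_equal, np.corrcoef(x, y)[0, 1]))

# Integer weights: should agree with repeating observation i exactly k[i] times.
k = rng.integers(1, 4, size=x.size)
r_weighted = corr(x, y, k)
r_repeated = np.corrcoef(np.repeat(x, k), np.repeat(y, k))[0, 1]
print(np.isclose(r_weighted, r_repeated))
```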
    

    There's a discussion here regarding the p-value. It doesn't look like there's a generic calculation, and it depends on how you're actually getting the weights.

  • 2020-12-16 17:13

    The statsmodels package has an implementation of weighted correlation.
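
    The answer does not name the class; presumably it refers to DescrStatsW in statsmodels.stats.weightstats, which exposes a weighted correlation matrix through its corrcoef attribute. A minimal sketch under that assumption, with synthetic data chosen only for illustration:

```python
import numpy as np
from statsmodels.stats.weightstats import DescrStatsW

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.3 * x + rng.normal(size=500)
w = rng.random(size=500)

# Stack the variables as columns; each weight applies to one row (observation).
stats = DescrStatsW(np.column_stack([x, y]), weights=w)

# corrcoef is a 2x2 weighted correlation matrix; the off-diagonal entry is r(x, y).
r = stats.corrcoef[0, 1]
print(r)
```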
