Scipy: distance correlation is higher than 1

跟風遠走 submitted on 2020-01-13 10:03:26

Question


I'm trying to find the distance correlation between columns; see the code below. Most of the time it returns a result higher than 1, which should not be possible, because distance correlation lies between 0 and 1. You can read about scipy's distance correlation in the documentation for scipy.spatial.distance.correlation.

import numpy as np
from scipy.spatial import distance

x = np.random.uniform(-1, 1, 10000)
print(distance.correlation(x, x**2))

1.00210811815

What is wrong here or how can I measure it?

Update 1: link to the issue on GitHub


Answer 1:


I don't see why this is a problem according to the documentation.

From the documentation:

The correlation distance between u and v is defined as

1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}{\|u - \bar{u}\|_2 \, \|v - \bar{v}\|_2}
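As a sanity check, the docstring formula can be coded directly and compared with scipy. This is a minimal sketch; `correlation_distance` is a hypothetical helper name, not part of scipy, and it ignores the optional weights that scipy's function accepts.

```python
import numpy as np
from scipy.spatial import distance

def correlation_distance(u, v):
    """Correlation distance computed directly from the docstring formula."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    uc = u - u.mean()  # mean-centre each vector
    vc = v - v.mean()
    # 1 minus the cosine of the angle between the centred vectors
    return 1.0 - np.dot(uc, vc) / (np.linalg.norm(uc) * np.linalg.norm(vc))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 10000)
print(correlation_distance(x, x**2))
print(distance.correlation(x, x**2))
```

Both calls should print the same number up to floating-point round-off.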

By the Cauchy-Schwarz inequality, the fraction following the minus sign has an absolute value of at most 1. Nothing stipulates that it cannot be negative, though; in fact, it is negative whenever the (mean-centred) vectors are anticorrelated, and then the distance is greater than 1.
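To see the sign effect concretely, here is a small sketch using synthetic data (the noise level and sample size are arbitrary choices for illustration): a positively related pair gives a distance near 0, a negatively related pair gives a distance between 1 and 2, and an exactly anticorrelated pair gives 2.

```python
import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(1)
u = rng.normal(size=500)
noise = 0.1 * rng.normal(size=500)

d_pos = distance.correlation(u, 2 * u + noise)   # fraction near +1, distance near 0
d_neg = distance.correlation(u, -2 * u + noise)  # fraction near -1, distance near 2
d_anti = distance.correlation(u, -u)             # fraction exactly -1, distance 2
print(d_pos, d_neg, d_anti)
```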

AFAICT, you should only be surprised if you got a value larger than 2 or smaller than 0. Given the comment by @Cleb and the fact that the range is [0, 2], I'm guessing that some other packages simply define the distance as half this expression.
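If a [0, 1] range is what you want, halving scipy's value is one way to get it. The helper below, `half_correlation_distance`, is hypothetical: it is a sketch of the guessed rescaling, not the definition used by any particular package.

```python
import numpy as np
from scipy.spatial import distance

def half_correlation_distance(u, v):
    """Hypothetical rescaling of scipy's [0, 2] correlation distance onto [0, 1]."""
    return 0.5 * distance.correlation(u, v)

x = np.linspace(-1, 1, 101)
print(half_correlation_distance(x, x))     # perfectly correlated
print(half_correlation_distance(x, -x))    # perfectly anticorrelated
print(half_correlation_distance(x, x**2))  # no linear relationship
```

Note that this only rescales the same linear measure; it still reports "no relationship" for x versus x**2.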




Answer 2:


@josef-pkt's answer on GitHub is given below:

It's not a distance correlation, which is a nonlinear measure of dependence (e.g. my take: http://jpktd.blogspot.ca/2012/06/non-linear-dependence-measures-distance.html). However, "correlation" in scipy.spatial.distance.correlation is a bit misleading, because according to the formula in the docstring it's a distance measure and not a correlation. Vectors that are perfectly correlated, with a correlation coefficient equal to 1, have zero distance; vectors that are perfectly negatively correlated, with a correlation coefficient equal to -1, have the maximal distance of 2.
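Since the question is really after the nonlinear dependence measure (the distance correlation of Székely, Rizzo and Bakirov), here is a minimal NumPy sketch of its sample version for two 1-D samples. This is not a scipy function; `distance_correlation` is a hypothetical name, and the sketch builds full n x n distance matrices, so memory grows as O(n²). That is why it uses 1000 points instead of the question's 10000.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples (V-statistic version)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Pairwise distance matrices |x_i - x_j| and |y_i - y_j|
    a = np.abs(x - x.T)
    b = np.abs(y - y.T)
    # Double-centre: subtract row and column means, add back the grand mean
    A = a - a.mean(axis=0) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2_xy = (A * B).mean()  # squared distance covariance
    dvar2_x = (A * A).mean()   # squared distance variance of x
    dvar2_y = (B * B).mean()   # squared distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0:
        return 0.0  # a constant sample has no dependence to measure
    # Clip tiny negative round-off before taking the square root
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
print(distance_correlation(x, x**2))       # nonlinear dependence is detected
print(np.corrcoef(x, x**2)[0, 1])          # Pearson r stays close to 0
print(distance_correlation(x, 3 * x + 1))  # exact linear relation
```

Unlike scipy's correlation distance, this quantity lies in [0, 1] and is 0 only when the samples show no dependence at all.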




Answer 3:


Correlation distance is the complement of correlation (one minus the correlation coefficient), so it only looks at the angle/similarity between patterns (sort of like normalization). Correlation distance goes from 0 to 2, with 0 being perfect correlation, 1 being no correlation, and 2 being perfect anticorrelation. So a small correlation distance means the vectors are close together in correlation space (small angular difference). Since corr = 1 - dist and dist = 1 - corr, a high correlation means a strong positive relationship, and so does a LOW correlation distance.
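The relation corr = 1 - dist can be checked against NumPy's Pearson coefficient. Applied to the question's data, it explains the 1.00210811815: that distance corresponds to r = 1 - 1.00210811815 = -0.00210811815, a tiny negative sampling correlation between x and x**2, whose true linear correlation is 0 for x uniform on [-1, 1]. A minimal sketch (the seed is arbitrary, so the exact numbers will differ from the question's):

```python
import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 10000)
y = x**2

dist = distance.correlation(x, y)
r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient

# corr = 1 - dist, so a distance just above 1 means a slightly negative r
print(dist, 1 - dist, r)
```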



Source: https://stackoverflow.com/questions/35988933/scipy-distance-correlation-is-higher-than-1
