Scipy: distance correlation is higher than 1

断了今生、忘了曾经 submitted on 2019-12-05 08:18:44

I don't see why this is a problem according to the documentation.

From the documentation:

The correlation distance between u and v is defined as

$$1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}{\|u - \bar{u}\|_2 \, \|v - \bar{v}\|_2}$$

where \bar{u} and \bar{v} are the means of the elements of u and v.

By the Cauchy-Schwarz inequality, the expression following the minus sign has an absolute value of at most 1. Nothing stipulates that it can't be negative, though. In fact, it will be negative whenever the (mean-centered) vectors are anticorrelated, and the distance then exceeds 1.

AFAICT, you should only be surprised if you got a value larger than 2 or smaller than 0. Going by the comment from @Cleb and the fact that the range is [0, 2], I'm guessing that some other packages simply define the distance as half this expression, which maps it onto [0, 1].
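To make the range concrete, here is a minimal sketch that writes out the docstring formula by hand and compares it with scipy.spatial.distance.correlation. The helper name correlation_distance and the sample vector are made up for illustration; the halved value only mirrors the guess above about other packages and is not a documented convention of any particular library.

```python
import numpy as np
from scipy.spatial.distance import correlation


def correlation_distance(u, v):
    """Hypothetical helper: 1 - Pearson correlation, as in the docstring formula."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    uc = u - u.mean()  # mean-centered u
    vc = v - v.mean()  # mean-centered v
    return 1.0 - np.dot(uc, vc) / (np.linalg.norm(uc) * np.linalg.norm(vc))


u = np.array([1.0, 2.0, 3.0, 4.0])

d_same = correlation(u, 2 * u + 1)      # perfectly correlated: distance near 0
d_anti = correlation(u, -u)             # perfectly anticorrelated: distance near 2
d_manual = correlation_distance(u, -u)  # hand-written formula, same inputs
d_half = d_anti / 2                     # halved variant, which lies in [0, 1]

print(d_same, d_anti, d_manual, d_half)
```

A value above 1 is therefore expected whenever the two vectors are negatively correlated.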

@josef-pkt's answer on GitHub is given below:

It's not a distance correlation, which is a nonlinear measure of dependence (e.g. my take: http://jpktd.blogspot.ca/2012/06/non-linear-dependence-measures-distance.html). However, "correlation" in scipy.spatial.distance.correlation is a bit misleading, because according to the formula in the docstring it's a distance measure and not a correlation. Perfectly correlated, with correlation coefficient equal to 1, has zero distance. Perfectly negatively correlated, with correlation coefficient equal to -1, has the maximal distance of 2.
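To illustrate the difference, below is a rough sketch of the sample distance correlation (the nonlinear dependence measure, computed from double-centered pairwise distance matrices) next to SciPy's correlation distance. The function distance_correlation is written here for illustration only and is not part of SciPy; the quadratic example data are likewise an assumption.

```python
import numpy as np
from scipy.spatial.distance import correlation, pdist, squareform


def distance_correlation(x, y):
    """Illustrative sample distance correlation; not a SciPy function."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    a = squareform(pdist(x))  # pairwise Euclidean distances within x
    b = squareform(pdist(y))  # pairwise Euclidean distances within y
    # Double-center each matrix: subtract row and column means, add grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = max((A * B).mean(), 0.0)  # squared distance covariance
    dvar = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if dvar == 0 else float(np.sqrt(dcov2 / dvar))


x = np.linspace(-1.0, 1.0, 101)
y = x ** 2  # depends on x, but not linearly

pearson = np.corrcoef(x, y)[0, 1]     # close to 0: no linear relationship
corr_dist = correlation(x, y)         # close to 1: SciPy's correlation distance
dcor = distance_correlation(x, y)     # strictly between 0 and 1
dcor_linear = distance_correlation(x, x)  # identical inputs give 1

print(pearson, corr_dist, dcor, dcor_linear)
```

Under these assumptions the distance correlation stays within [0, 1], while the correlation distance can reach 2.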

Correlation distance is the complement of correlation (1 minus the correlation coefficient, not its reciprocal) and only looks at the angle/similarity among patterns (a sort of normalization). It ranges from 0 to 2, with 0 being perfect correlation, 1 being no correlation, and 2 being perfect anticorrelation. So a small correlation distance means the patterns are close together in correlation space (small angular difference). Since corr = 1 - dist and dist = 1 - corr, a high correlation means a strong relationship, and so does a low correlation distance.
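The identity dist = 1 - corr can be checked numerically. This is a small sketch with made-up random data (the seed, sample size, and slope are arbitrary choices), using np.corrcoef for the Pearson coefficient:

```python
import numpy as np
from scipy.spatial.distance import correlation

rng = np.random.default_rng(0)           # arbitrary seed for reproducibility
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)       # noisy linear relationship

r = np.corrcoef(x, y)[0, 1]              # Pearson correlation coefficient
d = correlation(x, y)                    # SciPy's correlation distance

print(r, d, 1 - r)                       # d and 1 - r should agree
```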
