Question
I'm trying to find the distance correlation between columns; see the code below. Most of the time it returns a result higher than 1, which should not be possible, because distance correlation lies between 0 and 1. You can read about SciPy's distance correlation here.
import numpy as np
from scipy.spatial import distance
x = np.random.uniform(-1, 1, 10000)
print(distance.correlation(x, x**2))
# output on one run (x is random, so the value changes): 1.00210811815
What is wrong here, or how can I measure it?
Update 1: Link to the issue on GitHub
Answer 1:
Going by the documentation, I don't see why this is a problem.
From the documentation:
The correlation distance between u and v is defined as 1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}{\|u - \bar{u}\|_2 \, \|v - \bar{v}\|_2}, where \bar{u} and \bar{v} are the means of the elements of u and v.
By the Cauchy-Schwarz inequality, the fraction following the minus sign has an absolute value of at most 1, so the distance always lies in [0, 2]. Nothing stipulates that the fraction is non-negative, though; in fact, it is negative whenever the (mean-centred) vectors are anticorrelated, and the distance is then greater than 1.
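To check this against the question's example, here is a small sketch that writes the docstring formula out by hand and compares it with SciPy and with 1 minus the Pearson coefficient from np.corrcoef. The helper name correlation_distance is hypothetical, and a seeded generator replaces np.random.uniform only so the run is reproducible. Because x is symmetric around 0, x and x**2 have (population) correlation 0, so the sample value sits near 1 and can fall slightly on either side of it.

```python
import numpy as np
from scipy.spatial import distance

def correlation_distance(u, v):
    """Correlation distance written out from the formula in the SciPy docstring (hypothetical helper)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    uc = u - u.mean()  # mean-centre each vector
    vc = v - v.mean()
    # 1 minus the cosine of the angle between the centred vectors
    return 1.0 - np.dot(uc, vc) / (np.linalg.norm(uc) * np.linalg.norm(vc))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 10000)

manual = correlation_distance(x, x**2)
scipy_value = distance.correlation(x, x**2)
pearson_r = np.corrcoef(x, x**2)[0, 1]  # sample Pearson correlation, close to 0 here
print(manual, scipy_value, 1 - pearson_r)  # all three agree up to rounding
```

A result such as 1.0021 therefore only says that the sample Pearson correlation came out slightly negative (about -0.002).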
AFAICT, you should only be surprised if you got a value larger than 2 or smaller than 0. Given the comment by @Cleb and the fact that the range is [0, 2], I'm guessing that some other packages simply define the distance as half this expression.
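The range claim is easy to probe empirically. This sketch computes the correlation distance for many random pairs of short vectors (short, so that strong correlation and anticorrelation both show up often) and then halves the values. Halving is just one way to get a [0, 1] scale, in line with the answer's guess; it is not something SciPy itself does.

```python
import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(2)

# 1000 random pairs of length-5 vectors
dists = np.array([
    distance.correlation(rng.normal(size=5), rng.normal(size=5))
    for _ in range(1000)
])
print(dists.min(), dists.max())  # stays within [0, 2]; values above 1 are common
print((dists > 1).mean())        # fraction of anticorrelated pairs

# Optional rescaling to [0, 1] by halving
halved = dists / 2
print(halved.min(), halved.max())
```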
Answer 2:
@josef-pkt's answer on GitHub is given below:
It's not a distance correlation, which is a nonlinear measure of dependence; see e.g. my take at http://jpktd.blogspot.ca/2012/06/non-linear-dependence-measures-distance.html. However, the "correlation" in scipy.spatial.distance.correlation is a bit misleading, because according to the formula in the docstring it is a distance measure and not a correlation. Perfectly correlated data, with a correlation coefficient equal to 1, have zero distance; perfectly negatively correlated data, with a correlation coefficient equal to -1, have the maximal distance of 2.
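If what you actually want is the nonlinear dependence measure, here is a minimal NumPy sketch of the sample distance correlation of Székely, Rizzo and Bakirov (the V-statistic version). The function name distance_correlation is hypothetical. It builds full n x n distance matrices, so memory grows quadratically and it is only meant for moderate n; for real work a maintained implementation, such as the third-party dcor package, may be preferable.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples (hypothetical helper, O(n^2) memory)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Pairwise absolute differences |x_i - x_j| and |y_i - y_j|
    a = np.abs(x - x.T)
    b = np.abs(y - y.T)
    # Double-centre: subtract column and row means, add back the grand mean
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2_xy = max((A * B).mean(), 0.0)  # guard against tiny negative rounding
    dcov2_xx = (A * A).mean()
    dcov2_yy = (B * B).mean()
    denom = np.sqrt(dcov2_xx * dcov2_yy)
    if denom == 0:
        return 0.0  # a constant sample carries no dependence information
    return float(np.sqrt(dcov2_xy / denom))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)      # 1000 points keeps the n x n matrices small
noise = rng.uniform(-1, 1, 1000)  # independent of x

dcor_quadratic = distance_correlation(x, x**2)     # clearly above 0: dependence detected
dcor_independent = distance_correlation(x, noise)  # close to 0
print(dcor_quadratic, dcor_independent)
```

Unlike the Pearson-based correlation distance, this quantity does pick up the dependence between x and x**2, and it stays within [0, 1].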
Answer 3:
Correlational distance is the complement of correlation (1 minus the correlation coefficient), not its inverse, and it only looks at the angle, i.e. the similarity of shape, between patterns (somewhat like normalization). Correlational distance ranges from 0 to 2: 0 means PERFECT correlation, 1 means no correlation, and 2 means PERFECT ANTICORRELATION. So a small correlational distance means the patterns are close together in correlational space (small angular difference). Since corr = 1 - dist and dist = 1 - corr, a high correlation means a strong relationship, and a LOW CORRELATIONAL DISTANCE also means a strong relationship.
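A short sketch of the "only looks at the pattern" point and of the conversion corr = 1 - dist: rescaling and shifting a vector leaves its correlation distance at about 0, while flipping its sign gives about 2. The vector u is arbitrary random data chosen for illustration.

```python
import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(1)
u = rng.normal(size=50)

# Positive rescaling plus an offset does not change the pattern: distance ~ 0
d_rescaled = distance.correlation(u, 3.0 * u + 5.0)
# A negative scale factor reverses the pattern: distance ~ 2
d_flipped = distance.correlation(u, -0.5 * u + 1.0)

# Convert back to Pearson correlation with corr = 1 - dist
corr_rescaled = 1.0 - d_rescaled
corr_flipped = 1.0 - d_flipped
print(d_rescaled, corr_rescaled)  # approximately 0.0 and 1.0
print(d_flipped, corr_flipped)    # approximately 2.0 and -1.0
```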
Source: https://stackoverflow.com/questions/35988933/scipy-distance-correlation-is-higher-than-1