In Python, how can I calculate correlation and statistical significance between two arrays of data?

怎甘沉沦 submitted on 2019-12-03 12:24:01

If you want to calculate the Pearson correlation coefficient, then scipy.stats.pearsonr is the way to go. It returns both the coefficient and a two-sided p-value, although the p-value is only really meaningful for larger data sets. This function does not require the data to be rescaled to fall into a specified range; it is the correlation value itself that falls in the interval [-1, 1]. Perhaps that was the confusion?
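As a rough illustration of the call described above, here is a minimal sketch. The arrays x and y are synthetic data invented for this example (they are not from the question), and the seed and sample size are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

# Hypothetical data: y depends linearly on x, plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.5, size=200)

# pearsonr returns the correlation coefficient and a two-sided p-value.
r, p_value = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p_value:.3g}")
```

The inputs are used as-is; no rescaling is needed before the call.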

If the significance is not terribly important, you can use numpy.corrcoef(), which returns the matrix of correlation coefficients but no p-value.
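A short sketch of the numpy.corrcoef() route, assuming two equal-length 1-D arrays. The values of x and y below are made up for illustration.

```python
import numpy as np

# Hypothetical example data (roughly y = 2 * x).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

# For two 1-D inputs, corrcoef returns a 2x2 matrix:
# the diagonal is 1, the off-diagonal entries are the x-y correlation.
corr_matrix = np.corrcoef(x, y)
r = corr_matrix[0, 1]
print(r)
```

Only the off-diagonal entry is of interest here; the function does not report significance.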

The Mahalanobis distance does take into account the correlation in the data (through the inverse covariance matrix), but it provides a distance measure, not a correlation coefficient. (Strictly, it is a proper distance function only when the covariance matrix it uses is positive definite; nevertheless, it can be used to great advantage in many contexts.)

Oriol Nieto

You can use the Mahalanobis distance between these two arrays, which takes into account the correlation between them.

The function is in the scipy package: scipy.spatial.distance.mahalanobis

There's a nice example here
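Separately from the example linked above, here is a minimal sketch of calling scipy.spatial.distance.mahalanobis. The function takes two 1-D vectors plus the inverse of a covariance matrix; the sample data, the mixing matrix, and the choice to estimate the covariance with np.cov are all assumptions made for this illustration.

```python
import numpy as np
from scipy.spatial import distance

# Hypothetical sample: 500 observations of 3 correlated variables.
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])
samples = rng.normal(size=(500, 3)) @ mixing

# Estimate the covariance from the sample (columns are variables),
# then invert it: mahalanobis() expects the INVERSE covariance matrix.
cov = np.cov(samples, rowvar=False)
inv_cov = np.linalg.inv(cov)

# Distance between two observations, accounting for the correlations.
u, v = samples[0], samples[1]
d = distance.mahalanobis(u, v, inv_cov)
print(d)
```

Note that the result is a distance between two points relative to the spread of the data, not a correlation coefficient.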

scipy.spatial.distance.euclidean()

This gives the Euclidean distance between two points, which can be passed as two 1-D NumPy arrays, two lists, etc.

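import numpy as np
import scipy.spatial.distance as spsd
nparray1, nparray2 = np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0])  # example data
spsd.euclidean(nparray1, nparray2)  # Euclidean distance between the two 1-D arrays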

You can find more info here http://docs.scipy.org/doc/scipy/reference/spatial.distance.html
