Calculating correlations between every item in a list

Submitted by 自古美人都是妖i on 2019-12-12 16:02:27

Question


I'm trying to calculate the Pearson correlation between every pair of items in my list. That is, I want the correlations between data[0] and data[1], data[0] and data[2], and data[1] and data[2].

import scipy
from scipy import stats

data = [[1, 2, 4], [9, 5, 1], [8, 3, 3]]

def pearson(x, y):
    series1 = data[x]
    series2 = data[y]
    if x != y:
        return scipy.stats.pearsonr(series1, series2)

h = [pearson(x,y) for x,y in range(0, len(data))]

This returns the error TypeError: 'int' object is not iterable on h. Could someone please explain the error here? Thanks.


Answer 1:


range gives you a sequence of single int values, but you are trying to unpack each one as if it were a tuple (x, y). Try itertools.combinations instead:

import scipy
from scipy import stats
from itertools import combinations

data = [[1, 2, 4], [9, 5, 1], [8, 3, 3]]

def pearson(x, y):
    series1 = data[x]
    series2 = data[y]
    if x != y:
        return scipy.stats.pearsonr(series1, series2)

# combinations needs an iterable, so pass range(len(data)) rather than the bare int
h = [pearson(x,y) for x,y in combinations(range(len(data)), 2)]

Or as @Marius suggested:

h = [stats.pearsonr(data[x], data[y]) for x,y in combinations(range(len(data)), 2)]
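
Since the list h on its own does not say which pair each result belongs to, here is a minimal sketch that keys each coefficient by its index pair. The name pair_r is made up for illustration, and the sketch assumes stats.pearsonr returns a result that unpacks into the coefficient and the p-value:

```python
from itertools import combinations
from scipy import stats

data = [[1, 2, 4], [9, 5, 1], [8, 3, 3]]

# pair_r is a hypothetical name: maps (i, j) with i < j to Pearson's r
pair_r = {}
for i, j in combinations(range(len(data)), 2):
    # pearsonr gives the coefficient first and the p-value second
    r, p_value = stats.pearsonr(data[i], data[j])
    pair_r[(i, j)] = float(r)

for pair, r in pair_r.items():
    print(pair, round(r, 4))
```

With only three points per series the p-values carry little information, so the sketch keeps just the coefficient.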



Answer 2:


Why not use numpy.corrcoef? It treats each row as one variable and returns the full correlation matrix in a single call:

import numpy as np
data = [[1, 2, 4], [9, 5, 1], [8, 3, 3]]  

Result:

>>> np.corrcoef(data)
array([[ 1.        , -0.98198051, -0.75592895],
       [-0.98198051,  1.        ,  0.8660254 ],
       [-0.75592895,  0.8660254 ,  1.        ]])
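
If you only want the three distinct pairs from the question rather than the whole symmetric matrix, one possible approach is to read off the entries above the diagonal. This is a sketch, not part of the original answer; the name pairs is made up for illustration:

```python
import numpy as np

data = [[1, 2, 4], [9, 5, 1], [8, 3, 3]]
corr = np.corrcoef(data)

# k=1 selects the strict upper triangle, skipping the diagonal of 1.0 values
rows, cols = np.triu_indices(len(data), k=1)

# pairs is a hypothetical name: maps (i, j) with i < j to corr[i, j]
pairs = {(int(i), int(j)): float(corr[i, j]) for i, j in zip(rows, cols)}

for pair, r in pairs.items():
    print(pair, round(r, 4))
```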



Answer 3:


The range() function gives you only a single int on each iteration, and a single int cannot be unpacked into the two names x and y.
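
A minimal sketch that reproduces the failure in isolation, without scipy. The exact wording of the message depends on the Python version, so it is printed rather than assumed:

```python
data = [[1, 2, 4], [9, 5, 1], [8, 3, 3]]

# range() yields one int at a time
print(list(range(len(data))))

message = None
try:
    # "for x, y in ..." tries to unpack each yielded int into two names
    pairs = [(x, y) for x, y in range(len(data))]
except TypeError as exc:
    message = str(exc)
    print("TypeError:", message)
```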

If you want to go through every possible pair of ints in that range, you could try

import itertools

h = [pearson(x,y) for x,y in itertools.product(range(len(data)), repeat=2)]

That yields every ordered pair of indices in the given range as a 2-tuple, including pairs such as (0, 0) where both indices are the same.

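Remember that, with the pearson function you defined, pairs where x == y return None. To avoid those you could use itertools.permutations, which never pairs an index with itself (though it still yields both (0, 1) and (1, 0)):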

import itertools

h = [pearson(x,y) for x,y in itertools.permutations(range(len(data)), 2)]
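
To make the difference between the itertools helpers mentioned in these answers concrete, here is a small sketch that lists the index pairs each one produces for three items. The variable names are made up for illustration:

```python
from itertools import combinations, permutations, product

n = 3  # number of series, matching len(data) in the question

# product: every ordered pair, including (i, i)
all_pairs = list(product(range(n), repeat=2))
# permutations: every ordered pair with distinct indices
ordered_pairs = list(permutations(range(n), 2))
# combinations: each unordered pair once, with i < j
unordered_pairs = list(combinations(range(n), 2))

print(len(all_pairs), len(ordered_pairs), len(unordered_pairs))
print(unordered_pairs)
```

Because Pearson correlation is symmetric, the pairs from combinations are enough to cover the three correlations asked for in the question.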


Source: https://stackoverflow.com/questions/13262654/calculating-correlations-between-every-item-in-a-list
