Correlation coefficients and p values for all pairs of rows of a matrix

北荒 2020-12-30 03:32

I have a matrix data with m rows and n columns. I used to compute the correlation coefficients between all pairs of rows using np.corrcoef:
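For context, np.corrcoef treats each row of its input as a variable by default (rowvar=True), so a single call already returns the m×m matrix of row-pair coefficients, though without p values. A minimal sketch, assuming a hypothetical random 4×5 matrix named data:

```python
import numpy as np

# Hypothetical example data: 4 rows (variables), 5 columns (observations)
rng = np.random.default_rng(0)
data = rng.random((4, 5))

# np.corrcoef treats each row as a variable by default (rowvar=True),
# so the result is an m x m matrix of Pearson r for every pair of rows
r = np.corrcoef(data)
print(r.shape)  # (4, 4)
```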

5 Answers
  •  Happy的楠姐
    2020-12-30 04:07

    The most concise way might be pandas' built-in .corr method, which gives r (note that DataFrame.corr correlates columns, so use df.T.corr() for the rows of your matrix):

    In [79]:
    
    import numpy as np
    import pandas as pd
    m = np.random.random((6, 6))
    df = pd.DataFrame(m)
    print(df.corr())
              0         1         2         3         4         5
    0  1.000000 -0.282780  0.455210 -0.377936 -0.850840  0.190545
    1 -0.282780  1.000000 -0.747979 -0.461637  0.270770  0.008815
    2  0.455210 -0.747979  1.000000 -0.137078 -0.683991  0.557390
    3 -0.377936 -0.461637 -0.137078  1.000000  0.511070 -0.801614
    4 -0.850840  0.270770 -0.683991  0.511070  1.000000 -0.499247
    5  0.190545  0.008815  0.557390 -0.801614 -0.499247  1.000000
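    A minimal sketch checking that the transposed frame reproduces np.corrcoef's row-pair matrix (DataFrame.corr works column-wise), assuming a hypothetical random 4×6 matrix named data whose rows are the variables:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 4 rows (the variables to correlate), 6 columns (observations)
rng = np.random.default_rng(1)
data = rng.random((4, 6))

# DataFrame.corr() correlates columns, so transpose to correlate rows
r_rows = pd.DataFrame(data).T.corr()

# Same result as np.corrcoef, which treats rows as variables by default
print(np.allclose(r_rows.to_numpy(), np.corrcoef(data)))  # True
```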
    

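    To get p values, convert each r to a t statistic with n - 2 degrees of freedom, where n is the number of observations per variable: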
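    The conversion is the standard test for a Pearson coefficient r computed from n paired observations: assuming bivariate normal data and zero true correlation, t follows a Student t distribution with n - 2 degrees of freedom, and the usual two-sided p value uses |t|:

```latex
t = r\sqrt{\frac{n-2}{1-r^{2}}}, \qquad
p_{\text{two-sided}} = 2\left[1 - F_{n-2}\left(|t|\right)\right]
```

    Here F_{n-2} is the CDF of the t distribution with n - 2 degrees of freedom; ss.t.cdf(t, n-2) evaluates F_{n-2}(t) itself, which is a one-sided lower-tail probability.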

    In [84]:
    
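    import numpy as np
    import scipy.stats as ss

    n = m.shape[0]  # observations per column (6 here)
    r = df.corr()
    # t statistic for each r; on the diagonal r = 1, so the division yields inf
    t = r * np.sqrt((n - 2) / (1 - r * r))
    # lower-tail probability P(T <= t), not a two-sided p value
    ss.t.cdf(t, n - 2)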
    Out[84]:
    array([[ 1.        ,  0.2935682 ,  0.817826  ,  0.23004382,  0.01585695,
             0.64117917],
           [ 0.2935682 ,  1.        ,  0.04363408,  0.17836685,  0.69811422,
             0.50661121],
           [ 0.817826  ,  0.04363408,  1.        ,  0.39783538,  0.06700715,
             0.8747497 ],
           [ 0.23004382,  0.17836685,  0.39783538,  1.        ,  0.84993082,
             0.02756579],
           [ 0.01585695,  0.69811422,  0.06700715,  0.84993082,  1.        ,
             0.15667393],
           [ 0.64117917,  0.50661121,  0.8747497 ,  0.02756579,  0.15667393,
             1.        ]])
    In [85]:
    
    ss.pearsonr(m[:,0], m[:,1])
    Out[85]:
    (-0.28277983892175751, 0.58713640696703184)
    In [86]:
    # careful: pearsonr returns a two-tailed p value, ss.t.cdf a one-tailed probability
    0.58713640696703184/2
    Out[86]:
    0.2935682034835159  # matches the [0, 1] cell of ss.t.cdf(t, n-2)
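    Since ss.t.cdf returns the lower-tail probability P(T <= t), its entries equal half the two-sided p value only where r < 0, and one minus that where r > 0. A minimal sketch of a two-sided p-value matrix using the survival function ss.t.sf, assuming hypothetical random data with 8 observations of 5 variables; clipping r away from ±1 is an added assumption that keeps the diagonal finite:

```python
import numpy as np
import pandas as pd
import scipy.stats as ss

# Hypothetical data: 8 observations (rows) of 5 variables (columns)
rng = np.random.default_rng(2)
m = rng.random((8, 5))
n = m.shape[0]  # number of observations per variable

r = pd.DataFrame(m).corr().to_numpy()

# Assumption: clip r slightly inside (-1, 1) so the diagonal does not divide by zero
r_clipped = np.clip(r, -1 + 1e-12, 1 - 1e-12)
t = r_clipped * np.sqrt((n - 2) / (1 - r_clipped**2))

# Two-sided p value: P(|T| >= |t|) under zero correlation
p = 2 * ss.t.sf(np.abs(t), n - 2)

# Cross-check one pair against scipy.stats.pearsonr
r01, p01 = ss.pearsonr(m[:, 0], m[:, 1])
print(np.isclose(p[0, 1], p01))  # True
```

    To apply it to the rows of your matrix instead, build the frame with pd.DataFrame(m).T and take n = m.shape[1].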
    

    You can also just call scipy.stats.pearsonr, which you mentioned in the question, on every pair:

    In [95]:
    import itertools
    # list of (r, p, i, j) tuples, one per ordered pair of columns
    [tuple(ss.pearsonr(m[:, i], m[:, j])) + (i, j) for i, j in itertools.product(range(n), range(n))]
    Out[95]:
    [(1.0, 0.0, 0, 0),
     (-0.28277983892175751, 0.58713640696703184, 0, 1),
     (0.45521036266021014, 0.36434799921123057, 0, 2),
     (-0.3779357902414715, 0.46008763115463419, 0, 3),
     (-0.85083961671703368, 0.031713908656676448, 0, 4),
     (0.19054495489542525, 0.71764166168348287, 0, 5),
     (-0.28277983892175751, 0.58713640696703184, 1, 0),
     (1.0, 0.0, 1, 1),
    #etc, etc
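    The itertools.product call runs pearsonr on every ordered pair, including each column against itself and both (i, j) and (j, i). A minimal sketch that visits each unordered pair once with itertools.combinations and fills symmetric r and p matrices, assuming hypothetical random data with columns as variables:

```python
import itertools

import numpy as np
import scipy.stats as ss

# Hypothetical data: 8 observations (rows) of 5 variables (columns)
rng = np.random.default_rng(3)
m = rng.random((8, 5))
k = m.shape[1]  # number of variables

r = np.eye(k)        # each variable correlates perfectly with itself
p = np.zeros((k, k))

# combinations() yields each unordered pair once and skips i == j,
# so pearsonr runs k*(k-1)/2 times instead of k*k
for i, j in itertools.combinations(range(k), 2):
    r_ij, p_ij = ss.pearsonr(m[:, i], m[:, j])
    r[i, j] = r[j, i] = r_ij
    p[i, j] = p[j, i] = p_ij

print(np.allclose(r, np.corrcoef(m, rowvar=False)))  # True
```

    For the rows of the matrix in the question, iterate over range(m.shape[0]) and pass m[i] and m[j] instead.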
    
