Correlation coefficients and p values for all pairs of rows of a matrix

北荒 2020-12-30 03:32

I have a matrix data with m rows and n columns. I used to compute the correlation coefficients between all pairs of rows using np.corrcoef:

5 Answers
  •  遥遥无期
    2020-12-30 04:20

    Sort of hackish and possibly inefficient, but I think this could be what you're looking for:

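    import scipy.spatial.distance as dist
    import scipy.stats as ss

    # `data` is the m x n matrix from the question; each row is one variable.

    # Pearson's correlation coefficients for every pair of rows
    # (squareform leaves the diagonal at 0, not 1)
    print(dist.squareform(dist.pdist(data, lambda x, y: ss.pearsonr(x, y)[0])))

    # p-values for the same pairs
    print(dist.squareform(dist.pdist(data, lambda x, y: ss.pearsonr(x, y)[1])))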
    

    SciPy's pdist is a very helpful function, primarily meant for finding pairwise distances between observations in n-dimensional space.

    It also accepts a user-defined callable as the 'distance metric', which can be exploited to carry out any kind of pairwise operation. The result is returned in condensed distance-matrix form, which is easily converted to a square matrix with SciPy's squareform function.
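    As a rough sketch of the approach above wrapped into one function: the name pairwise_pearson and the random sample data are made up for illustration and are not part of the original answer. It also sets the diagonal of the coefficient matrix to 1, since squareform fills the diagonal with zeros, and compares the result against np.corrcoef as a sanity check.

```python
import numpy as np
import scipy.spatial.distance as dist
import scipy.stats as ss


def pairwise_pearson(data):
    """Return (r, p) square matrices for all pairs of rows of `data`.

    Hypothetical helper for illustration; not a SciPy function.
    """
    data = np.asarray(data, dtype=float)
    # Condensed vectors: one entry per unordered pair of rows (i, j), i < j.
    r_condensed = dist.pdist(data, lambda x, y: ss.pearsonr(x, y)[0])
    p_condensed = dist.pdist(data, lambda x, y: ss.pearsonr(x, y)[1])
    # squareform mirrors the pairs into a symmetric matrix with a zero diagonal.
    r = dist.squareform(r_condensed)
    p = dist.squareform(p_condensed)
    # Each row correlates perfectly with itself, so the diagonal of r is 1.
    # The diagonal of p is left at 0.
    np.fill_diagonal(r, 1.0)
    return r, p


rng = np.random.default_rng(0)        # made-up sample data: 4 rows, 10 columns
data = rng.normal(size=(4, 10))
r, p = pairwise_pearson(data)
print(np.allclose(r, np.corrcoef(data)))  # compare against np.corrcoef
```

    Note that the callable is invoked once per pair in Python, and here twice over (once for r, once for p), so this is likely to be slow for a large number of rows.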
