Question
How can I run hierarchical clustering on a correlation matrix in scipy/numpy? I have a matrix of 100 rows by 9 columns, and I'd like to hierarchically cluster the rows by the correlation of each entry across the 9 conditions, using 1 - Pearson correlation as the distance. Assuming I have a numpy array X that contains the 100 x 9 matrix, how can I do this?
I tried using hcluster, based on this example:
Y=pdist(X, 'seuclidean')
Z=linkage(Y, 'single')
dendrogram(Z, color_threshold=0)
However, this pdist call is not what I want, since 'seuclidean' is a standardized Euclidean distance. Any ideas?
Thanks.
Answer 1:
Just change the metric to 'correlation', which computes 1 - Pearson correlation between each pair of rows, so that the first line becomes:
Y=pdist(X, 'correlation')
However, I believe that the code can be simplified to just:
Z=linkage(X, 'single', 'correlation')
dendrogram(Z, color_threshold=0)
because linkage will run pdist for you when it is given the raw observation matrix X together with a metric.
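For anyone who wants to check this, here is a minimal sketch using scipy.cluster.hierarchy and scipy.spatial.distance rather than the standalone hcluster package. The random X is only placeholder data standing in for the 100 x 9 matrix from the question, and the variable names are my own. It compares the 'correlation' distances against 1 - np.corrcoef(X) and checks that the two-step and one-step calls build the same tree.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist

# Placeholder data (an assumption): 100 entries measured across 9 conditions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))

# Two-step version: condensed vector of pairwise distances, then linkage.
Y = pdist(X, metric="correlation")
Z_two_step = linkage(Y, method="single")

# One-step version: linkage computes the pairwise distances itself.
Z_one_step = linkage(X, method="single", metric="correlation")

# Cross-check: 'correlation' distance should equal 1 - Pearson r between rows.
# pdist lists pairs (i, j) with i < j in row-major order, like triu_indices.
D = 1.0 - np.corrcoef(X)
upper = np.triu_indices(X.shape[0], k=1)
matches_corrcoef = np.allclose(Y, D[upper])
same_tree = np.allclose(Z_two_step, Z_one_step)

# no_plot=True returns the dendrogram layout without needing matplotlib;
# drop it to actually draw the tree.
tree = dendrogram(Z_one_step, color_threshold=0, no_plot=True)
leaf_order = tree["leaves"]

print(matches_corrcoef, same_tree, len(leaf_order))
```

Note that single linkage is kept only because the question used it. Other methods such as 'average' or 'complete' take the same metric argument.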
Source: https://stackoverflow.com/questions/2907919/hierarchical-clustering-on-correlations-in-python-scipy-numpy