Matching dendrogram with cluster number in Python's scipy.cluster.hierarchy

耗尽温柔 提交于 2020-08-21 06:29:41

问题


The following code generates a simple hierarchical cluster dendrogram with 10 leaf nodes:

import scipy
import scipy.cluster.hierarchy as sch
import matplotlib.pylab as plt

X = scipy.randn(10,2)
d = sch.distance.pdist(X)
Z= sch.linkage(d,method='complete')
P =sch.dendrogram(Z)
plt.show()

I generate three flat clusters like so:

T = sch.fcluster(Z, 3, 'maxclust')
# array([3, 1, 1, 2, 2, 2, 2, 2, 1, 2])

However, I'd like to see the cluster labels 1,2,3 on the dendrogram. It's easy for me to visualize with just 10 leaf nodes and three clusters, but when I have 1000 nodes and 10 clusters, I can't see what's going on.

How do I show the cluster numbers on the dendrogram? I'm open to other packages. Thanks.


回答1:


Here is a solution that appropriately colors the clusters and labels the leaves of the dendrogram with the appropriate cluster name (leaves are labeled: 'point number, cluster number'). These techniques can be used independently or together. I modified your original example to include both:

import scipy
import scipy.cluster.hierarchy as sch
import matplotlib.pylab as plt

n=10
k=3
X = scipy.randn(n,2)
d = sch.distance.pdist(X)
Z= sch.linkage(d,method='complete')
T = sch.fcluster(Z, k, 'maxclust')

# calculate labels
labels=list('' for i in range(n))
for i in range(n):
    labels[i]=str(i)+ ',' + str(T[i])

# calculate color threshold
ct=Z[-(k-1),2]  

#plot
P =sch.dendrogram(Z,labels=labels,color_threshold=ct)
plt.show()


来源:https://stackoverflow.com/questions/19778167/matching-dendrogram-with-cluster-number-in-pythons-scipy-cluster-hierarchy

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!