Clustering with scipy - clusters via distance matrix, how to get back the original objects

為{幸葍}努か submitted on 2019-12-05 06:31:29

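First off, you don't need to go through the entire process with cdist and linkage if you use fclusterdata instead of fcluster. You can feed fclusterdata an (n_documents, n_features) array of term counts, tf-idf values, or whatever your features are, together with the metric and linkage method you want. It then computes the pairwise distances, builds the linkage and flattens it in a single call.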
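Here is a minimal sketch of that shortcut, assuming numpy and scipy are installed. The five-document term-count matrix X, the cosine metric and average linkage are hypothetical choices for illustration, not taken from the question. The sketch runs the long way (pdist, then linkage, then fcluster) next to a single fclusterdata call so you can check that both give the same labels. It uses pdist rather than cdist for the long way, because linkage expects a condensed distance matrix like the one pdist returns.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, fclusterdata, linkage
from scipy.spatial.distance import pdist

# Hypothetical term-count matrix: 5 documents (rows) x 4 terms (columns).
X = np.array([
    [3, 0, 1, 0],
    [0, 2, 0, 4],
    [3, 1, 1, 0],
    [0, 2, 1, 4],
    [9, 9, 0, 0],
], dtype=float)

# Assumed choices; use whatever metric/method you actually cluster with.
metric, method = "cosine", "average"

# Long way: condensed distance matrix -> linkage tree -> flat clusters.
d = pdist(X, metric=metric)
Z = linkage(d, method=method)
t = 0.5 * d.max()  # same kind of threshold as in the question
T_manual = fcluster(Z, t, criterion="distance")

# Short way: fclusterdata runs pdist + linkage + fcluster internally,
# so it can take the feature array directly.
T_direct = fclusterdata(X, t, criterion="distance", metric=metric, method=method)

print(T_manual)
print(T_direct)
```

Both calls only agree when fclusterdata receives the same metric and method that the manual pipeline used, since it builds its own linkage from those two arguments.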

The output of fclusterdata is the same as that of fcluster: an array T such that "T[i] is the flat cluster number to which original observation i belongs." So to get back your original objects, look up document i under cluster number T[i]. The scipy.cluster.hierarchy module flattens the clustering according to a threshold, which you set at 0.5*distances.max(). In your case, the third and fifth documents are clustered together, but all the others form clusters of their own, so you might want to set the threshold higher or use a different criterion.
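To recover the original objects, it is enough to walk T alongside the documents, since both are in the same order. The sketch below assumes a hypothetical five-document term-count matrix, cosine distance and average linkage; the helper names group_by_cluster and cluster_docs are made up for illustration. It also shows the two options mentioned above: raising the distance threshold t gives fewer, larger clusters, and criterion="maxclust" lets you ask for at most k clusters instead of picking a distance.

```python
from collections import defaultdict

import numpy as np
from scipy.cluster.hierarchy import fclusterdata

# Hypothetical documents; row i of X holds the term counts of docs[i].
docs = ["doc A", "doc B", "doc C", "doc D", "doc E"]
X = np.array([
    [3, 0, 1, 0],
    [0, 2, 0, 4],
    [3, 1, 1, 0],
    [0, 2, 1, 4],
    [9, 9, 0, 0],
], dtype=float)


def group_by_cluster(labels, items):
    """Map each flat-cluster number to the original items assigned to it.

    Relies on labels[i] being the cluster of observation i, i.e. on
    labels and items being in the same order.
    """
    groups = defaultdict(list)
    for label, item in zip(labels, items):
        groups[int(label)].append(item)
    return dict(groups)


def cluster_docs(t, criterion):
    # Hypothetical helper: same assumed metric and linkage method every time.
    T = fclusterdata(X, t, criterion=criterion, metric="cosine", method="average")
    return group_by_cluster(T, docs)


# A low distance threshold merges little; a higher one gives fewer, larger clusters.
low = cluster_docs(0.1, "distance")
high = cluster_docs(0.5, "distance")
# Alternatively, ask for at most k flat clusters instead of choosing a distance.
k2 = cluster_docs(2, "maxclust")

for name, groups in [("t=0.1", low), ("t=0.5", high), ("maxclust=2", k2)]:
    print(name, sorted(groups.values()))
```

The cluster numbers themselves carry no meaning beyond grouping, which is why the sketch compares the sorted groups of documents rather than the raw label values.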
