Compute cluster size automatically for KMeans

Submitted by 独自空忆成欢 on 2019-12-12 10:19:35

Question


I am using scikit-learn and experimenting with KMeans. It's fast, but it requires the number of clusters as an argument. What I would like to try is to automatically compute the number of clusters based on the population of documents.

The hash-based near-neighbor algorithms (ssdeep) I used before can produce similarity clusters based on distance. How can I get the number of clusters automatically for k-means?

KMeans(init='k-means++', n_clusters=cluster_count, n_init=10),
          name="k-means++", data=data)

I want to calculate that cluster_count automatically. Is that possible? My test dataset is a collection of random files from 20_newsgroup, not pre-categorized into folders but kept in a single folder, so there are no labels.
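To make the goal concrete, here is a minimal sketch of one common way to pick cluster_count without labels: fit KMeans for a range of candidate values and keep the one with the highest mean silhouette score. This heuristic is not something from the setup above; the helper name `estimate_cluster_count`, the candidate range 2 to 10, and the synthetic `make_blobs` data (a stand-in for the real document vectors) are all assumptions for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def estimate_cluster_count(data, k_min=2, k_max=10, random_state=0):
    """Hypothetical helper: return (best_k, scores), where best_k is the
    candidate cluster count with the highest mean silhouette score."""
    scores = {}
    for k in range(k_min, k_max + 1):
        model = KMeans(init="k-means++", n_clusters=k, n_init=10,
                       random_state=random_state)
        labels = model.fit_predict(data)
        # The silhouette score needs no ground-truth labels, only the
        # data and the cluster assignments produced by KMeans.
        scores[k] = silhouette_score(data, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Assumption: synthetic, well-separated 2-D blobs stand in for the
# vectorized documents; real text data would need vectorizing first.
centers = [[-10, -10], [-10, 10], [10, -10], [10, 10]]
data, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5,
                     random_state=42)

cluster_count, scores = estimate_cluster_count(data)
print("estimated cluster_count:", cluster_count)
```

The returned cluster_count could then be passed as n_clusters in the KMeans call. On real text data the silhouette curve is often much flatter than on clean blobs, so the estimate is best treated as a starting point, not a definitive answer.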

Source: https://stackoverflow.com/questions/13684041/compute-clustersize-automatically-for-kmeans
