Unsupervised clustering with unknown number of clusters

攒了一身酷 2020-11-28 19:18

I have a large set of vectors in 3 dimensions. I need to cluster these based on Euclidean distance such that all the vectors in any particular cluster have a Euclidean distance …

6 answers
  •  悲&欢浪女
    2020-11-28 19:44

    The answer by moooeeeep recommended using hierarchical clustering. I wanted to elaborate on how to choose the clustering threshold.

    One way is to compute clusterings for several thresholds t1, t2, t3, ... and then compute a metric for the "quality" of each clustering. The premise is that the clustering with the optimal number of clusters will score highest on that metric.
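    A minimal, standard-library-only sketch of that threshold sweep. It assumes single-linkage hierarchical clustering (the answer does not say which linkage) and a quality function `score(points, labels)` that you supply, where higher means better; it must also cope with degenerate cuts such as a single cluster. The helper names `single_linkage_labels` and `best_threshold` are hypothetical. In practice you would more likely call SciPy's `scipy.cluster.hierarchy.linkage` followed by `fcluster(..., criterion='distance')`.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
# Hypothetical signature for the quality metric: higher score = better clustering.
Score = Callable[[Sequence[Point], List[int]], float]


def single_linkage_labels(points: Sequence[Point], t: float) -> List[int]:
    """Flat clusters from a single-linkage hierarchy cut at distance t.

    Assumption: single linkage. For that linkage the cut equals the connected
    components of the graph linking every pair of points at Euclidean
    distance <= t, so a union-find structure is enough.
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= t:
                parent[find(i)] = find(j)  # merge the two components

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    root_to_label: Dict[int, int] = {}
    return [root_to_label.setdefault(find(i), len(root_to_label)) for i in range(len(points))]


def best_threshold(
    points: Sequence[Point], thresholds: Sequence[float], score: Score
) -> Tuple[float, List[int]]:
    """Cluster at every candidate threshold and keep the highest-scoring result."""
    if not thresholds:
        raise ValueError("need at least one threshold")
    best: Optional[Tuple[float, float, List[int]]] = None
    for t in thresholds:
        labels = single_linkage_labels(points, t)
        s = score(points, labels)
        if best is None or s > best[0]:
            best = (s, t, labels)
    return best[1], best[2]
```

    The double loop is O(n²) in the number of vectors. For a large 3-D set, a spatial index such as `scipy.spatial.cKDTree(points).query_pairs(t)` returns the same close pairs much faster.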

    An example of a good quality metric I've used in the past is the Calinski-Harabasz index. Briefly: it divides the between-cluster dispersion (how far the cluster centroids lie from the overall centroid) by the within-cluster dispersion (how far points lie from their own cluster's centroid), each normalized by its degrees of freedom. The optimal clustering assignment has clusters that are both as far apart from each other as possible and as "tight" as possible.
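    For reference, here is the usual definition of the index for n points split into k clusters C_1, …, C_k, where n_j is the size of C_j, c_j its centroid and c the centroid of all the data. This notation is introduced here, not taken from the answer:

```latex
\mathrm{CH}(k) = \frac{B_k / (k - 1)}{W_k / (n - k)}, \qquad
B_k = \sum_{j=1}^{k} n_j \,\lVert \mathbf{c}_j - \mathbf{c} \rVert^2, \qquad
W_k = \sum_{j=1}^{k} \sum_{\mathbf{x} \in C_j} \lVert \mathbf{x} - \mathbf{c}_j \rVert^2
```

    B_k + W_k is the total scatter of the data and does not depend on k, so a higher CH means less of that scatter is left inside the clusters, after adjusting for the number of clusters. The index is undefined for k = 1, so on its own it cannot tell you that the data has no cluster structure at all.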

    By the way, you don't have to use hierarchical clustering. You can also use something like k-means: run it for each candidate k, and then pick the k that has the highest Calinski-Harabasz score.
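    A standard-library-only sketch of the k-means route. It assumes plain Lloyd's k-means seeded with k randomly chosen data points and a fixed random seed. A production run would more likely use k-means++ initialization and several restarts, which scikit-learn's `KMeans` does by default, and `sklearn.metrics.calinski_harabasz_score` for the metric. The helpers `kmeans`, `calinski_harabasz` and `best_k` are hypothetical names, not a library API.

```python
import math
import random
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points: Sequence[Point], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means (assumption); initial centroids are k random data points."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return labels


def calinski_harabasz(points: Sequence[Point], labels: Sequence[int]) -> float:
    """(between-cluster dispersion / (k - 1)) / (within-cluster dispersion / (n - k))."""
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    n, k = len(points), len(clusters)
    if not 2 <= k < n:
        return -math.inf  # index undefined here; never select this candidate
    overall = mean(points)
    between = within = 0.0
    for members in clusters.values():
        c = mean(members)
        between += len(members) * math.dist(c, overall) ** 2
        within += sum(math.dist(p, c) ** 2 for p in members)
    return math.inf if within == 0 else (between / (k - 1)) / (within / (n - k))


def best_k(points: Sequence[Point], k_values: Iterable[int], seed: int = 0) -> Tuple[int, List[int]]:
    """Run k-means for every candidate k and keep the highest CH score (first wins ties)."""
    scored = []
    for k in k_values:
        labels = kmeans(points, k, seed=seed)
        scored.append((calinski_harabasz(points, labels), k, labels))
    _, k, labels = max(scored, key=lambda s: s[0])
    return k, labels
```

    For example, `best_k(vectors, range(2, 11))` returns the chosen k together with one label per vector. Keep in mind that k-means favors roughly spherical clusters of similar size, which the hierarchical threshold approach does not assume.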

    Let me know if you need more references, and I'll scour my hard disk for some papers.