Unsupervised clustering with unknown number of clusters

攒了一身酷 2020-11-28 19:18

I have a large set of vectors in 3 dimensions. I need to cluster these based on Euclidean distance such that all the vectors in any particular cluster have a Euclidean dista

6 answers
  •  爱一瞬间的悲伤
    2020-11-28 19:50

    I want to add to moooeeeep's answer on using hierarchical clustering. That solution works for me, though picking the threshold value feels quite arbitrary. After consulting other sources and testing it myself, I found a better approach: the threshold can be picked easily from a dendrogram:

    from scipy.cluster import hierarchy
    from scipy.spatial.distance import pdist
    import matplotlib.pyplot as plt
    
    # Placeholder: replace with your own (n_points, 3) array of vectors.
    ori_array = ["Your_list_here"]
    # Ward linkage computed from the condensed Euclidean distance matrix.
    ward_array = hierarchy.ward(pdist(ori_array))
    # Reuse the same linkage for the plot instead of computing it a second time.
    dendrogram = hierarchy.dendrogram(ward_array)
    plt.title('Dendrogram')
    plt.xlabel('Points')
    plt.ylabel('Euclidean distances')
    plt.show()
    

    You will get a dendrogram plot. Draw a horizontal line across it, say at distance = 1; the number of vertical branches the line crosses is the number of clusters you will get. So here I choose threshold = 1 for 4 clusters.

    threshold = 1
    clusters_list = hierarchy.fcluster(ward_array, threshold, criterion="distance")
    print("Clustering list: {}".format(clusters_list))
    

    Now each value in clusters_list is the cluster ID assigned to the corresponding point in ori_array.
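
For anyone who wants to try the steps above without their own data, here is a minimal end-to-end sketch. It assumes synthetic input: four tight, well-separated blobs of 3D points, generated only for illustration. The names `centers`, `points`, `labels`, and `clusters` are hypothetical and not from the answer above, and the threshold of 1.0 only works here because the blobs are far apart. With real data, read the threshold off the dendrogram.

```python
import numpy as np
from scipy.cluster import hierarchy

# Synthetic data (an assumption for illustration): 4 tight blobs in 3D,
# 20 points each, centred far apart so the cluster structure is obvious.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0],
                    [5.0, 0.0, 0.0],
                    [0.0, 5.0, 0.0],
                    [0.0, 0.0, 5.0]])
points = np.vstack([c + 0.05 * rng.standard_normal((20, 3)) for c in centers])

# Ward linkage directly on the raw observations (Euclidean metric).
Z = hierarchy.linkage(points, method="ward")

# Cut the tree at a merge height of 1.0: clusters whose merge height
# exceeds the threshold stay separate.
labels = hierarchy.fcluster(Z, t=1.0, criterion="distance")

# Group the points by their assigned cluster ID.
clusters = {}
for point, label in zip(points, labels):
    clusters.setdefault(int(label), []).append(point)

for label, members in sorted(clusters.items()):
    print("cluster {}: {} points".format(label, len(members)))
```

The number of clusters is never passed in; it falls out of where the tree is cut, which is why this approach suits the case where the cluster count is unknown.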
