Scikit Learn - K-Means - Elbow - criterion

耶瑟儿~ 2021-01-30 02:40

Today I'm trying to learn something about K-means. I understand the algorithm and I know how it works. Now I'm looking for the right k... I found the elbow criterion as a

3 Answers
  •  误落风尘
    2021-01-30 02:58

    This answer is inspired by what OmPrakash has written. It contains code to plot both the SSE and the silhouette score. What I've given is a general snippet you can follow in any unsupervised-learning setting where you don't have labels and want to find the optimal number of clusters. There are two criteria: 1) the sum of squared errors (SSE) and 2) the silhouette score. You can follow OmPrakash's answer for the explanation; he's done a good job of it.
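    To make the two criteria concrete, here is a minimal NumPy sketch (not scikit-learn's implementation) that computes both for a given cluster labelling. `sse` sums each point's squared distance to the mean of its cluster, which corresponds to what `KMeans` reports as `inertia_` once the centroids are the means of their clusters. `mean_silhouette` averages s = (b - a) / max(a, b), where a is a point's mean distance to the rest of its own cluster and b is its mean distance to the nearest other cluster. The helper names and toy data are hypothetical.

```python
import numpy as np


def sse(X, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        centroid = members.mean(axis=0)
        total += float(((members - centroid) ** 2).sum())
    return total


def mean_silhouette(X, labels):
    """Average silhouette over all points; needs at least two clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances, shape (n_samples, n_samples)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)  # singleton cluster: silhouette taken as 0
            continue
        a = D[i, same].mean()  # mean distance within own cluster
        # mean distance to the nearest other cluster
        b = min(D[i, labels == k].mean() for k in clusters if k != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


# Two tight, well-separated toy clusters
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])
print(sse(X, labels))              # 1.0
print(mean_silhouette(X, labels))  # about 0.90
```

    A lower SSE is not automatically better, because it tends to keep shrinking as k grows; that is why you look for the elbow. The silhouette score lies in [-1, 1], and higher is better.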

    Assume your dataset is a DataFrame df1. Here I have used a different dataset just to show how both criteria can help decide the optimal number of clusters; for this dataset I think 6 is the correct number of clusters. Then:
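    If you want to run the snippet without your own data, a hypothetical stand-in for `df1` can be generated. This sketch assumes NumPy and pandas and builds six well-separated 2-D Gaussian blobs, so k = 6 is the intended answer by construction. The centers, spread and sample counts are arbitrary choices, not the dataset used in this answer; `sklearn.datasets.make_blobs` would serve the same purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1: six 2-D Gaussian blobs on a grid
rng = np.random.default_rng(42)
centers = np.array(
    [[0, 0], [8, 0], [16, 0], [0, 8], [8, 8], [16, 8]], dtype=float
)
points_per_blob = 100
blobs = [c + rng.normal(scale=1.0, size=(points_per_blob, 2)) for c in centers]
df1 = pd.DataFrame(np.vstack(blobs), columns=["x", "y"])
print(df1.shape)  # (600, 2)
```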

    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    # df1 is assumed to be your numeric feature DataFrame
    range_n_clusters = [2, 3, 4, 5, 6, 7, 8]
    elbow = []  # SSE (inertia) for each k
    ss = []     # average silhouette score for each k
    for n_clusters in range_n_clusters:
        # Fit K-means for this number of clusters.
        # n_init is set explicitly because its default changed across scikit-learn versions.
        clusterer = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
        cluster_labels = clusterer.fit_predict(df1)
        # Average silhouette score over all samples
        silhouette_avg = silhouette_score(df1, cluster_labels)
        ss.append(silhouette_avg)
        print("For n_clusters =", n_clusters, "the average silhouette_score is:", silhouette_avg)
        # Inertia: sum of squared distances of samples to their closest cluster center (the SSE)
        elbow.append(clusterer.inertia_)

    fig = plt.figure(figsize=(14, 7))
    fig.add_subplot(121)
    plt.plot(range_n_clusters, elbow, 'b-', label='Sum of squared errors')
    plt.xlabel("Number of clusters")
    plt.ylabel("SSE")
    plt.legend()
    fig.add_subplot(122)
    plt.plot(range_n_clusters, ss, 'b-', label='Silhouette score')
    plt.xlabel("Number of clusters")
    plt.ylabel("Silhouette score")
    plt.legend()
    plt.show()
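    Reading the elbow off the plot is subjective. One common heuristic, assumed here and not part of scikit-learn, is to normalise the SSE curve and pick the k whose point lies farthest from the straight line joining the curve's first and last points; the silhouette criterion simply takes the k with the highest average score. The function names and the numbers in the usage lines are hypothetical, made up only to illustrate the calls.

```python
import numpy as np


def best_k_by_silhouette(ks, scores):
    """Return the k with the highest average silhouette score."""
    return ks[int(np.argmax(scores))]


def elbow_k(ks, sse):
    """Return the k whose (k, SSE) point is farthest from the chord joining
    the first and last points (a simple knee heuristic).
    Assumes at least three ks and SSE values that are not all equal."""
    x = np.asarray(ks, dtype=float)
    y = np.asarray(sse, dtype=float)
    # Normalise both axes to [0, 1] so SSE's scale does not dominate
    x = (x - x.min()) / (x.max() - x.min())
    y = (y - y.min()) / (y.max() - y.min())
    x0, y0, x1, y1 = x[0], y[0], x[-1], y[-1]
    # Perpendicular distance from each point to the chord
    dist = np.abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0)
    dist /= np.hypot(x1 - x0, y1 - y0)
    return ks[int(np.argmax(dist))]


ks = [2, 3, 4, 5, 6, 7, 8]
sse_values = [900, 700, 500, 300, 100, 90, 85]           # made-up SSE curve
sil_values = [0.40, 0.45, 0.50, 0.55, 0.62, 0.52, 0.48]  # made-up silhouette scores
print(elbow_k(ks, sse_values))               # 6
print(best_k_by_silhouette(ks, sil_values))  # 6
```

    With the lists built in the loop above, the calls would be `elbow_k(range_n_clusters, elbow)` and `best_k_by_silhouette(range_n_clusters, ss)`. The two heuristics can disagree on real data, so treat them as hints to read alongside the plots rather than as a final answer.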
    
