可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
Let's say I'm examining up to 10 clusters, with scipy I usually generate the 'elbow' plot as follows:
from scipy import cluster cluster_array = [cluster.vq.kmeans(my_matrix, i) for i in range(1,10)] pyplot.plot([var for (cent,var) in cluster_array]) pyplot.show()
I have since became motivated to use sklearn for clustering, however I'm not sure how to create the array needed to plot as in the scipy case. My best guess was:
from sklearn.cluster import KMeans km = [KMeans(n_clusters=i) for i range(1,10)] cluster_array = [km[i].fit(my_matrix)]
That unfortunately resulted in an invalid command error. What is the best way sklearn way to go about this?
Thank you
回答1:
You had some syntax problems in the code. They should be fixed now:
Ks = range(1, 10) km = [KMeans(n_clusters=i) for i in Ks] score = [km[i].fit(my_matrix).score(my_matrix) for i in range(len(km))]
The fit method just returns a self object. In this line in the original code
cluster_array = [km[i].fit(my_matrix)]
the cluster_array would end up having the same contents as km.
You can use the score method to get the estimate for how well the clustering fits. To see the score for each cluster simply run plot(Ks, score).
回答2:
you can use the inertia attribute of Kmeans class.
Assuming X is your dataset:
from sklearn.cluster import KMeans from matplotlib import pyplot as plt X = # <your_data> distorsions = [] for k in range(2, 20): kmeans = KMeans(n_clusters=k) kmeans.fit(X) distorsions.append(kmeans.inertia_) fig = plt.figure(figsize=(15, 5)) plt.plot(range(2, 20), distorsions) plt.grid(True) plt.title('Elbow curve')