How can I find the mean distance from the centroid to all the data points in each cluster? I am able to find the Euclidean distance of each point (in my dataset) from the ce
Here's one way. If you want a distance metric other than Euclidean, replace the distance expression inside k_mean_distance().
For each cluster, calculate the distance between every data point assigned to it and the cluster center, then return the mean value.
Function for distance calculation:
import numpy as np

def k_mean_distance(data, cx, cy, i_centroid, cluster_labels):
    # Calculate Euclidean distance for each 2-D data point assigned to centroid i_centroid
    distances = [np.sqrt((x-cx)**2+(y-cy)**2) for (x, y) in data[cluster_labels == i_centroid]]
    # Return the mean value
    return np.mean(distances)
And for each centroid, use the function to get the mean distance:
total_distance = []
for i, (cx, cy) in enumerate(centroids):
    # Function from above
    mean_distance = k_mean_distance(data, cx, cy, i, cluster_labels)
    total_distance.append(mean_distance)
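The helper above is written for 2-D points and a hard-coded Euclidean distance. As a rough sketch of how the metric could be swapped out, the version below takes the metric as an argument and works for any number of dimensions. The names mean_distance_per_cluster and manhattan, and the toy data, are made up for illustration and are not part of the original answer.

```python
import numpy as np

def mean_distance_per_cluster(data, centroids, labels, metric=None):
    """Return one mean point-to-centroid distance per cluster (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    labels = np.asarray(labels)
    if metric is None:
        # Default metric: Euclidean norm of each row of differences
        metric = lambda diff: np.linalg.norm(diff, axis=1)
    means = []
    for i, center in enumerate(centroids):
        members = data[labels == i]
        # An empty cluster has no defined mean distance, so report NaN
        means.append(metric(members - center).mean() if len(members) else np.nan)
    return np.array(means)

def manhattan(diff):
    # Alternative metric: sum of absolute coordinate differences per row
    return np.abs(diff).sum(axis=1)

# Toy data (illustrative only): two clusters of two points each
points = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [10.0, 14.0]])
centers = np.array([[1.0, 1.0], [10.0, 12.0]])
assigned = np.array([0, 0, 1, 1])

euclid = mean_distance_per_cluster(points, centers, assigned)
manh = mean_distance_per_cluster(points, centers, assigned, metric=manhattan)
```

Passing the metric as a function that maps an array of difference vectors to one distance per row keeps the loop over centroids unchanged when the metric changes.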
So, in the context of your question:
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def k_mean_distance(data, cx, cy, i_centroid, cluster_labels):
    distances = [np.sqrt((x-cx)**2+(y-cy)**2) for (x, y) in data[cluster_labels == i_centroid]]
    return np.mean(distances)

# array_convt is the data array from your question
t_data = PCA(n_components=2).fit_transform(array_convt)
k_means = KMeans()
clusters = k_means.fit_predict(t_data)
centroids = k_means.cluster_centers_

c_mean_distances = []
for i, (cx, cy) in enumerate(centroids):
    mean_distance = k_mean_distance(t_data, cx, cy, i, clusters)
    c_mean_distances.append(mean_distance)
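If Euclidean distance is all you need, scikit-learn can also compute the point-to-centroid distances for you: KMeans.transform() returns the distance from every sample to every cluster center. The sketch below is one possible alternative, not the approach used above; it runs on synthetic data (the blob centers, n_clusters=3, and the random seed are assumptions made for the example).

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D data: three Gaussian blobs (illustrative only)
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=c, scale=0.5, size=(50, 2))
    for c in ((0, 0), (5, 5), (0, 5))
])

k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = k_means.fit_predict(X)

# transform() gives an (n_samples, n_clusters) array of Euclidean
# distances from each sample to every cluster center
dist_to_all = k_means.transform(X)

# Keep only each sample's distance to its own assigned center
dist_to_own = dist_to_all[np.arange(len(X)), labels]

# Average those distances within each cluster
mean_per_cluster = np.array([
    dist_to_own[labels == k].mean() for k in range(k_means.n_clusters)
])
```

This avoids unpacking (cx, cy) by hand, so it also works when the data has more than two dimensions.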
If you plot the results with plt.plot(c_mean_distances),
you should see something like this: