Plotting clusters using k-means with distance from centroid

[亡魂溺海] 提交于 2019-12-11 03:13:40

问题


I am trying to create a plot similar to this:

Here there are three clusters and all the datapoints (circles) are plotted according to their euclidean distance from the centroid. Using this image its easy to see that 5 samples from class 2 ended up in wrong clusters.

I'm running k-means using kmeans and can't figure out how to plot this type of graph.

For example purposes we can use the iris dataset.

> iri <- iris
> cl <- kmeans (iri[, 1:4], 3)
> cl
K-means clustering with 3 clusters of sizes 38, 62, 50

Cluster means:
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     6.850000    3.073684     5.742105    2.071053
2     5.901613    2.748387     4.393548    1.433871
3     5.006000    3.428000     1.462000    0.246000

Clustering vector:
  [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [40] 3 3 3 3 3 3 3 3 3 3 3 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
 [79] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 1 1 1 1 2 1 1 1 1 1 1 2 2 1 1
[118] 1 1 2 1 2 1 2 1 1 2 2 1 1 1 1 1 2 1 1 1 1 2 1 1 1 2 1 1 1 2 1 1 2

Within cluster sum of squares by cluster:
[1] 23.87947 39.82097 15.15100
 (between_SS / total_SS =  88.4 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"      

Source for the image: https://github.com/michaelwsherman/winecluster The originator doesn't seem to be using kmeans though.

I suspect this might not be doable using kmeans for it does not provide distance from centroid. Is there another way to show data in this way or something similar?

来源:https://stackoverflow.com/questions/29397974/plotting-clusters-using-k-means-with-distance-from-centroid

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!