Getting the coordinates of every observation at each iteration of kmeans in R

Question

I would like to construct an animation of the kmeans clustering algorithm in R. The animation would show each observation (row) in the dataset plotted in 2 (or 3) dimensions, and then have the observations move into their clusters as each iteration ticks by.

For this I would need to access the coordinates of the observations at each iteration. Where in the output of kmeans() can I access these?

Thanks,

Answer 1:

I don't think kmeans() outputs this kind of tracing information. Your best bet may be to re-run kmeans() multiple times with iter.max=1, carrying the cluster centers over from one run to the next.

set.seed(1)
clus.1 <- kmeans(iris[,1:2],5,iter.max=1)
clus.2 <- kmeans(iris[,1:2],centers=clus.1$centers,iter.max=1)
clus.3 <- kmeans(iris[,1:2],centers=clus.2$centers,iter.max=1)

changing <- which(apply(cbind(clus.1$cluster,clus.2$cluster,clus.3$cluster),1,sd)>0)
changing
opar <- par(mfrow=c(1,3))
    plot(iris[,c(1,2)],col=clus.1$cluster,pch=19,main="Iteration 1")
    points(iris[changing,c(1,2)],pch=21,cex=2)
    plot(iris[,c(1,2)],col=clus.2$cluster,pch=19,main="Iteration 2")
    points(iris[changing,c(1,2)],pch=21,cex=2)
    plot(iris[,c(1,2)],col=clus.3$cluster,pch=19,main="Iteration 3")
    points(iris[changing,c(1,2)],pch=21,cex=2)
par(opar)

I circle the points that change cluster membership. Unfortunately only one does, because kmeans() converges so darn fast ;-)

You write that you would like to "have them move into their clusters as each iteration ticks by". Of course the observations themselves don't move in a clustering algorithm; only their cluster assignments and the cluster centers change. So a color-coded representation like this one is your best bet.
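
The same idea can be written as a loop that stores every observation's cluster label at each step. This is only a sketch: n_iter, membership and the use of suppressWarnings() are my own choices for illustration, and each call runs at most one iteration of whichever algorithm kmeans() uses (Hartigan-Wong by default), which is not necessarily one textbook Lloyd step.

```r
set.seed(1)
x <- iris[, 1:2]
n_iter <- 5                       # hypothetical number of steps to record

# membership[i, t] = cluster label of observation i after step t
membership <- matrix(NA_integer_, nrow = nrow(x), ncol = n_iter)

centers <- 5                      # first call: 5 randomly chosen starting centers
for (t in seq_len(n_iter)) {
  # iter.max = 1 usually triggers a "did not converge" warning; silence it
  fit <- suppressWarnings(kmeans(x, centers = centers, iter.max = 1))
  membership[, t] <- fit$cluster
  centers <- fit$centers          # carry the centers into the next step
}

# observations whose label changed at least once across the recorded steps
changing <- which(apply(membership, 1, function(r) length(unique(r)) > 1))
```

Each column of membership can then be used as the col= argument of one animation frame.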

In more than two dimensions, you can try pairs(), or just concentrate on two dimensions. Be prepared to explain why n-dimensional clusters don't look "cluster-like" when projected to two dimensions.
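
As a rough illustration of the pairs() suggestion, assuming you cluster on all four iris measurements with three centers (both are my choices, not part of the question):

```r
set.seed(1)
fit <- kmeans(iris[, 1:4], centers = 3)   # cluster in four dimensions

# one scatterplot per pair of variables, colored by cluster label
pairs(iris[, 1:4], col = fit$cluster, pch = 19)
```

Clusters that are well separated in four dimensions may overlap in some of these panels, which is the projection effect mentioned above.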

Answer 2:

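You can automate running until convergence by watching for the warning that kmeans() raises when it stops before converging. Use withCallingHandlers rather than tryCatch, so that the call still returns its result when it warns:

set.seed(1337)
df <- iris[,1:2]

# first step: 3 random starting centers, one iteration only
dfCluster <- kmeans(df, centers=3, iter.max=1)
plot(df[,1], df[,2], col=dfCluster$cluster, pch=19, cex=2, main="iter 1")
points(dfCluster$centers, col=1:3, pch=3, cex=3, lwd=3)

cent <- list(dfCluster$centers)

max_iter <- 10

for (i in 2:max_iter){
  done <- TRUE
  dfCluster <- withCallingHandlers(
    kmeans(df, centers=dfCluster$centers, iter.max=1),
    warning=function(w) {
      done <<- FALSE                  # a warning is taken to mean "not converged yet"
      invokeRestart("muffleWarning")  # let kmeans() finish and return its result
    })
  cent[[i]] <- dfCluster$centers
  if(done) break
}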

cent is a list holding the cluster centers at each iteration:

cent
[[1]]
  Sepal.Length Sepal.Width
1     6.795833    3.081250
2     5.769231    2.678846
3     5.006000    3.428000

[[2]]
  Sepal.Length Sepal.Width
1     6.812766    3.074468
2     5.773585    2.692453
3     5.006000    3.428000

To plot this, see the question "How to visualize k-means centroids for each iteration?"
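
One possible way to draw the centroid paths from a list like cent is sketched below. It is not taken from the linked question; it rebuilds the list over a fixed four steps (my assumption) so that it runs on its own, and it relies on kmeans() keeping row k of the supplied centers as cluster k.

```r
set.seed(1337)
df <- iris[, 1:2]

# rebuild a list of per-step centers, same idea as `cent` above
fit <- suppressWarnings(kmeans(df, centers = 3, iter.max = 1))
cent <- list(fit$centers)
for (i in 2:4) {                  # hypothetical fixed number of steps
  fit <- suppressWarnings(kmeans(df, centers = fit$centers, iter.max = 1))
  cent[[i]] <- fit$centers
}

plot(df, col = fit$cluster, pch = 19, main = "Centroid paths")
for (k in seq_len(nrow(cent[[1]]))) {
  # stack the center of cluster k from every step into a (steps x 2) matrix
  path <- do.call(rbind, lapply(cent, function(m) m[k, ]))
  lines(path, type = "b", pch = 3, lwd = 2, col = k)
}
```

If the algorithm converges early, the later points of a path coincide, so the path simply stops moving.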

Source: https://stackoverflow.com/questions/22916337/getting-the-coordinates-of-every-observation-at-each-iteration-of-kmeans-in-r

Tags

k-means