How to draw the plot of within-cluster sum-of-squares for a cluster?


死守一世寂寞 2020-12-28 11:13

I have a cluster plot made in R, and I want to apply the "elbow criterion" to the clustering using a WSS plot, but I do not know how to draw a WSS plot for a given clustering, any

1 answer

無奈伤痛 (OP)

2020-12-28 11:58
If I follow what you want, then we need a function to compute WSS
```
wss <- function(d) {
  sum(scale(d, scale = FALSE)^2)
}
```
and a wrapper for this wss() function
```
wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)
  spl <- split(x, cl)
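  ## total WSS for this clustering: sum of the per-cluster WSS values
  ## (returned directly, so the name wss is not reused for the result)
  sum(sapply(spl, wss))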
}
```
This wrapper takes the following arguments:
- i: the number of clusters to cut the data into
- hc: the hierarchical cluster analysis object
- x: the original data

wrap then cuts the dendrogram into i clusters, splits the original data according to the cluster membership given by cl, and computes the WSS for each cluster. These WSS values are summed to give the WSS for that clustering.
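
Written out, the quantity that wrap returns can be sketched as follows. The notation is an assumption introduced here for illustration and is not from the original answer: $C_1, \dots, C_i$ are the clusters obtained by cutting the dendrogram into $i$ groups, $x_{rj}$ is the value of variable $j$ for observation $r$, $p$ is the number of variables, and $\bar{x}_{cj}$ is the mean of variable $j$ within cluster $C_c$.

$$
\mathrm{WSS}(i) \;=\; \sum_{c=1}^{i} \; \sum_{r \in C_c} \; \sum_{j=1}^{p} \left( x_{rj} - \bar{x}_{cj} \right)^2
$$

The two inner sums correspond to one call of wss() on a single cluster (column-centre, square, sum), and the outer sum is the sum() over sapply() in wrap.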

We run all of this using sapply over the number of clusters 1, 2, ..., nrow(data)
```
## hc is spelled out in full; h = cl only works through partial argument matching
res <- sapply(seq.int(1, nrow(data)), wrap, hc = cl, x = data)
```
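A quick sanity check on the two ends of that range may be useful. With one cluster the WSS should equal the total sum of squares of the whole data set, and with as many clusters as observations every cluster is a single point, so the WSS should be zero. The sketch below assumes the built-in iris data and repeats the two functions so it runs on its own; the names memb, total_ss, one_cluster and all_singletons are illustrative only.

```r
## Same logic as wss() and wrap() above, repeated so this block is self-contained
wss <- function(d) {
  sum(scale(d, scale = FALSE)^2)  # centre each column, square, and sum
}
wrap <- function(i, hc, x) {
  memb <- cutree(hc, i)              # cluster membership for i groups
  sum(sapply(split(x, memb), wss))   # total WSS over the i clusters
}

iris2 <- iris[, 1:4]                          # numeric columns only
cl <- hclust(dist(iris2), method = "ward.D")

total_ss       <- wss(iris2)                          # sum of squares about the overall mean
one_cluster    <- wrap(1, hc = cl, x = iris2)         # a single cluster
all_singletons <- wrap(nrow(iris2), hc = cl, x = iris2)  # one observation per cluster

isTRUE(all.equal(one_cluster, total_ss))  # one cluster reproduces the total sum of squares
all_singletons == 0                       # singleton clusters have no within-cluster spread
```

If either check fails on your own data, the usual suspects are non-numeric columns in x or an hc object built from different data than x.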
A screeplot can be drawn using
```
plot(seq_along(res), res, type = "b", pch = 19)
```
Here is an example using the well-known Edgar Anderson Iris data set:
```
iris2 <- iris[, 1:4]  # drop Species column
cl <- hclust(dist(iris2), method = "ward.D")

## Takes a little while as we evaluate all implied clusterings up to 150 groups
res <- sapply(seq.int(1, nrow(iris2)), wrap, hc = cl, x = iris2)
plot(seq_along(res), res, type = "b", pch = 19)
```
This gives a plot of the total WSS against the number of clusters, from 1 to 150 (plot not shown).

We can zoom in by showing only the results for 1 to 50 clusters
```
plot(seq_along(res[1:50]), res[1:50], type = "o", pch = 19)
```
which gives the same curve restricted to 1 to 50 clusters (plot not shown).

You can speed up the main computation step either by running the sapply() via an appropriate parallelised alternative, or by doing the computation for fewer than nrow(data) clusters, e.g.
```
res <- sapply(seq.int(1, 50), wrap, hc = cl, x = iris2) ## 1st 50 groups
```
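For the parallelised route, one possible sketch uses parSapply() from the parallel package that ships with R. This is an assumption about what "an appropriate parallelised alternative" could look like, not the approach of the original answer. The choice of 2 workers and the names workers, res_par and res_seq are illustrative, and the functions are repeated so the block runs on its own.

```r
library(parallel)

## Same logic as wss() and wrap() above; stats::cutree is written out in full
## so the call does not depend on which packages the workers have attached
wss <- function(d) {
  sum(scale(d, scale = FALSE)^2)
}
wrap <- function(i, hc, x) {
  memb <- stats::cutree(hc, i)
  sum(sapply(split(x, memb), wss))
}

iris2 <- iris[, 1:4]
cl <- hclust(dist(iris2), method = "ward.D")

workers <- makeCluster(2)        # 2 worker processes; adjust to your machine
clusterExport(workers, "wss")    # wrap() looks up wss() on each worker
res_par <- parSapply(workers, seq_len(50), wrap, hc = cl, x = iris2)
stopCluster(workers)             # always release the workers

## Sequential result for comparison
res_seq <- sapply(seq_len(50), wrap, hc = cl, x = iris2)
isTRUE(all.equal(res_par, res_seq))
```

For a data set as small as iris, the cost of starting the workers may outweigh any gain, so this is more likely to help with larger data or many more cluster counts.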