Why does the line of the WSS plot (for optimizing the cluster analysis) look so erratic?

Submitted by ↘锁芯ラ on 2019-12-08 12:35:04

Question


I made a cluster plot in R and want to use the elbow criterion to choose the number of clusters, so I drew a WSS (within-cluster sum of squares) plot for my clustering. It looks really strange, though, and I do not know how many clusters I should use. Could anyone help me?

Here is my data:

Friendly<-c(0.533,0.854,0.9585,0.925,0.9125,0.9815,0.9645,0.981,0.9935,0.9585,0.996,0.956,0.9415)
Polite<-c(0,0.45,0.977,0.9915,0.929,0.981,0.9895,0.9875,1,0.96,0.996,0.873,0.9125)
Praising<-c(0,0,0.437,0.9585,0.9415,0.9605,0.998,0.998,0.8915,1,1,1,0.977)
Joking<-c(0,0,0,0.617,0.942,0.9665,0.9935,0.992,0.935,0.987,0.975,0.9915,0.9665)
Sincere<-c(0,0,0,0,0.617,0.8335,0.985,0.9895,0.977,0.9205,1,0.9585,0.8895)
Serious<-c(0,0,0,0,1,0.642,0.975,0.9605,0.9645,0.9895,0.8125,0.9605,0.925)
Hostile<-c(0,0,0,0,0,0,0.629,0.656,0.948,0.9705,0.9645,0.998,0.9685)
Rude<-c(0,0,0,0,0,0,0,0.687,0.979,0.954,0.954,0.996,0.956)
Irony<-c(0,0,0,0,0,0,0,0,0.354,0.9815,0.996,1,0.971)
Insincere<-c(0,0,0,0,0,0,0,0,1,0.396,0.996,0.9915,0.9415)
Commanding<-c(0,0,0,0,0,0,0,0,0,1,0.462,0.9605,0.9165)
Suggesting<-c(0,0,0,0,0,0,0,0,0,0,0,0.867,0.775)
Neutral<-c(0,0,0,0,0,0,0,0,0,0,0,0,0.283)

data <- data.frame(Friendly,Polite,Praising,Joking,Sincere,Serious,Hostile,Rude,Irony,Insincere,Commanding,Suggesting,Neutral)

And here is my clustering code. The method is the one given by Gavin in the last lines of his answer to: How to draw the plot of within-cluster sum-of-squares for a cluster?

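## cluster analysis
# as.dist() reads the lower triangle of the 13 x 13 table as pairwise values
d <- as.dist(data)
hc <- hclust(d, method = "average")
plot(hc, main = "", sub = 'Method="Average"', ann = TRUE, axes = TRUE, hang = 0.2)
## draw a WSS plot
# wrap() is not defined here; it has to be taken from the linked answer first
res <- sapply(seq.int(1, 13), wrap, h = hc, x = data)
plot(seq_along(res), res, type = "b", pch = 19)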
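The code above calls `wrap` without showing its definition. A minimal sketch of what such a helper might look like, assuming it cuts the tree into `i` clusters and adds up the within-cluster sums of squares over the rows of `x`. The function names, the argument order, and the toy data are assumptions made for illustration, not a copy of the linked answer:

```r
# Within-cluster sum of squares for one cluster: squared deviations of
# every value from its column mean, summed over all rows and columns.
wss <- function(d) {
  sum(scale(as.matrix(d), center = TRUE, scale = FALSE)^2)
}

# Total WSS when the tree `hc` is cut into `i` clusters of the rows of `x`.
wrap <- function(i, hc, x) {
  cl <- cutree(hc, k = i)                # cluster label for each row
  groups <- split(as.data.frame(x), cl)  # one data frame per cluster
  sum(vapply(groups, wss, numeric(1)))
}

# Hypothetical toy data: two tight, well-separated groups of two points each
toy <- data.frame(a = c(0, 0.1, 5, 5.1), b = c(0, 0.1, 5, 5.1))
toy_hc <- hclust(dist(toy), method = "average")

# WSS for 1, 2, 3 and 4 clusters
toy_res <- sapply(seq_len(nrow(toy)), wrap, hc = toy_hc, x = toy)
print(toy_res)
```

With this argument order, a call written as `wrap, h = hc, x = data` still works, because R partially matches `h` to the `hc` argument.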

But it looks like this. Can anyone explain why this happens and how to apply the elbow criterion here?


Answer 1:


Why do you expect that WSS will decline smoothly with increasing numbers of clusters? It need not, as you found out. Only with well-behaved data have I seen nicely behaved scree plots.

There is a big drop in the WSS at 7 clusters, which might suggest that you stop there. However, you should also look at the dendrogram when you evaluate this.
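One way to check a candidate cut against the dendrogram is to draw boxes around the clusters with `rect.hclust` and list the memberships with `cutree`. A rough sketch follows, using the built-in `USArrests` data as a stand-in and an arbitrary `k = 4`. Both are assumptions for illustration; for the question's tree you would pass `hc` and `k = 7` instead:

```r
# Stand-in example: average-linkage clustering of the built-in USArrests data
hc_demo <- hclust(dist(scale(USArrests)), method = "average")

k <- 4  # candidate number of clusters (arbitrary here)

# Draw the dendrogram and outline the k clusters produced by the cut
plot(hc_demo, main = "", sub = "", xlab = "", hang = 0.2)
rect.hclust(hc_demo, k = k, border = "red")

# Cluster membership of each observation, and the resulting cluster sizes
members <- cutree(hc_demo, k = k)
print(table(members))
```

If the boxes split branches that join at nearly the same height, the cut is probably not a natural one, whatever the WSS plot suggests.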



Source: https://stackoverflow.com/questions/25977798/why-is-the-line-of-wss-plot-for-optimizing-the-cluster-analysis-looks-so-fluct
