I have to perform a cluster analysis on a big amount of data. Since I have a lot of missing values I made a correlation matrix.
corloads = cor(df1[,2:185],
To determine the "optimal number of clusters" several methods are available, despite it is a controversy theme.
The kgs is helpful to get the optimal number of clusters.
Following your code one would do:
clus <- hclust(distance)
op_k <- kgs(clus, distance, maxclus = 20)
plot (names (op_k), op_k, xlab="# clusters", ylab="penalty")
So the optimal number of clusters according to the kgs function is the minimum value of op_k, as you can see in the plot.
You can get it with
min(op_k)
Note that I set the maximum number of clusters allowed to 20. You can set this argument to NULL.
Check this page for more methods.
Hope it helps you.
To find which is the optimal number of clusters, you can do
op_k[which(op_k == min(op_k))]
Also see this post to find the perfect graphy answer from @Ben
op_k[which(op_k == min(op_k))]
still gives penalty. To find the optimal number of clusters, use
as.integer(names(op_k[which(op_k == min(op_k))]))