Cluster analysis in R: determine the optimal number of clusters

后端未结

关注

 7  2003

星月不相逢 2020-11-22 10:28

Being a newbie in R, I\'m not very sure how to choose the best number of clusters to do a k-means analysis. After plotting a subset of below data, how many clusters will be

7条回答

醉梦人生 (楼主)

2020-11-22 10:42

These methods are great but when trying to find k for much larger data sets, these can be crazy slow in R.

A good solution I have found is the "RWeka" package, which has an efficient implementation of the X-Means algorithm - an extended version of K-Means that scales better and will determine the optimum number of clusters for you.

First you'll want to make sure that Weka is installed on your system and have XMeans installed through Weka's package manager tool.

library(RWeka)

# Print a list of available options for the X-Means algorithm
WOW("XMeans")

# Create a Weka_control object which will specify our parameters
weka_ctrl <- Weka_control(
    I = 1000,                          # max no. of overall iterations
    M = 1000,                          # max no. of iterations in the kMeans loop
    L = 20,                            # min no. of clusters
    H = 150,                           # max no. of clusters
    D = "weka.core.EuclideanDistance", # distance metric Euclidean
    C = 0.4,                           # cutoff factor ???
    S = 12                             # random number seed (for reproducibility)
)

# Run the algorithm on your data, d
x_means <- XMeans(d, control = weka_ctrl)

# Assign cluster IDs to original data set
d$xmeans.cluster <- x_means$class_ids

0 讨论(0)

查看其它7个回答