Cluster analysis in R: determine the optimal number of clusters

星月不相逢 2020-11-22 10:28

Being a newbie in R, I'm not very sure how to choose the best number of clusters to do a k-means analysis. After plotting a subset of the data below, how many clusters will be

7 answers
  •  醉梦人生
    2020-11-22 10:42

    These methods are great, but on much larger data sets they can be painfully slow in R when you are searching for k.

    A good solution I have found is the "RWeka" package, which has an efficient implementation of the X-Means algorithm, an extended version of K-Means that scales better and picks the number of clusters for you within a minimum and maximum that you set.

    First, make sure that Weka is installed on your system and that XMeans has been installed through Weka's package manager.
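
    If XMeans is not installed yet, RWeka exposes the Weka package manager through its WPM() function, so the setup can probably be done from R itself. This is only a minimal sketch, and it assumes a working Java installation and network access, neither of which is stated above:

    ```r
    library(RWeka)

    # Refresh the local cache of Weka package metadata (needs network access)
    WPM("refresh-cache")

    # Install the XMeans package into the local Weka package directory
    WPM("install-package", "XMeans")

    # List the installed Weka packages to confirm that XMeans shows up
    WPM("list-packages", "installed")
    ```

    The install step should only be needed once per machine; after that, loading RWeka is enough.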

    library(RWeka)
    
    # Print a list of available options for the X-Means algorithm
    WOW("XMeans")
    
    # Create a Weka_control object which will specify our parameters
    weka_ctrl <- Weka_control(
        I = 1000,                          # max no. of overall iterations
        M = 1000,                          # max no. of iterations in the kMeans loop
        L = 20,                            # min no. of clusters
        H = 150,                           # max no. of clusters
        D = "weka.core.EuclideanDistance", # distance metric Euclidean
        C = 0.4,                           # cutoff factor: share of split centroids kept if no child wins
        S = 12                             # random number seed (for reproducibility)
    )
    
    # Run the algorithm on your data, d
    x_means <- XMeans(d, control = weka_ctrl)
    
    # Assign cluster IDs to original data set
    d$xmeans.cluster <- x_means$class_ids
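
    Once the cluster IDs are attached, a quick sanity check is to look at the cluster sizes and the per-cluster means. The sketch below is an illustration only: it uses stats::kmeans on the built-in iris data as a stand-in so that it runs without Java or Weka, and the data set, the column name "cluster" and the choice of 3 centers are all assumptions made for the example, not part of the X-Means run above.

    ```r
    # Stand-in data and clustering (hypothetical): kmeans replaces XMeans here
    set.seed(12)
    d <- iris[, 1:4]
    fit <- kmeans(d, centers = 3, nstart = 10)

    # Attach the cluster IDs to the data, as in the step above
    d$cluster <- fit$cluster

    # Number of observations assigned to each cluster
    cluster_sizes <- table(d$cluster)
    print(cluster_sizes)

    # Mean of every numeric column within each cluster
    cluster_means <- aggregate(. ~ cluster, data = d, FUN = mean)
    print(cluster_means)
    ```

    If the clustering is re-run later, drop the ID column from d first; otherwise it is treated as one more input feature.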
    
