Weighted k-means in R

离开以前 2020-12-18 08:23

I want to run k-means clustering on a dataset (Sample_Data) with three variables (columns), like the one below:

     A  B  C
1    12 10 1
2    8  11 2
3            


        
  •  慢半拍i
    2020-12-18 08:50

    You need a weighted k-means implementation, such as the one in the flexclust package:

    https://cran.r-project.org/web/packages/flexclust/flexclust.pdf

    The relevant function is:

    cclust(x, k, dist = "euclidean", method = "kmeans",
    weights=NULL, control=NULL, group=NULL, simple=FALSE,
    save.data=FALSE)
    

    According to the documentation, cclust performs k-means clustering, hard competitive learning or neural gas on a data matrix. The weights argument is described as "an optional vector of weights to be used in the fitting process" that "works only in combination with hard competitive learning."

    A toy example using the iris data:

    library(flexclust)
    data(iris)
    # weights are only used with method = "hardcl" (hard competitive learning)
    cl <- cclust(iris[, -5], k = 3, save.data = TRUE,
                 weights = c(1, 0.5, 1, 0.1), method = "hardcl")
    cl
        kcca object of family ‘kmeans’ 
    
        call:
        cclust(x = iris[, -5], k = 3, method = "hardcl", weights = c(1, 0.5, 1, 0.1), save.data = TRUE)
    
        cluster sizes:
    
         1  2  3 
        50 59 41 
    

    As the cclust output shows, the family is still reported as 'kmeans' even when hard competitive learning is used. The difference lies in how the cluster centers are updated during training:

    If method is "kmeans", the classic kmeans algorithm as given by MacQueen (1967) is used, which works by repeatedly moving all cluster centers to the mean of their respective Voronoi sets. If "hardcl", on-line updates are used (AKA hard competitive learning), which work by randomly drawing an observation from x and moving the closest center towards that point (e.g., Ripley 1996).
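    To make the difference between the two update rules concrete, here is a minimal base-R sketch of one batch step and one online step. The helper names (sq_dist, batch_step, hardcl_step) are hypothetical, this is not flexclust's internal code, and the way a per-observation weight scales the online step is an assumption made purely for illustration.

    ```r
    # Illustrative sketch in base R, NOT flexclust's internal implementation.
    # Assumption (hypothetical): an observation's weight scales its online step.

    # Squared Euclidean distance from every row of x to every row of centers,
    # returned as an nrow(x) x nrow(centers) matrix.
    sq_dist <- function(x, centers) {
      d <- sapply(seq_len(nrow(centers)), function(j)
        rowSums(sweep(x, 2, centers[j, ])^2))
      matrix(d, nrow = nrow(x))
    }

    # Batch step: assign every point to its nearest center, then move each
    # center to the mean of its assigned points (its Voronoi set).
    batch_step <- function(x, centers) {
      cluster <- max.col(-sq_dist(x, centers), ties.method = "first")
      for (j in seq_len(nrow(centers))) {
        members <- x[cluster == j, , drop = FALSE]
        # leave a center where it is if no point was assigned to it
        if (nrow(members) > 0) centers[j, ] <- colMeans(members)
      }
      centers
    }

    # Online (hard competitive learning) step: draw one observation at random
    # and move only the closest center a fraction of the way towards it.
    hardcl_step <- function(x, centers, rate = 0.1, weights = rep(1, nrow(x))) {
      i <- sample.int(nrow(x), 1)
      j <- which.min(rowSums(sweep(centers, 2, x[i, ])^2))
      # hypothetical weighting: a low-weight observation moves the center less
      centers[j, ] <- centers[j, ] + rate * weights[i] * (x[i, ] - centers[j, ])
      centers
    }

    # Usage on iris, starting both variants from the same random centers
    set.seed(1)
    x <- as.matrix(iris[, -5])
    init <- x[sample.int(nrow(x), 3), ]
    batch <- init
    for (it in 1:10) batch <- batch_step(x, batch)
    online <- init
    # constant learning rate and equal weights, for simplicity
    for (t in 1:2000) online <- hardcl_step(x, online, rate = 0.05)
    ```

    The batch step touches every center on each iteration, while the online step moves a single center per drawn observation, which is why an observation-level weight fits naturally into the online update.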

    The weights parameter is simply a numeric vector; I generally use values between 0.01 (minimum weight) and 1 (maximum weight).
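    If your raw importance scores are on some other scale, one way to bring them into the 0.01 to 1 range mentioned above is a linear rescaling. The sketch below is a hypothetical helper (rescale_weights is not part of flexclust), and linear rescaling is just one reasonable choice.

    ```r
    # Hypothetical helper (not part of flexclust): linearly map scores into [lo, hi].
    rescale_weights <- function(score, lo = 0.01, hi = 1) {
      rng <- range(score)
      # if all scores are equal, give every entry the maximum weight
      if (rng[1] == rng[2]) return(rep(hi, length(score)))
      lo + (score - rng[1]) / (rng[2] - rng[1]) * (hi - lo)
    }

    rescale_weights(c(2, 5, 11))  # returns 0.01 0.34 1.00
    ```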
