How to initialize cluster centers for K-means in Spark MLlib?

北荒 2021-01-05 20:52

Is there a way to initialize cluster centers while running K-Means in Spark MLlib?

I tried the following:

model = KMeans.train(
    sc.parallelize(data)         


        
1 Answer
  • 2021-01-05 21:41

    The initial model can be set in Scala since Spark 1.5+ using setInitialModel, which takes a KMeansModel:

    import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
    import org.apache.spark.mllib.linalg.Vectors

    // Training data: four 2-D points forming two well-separated clusters
    val data = sc.parallelize(Seq(
        "[0.0, 0.0]", "[1.0, 1.0]", "[9.0, 8.0]", "[8.0, 9.0]"
    )).map(Vectors.parse(_))

    // Wrap the desired starting centers in a KMeansModel
    val initialModel = new KMeansModel(
        Array("[0.6, 0.6]", "[8.0, 8.0]").map(Vectors.parse(_))
    )

    // Hand the initial model to the estimator before running it
    val model = new KMeans()
      .setInitialModel(initialModel)
      .setK(2)
      .run(data)
    

    and in PySpark 1.6+ using the initialModel parameter of the train method:

    from pyspark.mllib.clustering import KMeansModel, KMeans
    from pyspark.mllib.linalg import Vectors

    # Training data: four 2-D points forming two well-separated clusters
    data = sc.parallelize([
        "[0.0, 0.0]", "[1.0, 1.0]", "[9.0, 8.0]", "[8.0, 9.0]"
    ]).map(Vectors.parse)

    # Wrap the desired starting centers in a KMeansModel
    initialModel = KMeansModel([
        Vectors.parse(v) for v in ["[0.6, 0.6]", "[8.0, 8.0]"]])

    # Pass it to train through the initialModel keyword argument
    model = KMeans.train(data, 2, initialModel=initialModel)
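
    For a quick sanity check, the sketch below (assuming the same SparkContext sc and the model trained above) prints the learned centers and assigns a new point to a cluster:

    # Learned centers; they may drift away from the supplied initial
    # centers as the k-means iterations run
    print(model.clusterCenters)

    # Cluster index assigned to a new point
    print(model.predict(Vectors.dense([8.5, 8.5])))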
    

    If either of these methods doesn't work, it means you're using an earlier version of Spark.
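
    A quick way to confirm which version you're on (again assuming the SparkContext sc) is:

    print(sc.version)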
