Printing the cluster ID and its elements using the Spark KMeans algorithm

Question

I have this program, which prints the WSSSE (Within Set Sum of Squared Errors) of the KMeans algorithm on Apache Spark. It generates 20 clusters. I am trying to print each cluster ID together with the elements that were assigned to it. How do I loop over the cluster IDs to print their elements?

Thank you guys!!

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    val sc = new SparkContext("local", "KMeansExample", "/usr/local/spark/", List("target/scala-2.10/kmeans_2.10-1.0.jar"))

    // Load and parse the data: one comma-separated vector per line
    val data = sc.textFile("kmeans.csv")
    val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble)))

    // Cluster the data into 20 clusters using KMeans
    val numIterations = 20
    val numClusters = 20
    val clusters = KMeans.train(parsedData, numClusters, numIterations)

    // Print each center on its own line (concatenating an Array to a String prints only its reference)
    val clusterCenters = clusters.clusterCenters.map(_.toArray)
    println("The Cluster Centers are:")
    clusterCenters.foreach(c => println(c.mkString("[", ", ", "]")))

    // Evaluate clustering by computing Within Set Sum of Squared Errors
    val WSSSE = clusters.computeCost(parsedData)
    println("Within Set Sum of Squared Errors = " + WSSSE)

Answer 1:

As far as I know, you have to run predict on each element. In Java:

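    // parsedData is assumed to be a JavaRDD<Vector> here
    KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);

    // collect() brings every vector to the driver, so this only suits small datasets
    List<Vector> vectors = parsedData.collect();
    for (Vector vector : vectors) {
        // predict() returns the index of the closest cluster center
        System.out.println("cluster " + clusters.predict(vector) + " " + vector.toString());
    }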
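
Since the question is in Scala, here is a minimal sketch of the same idea in Scala, grouped by cluster ID so that each cluster is printed once with all of its members. It assumes the RDD-based MLlib API used in the question. The object name `ClusterMembers`, the helper `assign`, and the four sample points are made up for illustration; in the real program you would pass `parsedData` and the trained `clusters` model instead.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed on older Spark 1.x releases
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

object ClusterMembers {
  // Hypothetical helper: pair every point with the ID of the cluster
  // whose center the model considers closest to it.
  def assign(model: KMeansModel, points: RDD[Vector]): RDD[(Int, Vector)] =
    points.map(v => (model.predict(v), v))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "ClusterMembers")
    try {
      // Made-up sample data standing in for parsedData from the question
      val points = sc.parallelize(Seq(
        Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
        Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)
      ))
      val model = KMeans.train(points, 2, 20)

      // Group the points by cluster ID, then bring the groups to the driver
      val byCluster = assign(model, points).groupByKey().collect().sortBy(_._1)
      for ((id, members) <- byCluster) {
        println(s"Cluster $id:")
        members.foreach(v => println(s"  $v"))
      }
    } finally {
      sc.stop()
    }
  }
}
```

Note that `groupByKey().collect()` still pulls every element to the driver, just like the Java loop. For a large dataset it is probably better to write the `(clusterId, vector)` pairs out with `saveAsTextFile` instead of printing them.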

Source: https://stackoverflow.com/questions/26939281/printing-clusterid-and-its-elements-using-spark-kmeans-algo

Tags

apache-spark

k-means

apache-spark-mllib
