How to set Spark KMeans initial centers

Question

I'm using Spark ML to run KMeans. I have a bunch of data and three existing centers, for example [1.0,1.0,1.0], [5.0,5.0,5.0], and [9.0,9.0,9.0]. How can I specify that the KMeans centers are these three vectors? I saw that the KMeans object has a seed parameter, but it is of type long, not an array. So how can I tell Spark KMeans to use only the existing centers for clustering?

Put another way, I don't understand what seed means in Spark KMeans. I assumed the seeds would be an array of vectors representing the centers specified before clustering runs.

Answer 1:

Indeed, seed does not mean what you think, i.e. it is not used for 'seeding' (initializing) the cluster centers, but simply for setting the random seed - you can confirm this in the documentation for the Scala and Python APIs.
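To make the distinction concrete, here is a minimal plain-Python sketch (not Spark's implementation) of what a random seed does: it only makes a random choice of starting centers reproducible and says nothing about which centers are chosen. The helper name pick_random_centers and the toy data are hypothetical.

```python
import random


def pick_random_centers(points, k, seed):
    """Pick k distinct points as starting centers, reproducibly for a given seed.

    Hypothetical helper for illustration only; Spark's own initialization differs.
    """
    rng = random.Random(seed)  # the seed fixes the random sequence, nothing else
    return rng.sample(points, k)


# Toy data: ten 3-dimensional points [0,0,0], [1,1,1], ..., [9,9,9].
points = [[float(i)] * 3 for i in range(10)]

first = pick_random_centers(points, 3, seed=42)
second = pick_random_centers(points, 3, seed=42)

# Same seed, same data: the two runs choose identical starting centers.
same_choice = first == second
```

The seed is a single number that controls the random generator, which is why the parameter is a long rather than an array of vectors.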

To the best of my knowledge, there is currently (Spark 2.1) no way to supply initial cluster centers for k-means in Spark ML (see this answer for Spark MLlib). The initMode parameter, according to the documentation:

can be either "random" to choose random points as initial cluster centers, or "k-means||" to use a parallel variant of k-means++

Source: https://stackoverflow.com/questions/43483011/how-to-set-spark-kmeans-initial-centers

Tags

apache-spark

machine-learning

cluster-analysis

k-means

apache-spark-mllib
