Should we use k-means++ instead of k-means?

北恋 2021-02-08 14:18

The k-means++ algorithm addresses the following two weaknesses of the original k-means algorithm:

  1. The original k-means algorithm has the worst case running time of super-po
2 answers
  •  萌比男神i
    2021-02-08 15:13

    Not your question, but an easy speedup to any kmeans method for large N:

    1) first do k-means on a random sample of say sqrt(N) of the points
    2) then run full k-means from those centres.
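
A minimal sketch of the two steps above, in plain Python with no third-party libraries. This is not the kmeans() / kmeanssample() code mentioned further down; the names `nearest`, `lloyd` and `kmeans_sample`, the `seed` and `max_iter` parameters, and the choice of seeding the sample run with k random sample points are all assumptions made for illustration.

```python
import math
import random


def nearest(point, centres):
    """Index of the centre closest to point (Euclidean distance)."""
    return min(range(len(centres)), key=lambda j: math.dist(point, centres[j]))


def lloyd(points, centres, max_iter=100):
    """Plain Lloyd iterations starting from the given centres."""
    centres = [tuple(c) for c in centres]
    for _ in range(max_iter):
        # Assignment step: group points by their nearest centre.
        clusters = [[] for _ in centres]
        for p in points:
            clusters[nearest(p, centres)].append(p)
        # Update step: move each centre to the mean of its cluster;
        # an empty cluster keeps its previous centre.
        new_centres = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else old
            for c, old in zip(clusters, centres)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return centres


def kmeans_sample(points, k, seed=0):
    """Two-stage k-means: cluster a ~sqrt(N) sample, then refine on all points.

    Assumes points is a list of equal-length tuples and k <= len(points).
    """
    rng = random.Random(seed)
    n_sample = min(len(points), max(k, math.isqrt(len(points))))
    sample = rng.sample(points, n_sample)
    # Step 1: k-means on the random sample, seeded with k of its points.
    rough = lloyd(sample, rng.sample(sample, k))
    # Step 2: full k-means on all points, starting from those centres.
    return lloyd(points, rough)
```

Calling `kmeans_sample(points, k=20)` on a list of coordinate tuples returns k centres. Any speedup comes from step 2 starting near a reasonable solution and so needing fewer passes over the full data; this sketch makes no attempt to measure that.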

    I've found this 5-10 times faster than kmeans++ for N = 10000, k = 20, with similar results.
    How well it works for you will depend on how well a sqrt(N) sample approximates the whole, as well as on N, dim, k, ninit, delta ...

    What are your N (number of data points), dim (number of features), and k ?
    The huge range in users' N, dim, k, data noise, metrics ..., not to mention the lack of public benchmarks, makes it tough to compare methods.

    Added: Python code for kmeans() and kmeanssample() is here on SO; comments are welcome.
