Should we use k-means++ instead of k-means?

北恋 2021-02-08 14:18

The k-means++ algorithm addresses the following two weaknesses of the original k-means algorithm:

The original k-means algorithm has a worst-case running time of super-po

2 answers

萌比男神i (original poster)

2021-02-08 15:13

Not your question, but an easy speedup to any kmeans method for large N:

1) first do k-means on a random sample of say sqrt(N) of the points
2) then run full k-means from those centres.

I've found this 5-10 times faster than k-means++ for N = 10000, k = 20, with similar results.
How well it works for you will depend on how well a sqrt(N) sample approximates the whole, as well as on N, dim, k, ninit, delta ...
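
The two steps above can be sketched roughly as follows, in plain Python with no external libraries. This is only an illustration under stated assumptions, not the kmeans() / kmeanssample() code mentioned in this answer: the names `nearest`, `lloyd` and `kmeans_sample`, the sample size `max(k, isqrt(N))`, and the random choice of starting centres are all hypothetical choices made for the sketch.

```python
import math
import random


def nearest(point, centres):
    """Index of the centre closest to point (Euclidean distance)."""
    return min(range(len(centres)), key=lambda j: math.dist(point, centres[j]))


def lloyd(points, centres, max_iter=100):
    """Plain Lloyd iterations (standard k-means) from the given start centres."""
    centres = [tuple(c) for c in centres]
    for _ in range(max_iter):
        # Assignment step: group points by their nearest centre.
        clusters = [[] for _ in centres]
        for p in points:
            clusters[nearest(p, centres)].append(p)
        # Update step: move each centre to the mean of its cluster;
        # an empty cluster keeps its previous centre.
        new_centres = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else old
            for c, old in zip(clusters, centres)
        ]
        if new_centres == centres:  # assignments stopped changing
            break
        centres = new_centres
    return centres


def kmeans_sample(points, k, seed=0):
    """Step 1: k-means on a ~sqrt(N) sample. Step 2: full k-means from there."""
    rng = random.Random(seed)
    # Assumed sample size: sqrt(N), but never fewer than k points.
    n_sample = min(len(points), max(k, math.isqrt(len(points))))
    sample = rng.sample(points, n_sample)
    start = rng.sample(sample, k)      # assumed: random starting centres
    rough = lloyd(sample, start)       # cheap pass on the small sample
    return lloyd(points, rough)        # full pass, usually few iterations
```

Calling `kmeans_sample(points, k)` on a list of coordinate tuples returns k centres. The cheap first pass only pays off if the sample resembles the full data, which is the caveat made above.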

What are your N (number of data points), dim (number of features), and k?
The huge range in users' N, dim, k, data noise, metrics ..., not to mention the lack of public benchmarks, makes it tough to compare methods.

Added: Python code for kmeans() and kmeanssample() is here on SO; comments are welcome.
