What is the time complexity of k-means?

回眸只為那壹抹淺笑 submitted on 2019-12-03 16:58:16

Question


I was going through the k-means Wikipedia page. Based on the algorithm, I think the complexity is O(n*k*i) (n = total number of elements, k = number of clusters, i = number of iterations).

So can someone explain this statement from Wikipedia to me, and how is this NP-hard?

If k and d (the dimension) are fixed, the problem can be exactly solved in time $O(n^{dk+1} \log n)$, where n is the number of entities to be clustered.


Answer 1:


It depends on what you call k-means.

The problem of finding the global optimum of the k-means objective function

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$$

is NP-hard, where $S_i$ is cluster i (there are k clusters), $x_j$ is a d-dimensional point in cluster $S_i$, and $\mu_i$ is the centroid (the mean of the points) of cluster $S_i$.

However, running a fixed number t of iterations of the standard algorithm takes only O(t*k*n*d) time for n d-dimensional points, where k is the number of centroids (or clusters). This is what practical implementations do (often with several random restarts).
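
To make the O(t*k*n*d) bound concrete, here is a minimal pure-Python sketch of the standard (Lloyd) iteration. The function name `lloyd_kmeans`, the `ops` counter, and the toy data are illustrative assumptions, not taken from any particular library. The counter tallies the coordinate-level operations of the assignment step, which dominates the running time.

```python
def lloyd_kmeans(points, centroids, t):
    """Run exactly t iterations of the standard (Lloyd) k-means algorithm.

    Returns the final centroids, the labels, and the number of
    coordinate-level operations spent in the assignment step.
    """
    n, k, d = len(points), len(centroids), len(points[0])
    centroids = [list(c) for c in centroids]
    labels = [0] * n
    ops = 0
    for _ in range(t):
        # Assignment step: n * k distance computations, each costing d.
        for j, x in enumerate(points):
            best, best_dist = 0, float("inf")
            for i, c in enumerate(centroids):
                dist = 0.0
                for a in range(d):
                    dist += (x[a] - c[a]) ** 2
                    ops += 1
                if dist < best_dist:
                    best, best_dist = i, dist
            labels[j] = best
        # Update step: O(n * d), dominated by the assignment step.
        sums = [[0.0] * d for _ in range(k)]
        counts = [0] * k
        for x, lab in zip(points, labels):
            counts[lab] += 1
            for a in range(d):
                sums[lab][a] += x[a]
        for i in range(k):
            if counts[i] > 0:  # keep the old centroid if a cluster is empty
                centroids[i] = [s / counts[i] for s in sums[i]]
    return centroids, labels, ops


# Hypothetical toy data: n = 4 points, d = 2 dimensions, k = 2 centroids.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
centroids, labels, ops = lloyd_kmeans(points, [(0.0, 0.0), (10.0, 0.0)], t=3)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [[0.0, 0.5], [10.0, 0.5]]
print(ops)        # t * k * n * d = 3 * 2 * 4 * 2 = 48
```

The four nested loops (iterations, points, centroids, coordinates) are where the product t*k*n*d comes from. A real implementation would typically vectorize them, but the operation count stays the same.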

The standard algorithm only approximates a local optimum of the above function, and so do all the k-means algorithms that I've seen.
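
As a small illustration of the gap between a local and a global optimum, the sketch below runs the standard iteration from two different starting points on four made-up points and compares the results with an exhaustive search. All names (`objective`, `lloyd`, `brute_force`) and the data are assumptions chosen for this example. The exhaustive search tries all k^n labelings, so it is only feasible for tiny inputs.

```python
from itertools import product


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def objective(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for i in range(k):
        cluster = [x for x, lab in zip(points, labels) if lab == i]
        if cluster:
            mu = centroid(cluster)
            total += sum(sum((a - b) ** 2 for a, b in zip(x, mu)) for x in cluster)
    return total


def lloyd(points, centroids):
    """Iterate assignment and update steps until the labels stop changing."""
    centroids = list(centroids)
    labels = None
    while True:
        new_labels = [
            min(range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])))
            for x in points
        ]
        if new_labels == labels:
            return labels
        labels = new_labels
        for i in range(len(centroids)):
            cluster = [x for x, lab in zip(points, labels) if lab == i]
            if cluster:
                centroids[i] = centroid(cluster)


def brute_force(points, k):
    """Try all k**n labelings; exponential in n, so only viable for tiny inputs."""
    return min(objective(points, labels, k)
               for labels in product(range(k), repeat=len(points)))


# Corners of a 4 x 2 rectangle (hypothetical data).
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
stuck = lloyd(points, [(2.0, 0.0), (2.0, 2.0)])  # start with a top/bottom split
good = lloyd(points, [(0.0, 1.0), (4.0, 1.0)])   # start with a left/right split
print(objective(points, stuck, 2))  # 16.0
print(objective(points, good, 2))   # 4.0
print(brute_force(points, 2))       # 4.0
```

The first start converges immediately to a top/bottom split with objective 16.0, even though the left/right split reaches 4.0, which the exhaustive search confirms is the best possible value for this data. This is why practical implementations usually try several starting points.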




Answer 2:


In this answer, note that the i used in the k-means objective formula and the i used in the time-complexity analysis of k-means (that is, the number of iterations needed until convergence) are different.




Answer 3:


The problem is NP-hard because another well-known NP-hard problem can be reduced to the (planar) k-means problem. See the paper "The Planar k-means Problem is NP-hard" (by Mahajan et al.) for more information.



Source: https://stackoverflow.com/questions/18634149/what-is-the-time-complexity-of-k-means
