Trouble with scipy kmeans and kmeans2 clustering in Python

混江龙づ霸主 submitted on 2019-12-05 12:33:20

Thank you for the good question with the sample code and images! This is a good newbie question.

Most of the peculiarities can be explained by a careful reading of the docs. A few things:

  • When comparing the original set of points with the resulting cluster centers, plot them in the same figure on the same axes (i.e., plot w against the results). For example, plot the cluster centers as large dots, as you've done, and the original data as small dots on top of them.

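  • kmeans and kmeans2 start from different initial conditions. kmeans picks its initial centroids at random from among the observations, whereas kmeans2 by default (minit='random') draws them from a Gaussian distribution estimated from the data. As your data is not evenly distributed, kmeans2 converges to a poor result. You might try adding the keyword minit='points' (which picks the initial centroids from the data points) and see whether the results change.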

  • Because the initial centroid choice in kmeans2 is a poor one, only 17 of the 100 initial centroids actually end up with any points assigned to them (this is closely related to the random look of the graph).

  • It seems that some centroids in kmeans may collapse into each other if that gives the smallest distortion; a centroid left with no points assigned to it is dropped from the returned codebook. (This does not seem to be documented.) Thus you get only 96 centroids instead of 100.
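The first point can be sketched as follows. This is a minimal example, not the code from the question: the unevenly distributed 2-D data are a synthetic stand-in, plot_data_and_centroids is a hypothetical helper name, and matplotlib is assumed for plotting. The centroids are computed on the whitened array w, so both scatter calls share the same units and the same axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.vq import kmeans, whiten


def plot_data_and_centroids(w, centroids):
    """Hypothetical helper: draw centroids and observations on one set of axes."""
    fig, ax = plt.subplots()
    # Cluster centers as large dots...
    ax.scatter(centroids[:, 0], centroids[:, 1], s=200, c="red", label="centroids")
    # ...and the (whitened) observations as small dots drawn on top of them.
    ax.scatter(w[:, 0], w[:, 1], s=4, c="black", label="data")
    ax.legend()
    return fig, ax


# Synthetic, unevenly distributed 2-D data (an assumption; substitute your own).
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(0.0, 0.3, size=(400, 2)),    # dense blob
    rng.uniform(-5.0, 5.0, size=(100, 2)),  # sparse background
])
w = whiten(data)  # centroids must be in the same (whitened) units as w
centroids, distortion = kmeans(w, 10)
fig, ax = plot_data_and_centroids(w, centroids)
# fig.savefig("clusters.png")  # or plt.show() with an interactive backend
```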
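The second and third points can be checked with a sketch like the one below. It assumes synthetic, unevenly distributed data and k = 100 for illustration; occupied_clusters is a hypothetical helper. It runs kmeans2 once with the default minit='random' and once with minit='points', then counts how many centroids actually received points.

```python
import warnings

import numpy as np
from scipy.cluster.vq import kmeans2, whiten


def occupied_clusters(labels):
    """Hypothetical helper: number of clusters that received at least one observation."""
    return len(np.unique(labels))


# Synthetic data (an assumption): most points in one tight blob, a few spread far out.
rng = np.random.default_rng(1)
data = np.vstack([
    rng.normal(0.0, 0.2, size=(900, 2)),
    rng.uniform(-10.0, 10.0, size=(100, 2)),
])
w = whiten(data)
k = 100

with warnings.catch_warnings():
    # kmeans2 warns when a cluster ends up empty; silenced here to keep output short.
    warnings.simplefilter("ignore")
    cent_random, labels_random = kmeans2(w, k, minit="random")
    cent_points, labels_points = kmeans2(w, k, minit="points")

print("minit='random':", occupied_clusters(labels_random), "of", k, "clusters used")
print("minit='points':", occupied_clusters(labels_points), "of", k, "clusters used")
```

Both initializations are random, so the exact counts change from run to run; the point is the gap between the two settings, not the specific numbers.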
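The fourth point can be checked directly by comparing the number of rows in the codebook returned by kmeans with the number requested. The sketch below uses assumed toy data containing only three distinct points. Since kmeans picks its initial centroids from the observations, some of the five initial centroids must coincide, and the codebook comes back with at most three rows.

```python
import numpy as np
from scipy.cluster.vq import kmeans

# Toy data (an assumption for illustration): three distinct points,
# each repeated ten times, giving 30 observations.
obs = np.repeat(np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]), 10, axis=0)

k = 5
codebook, distortion = kmeans(obs, k)

# Five initial centroids drawn from only three distinct points means at least
# two coincide; the duplicates receive no observations and are dropped.
print(f"requested {k} centroids, got {codebook.shape[0]}")
```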
