How to select which columns are good for visualisation in k-Means clustering algorithm?

Submitted by 吃可爱长大的小学妹 on 2019-12-20 06:42:07

Question


I am trying to understand how to choose which columns of a CSV file should be taken into consideration when applying k-means. In the link below, only annual income and spending score (from the Mall_Customers.csv file) are used as columns for visualisation, and not age. https://www.kaggle.com/shrutimechlearn/step-by-step-kmeans-explained-in-detail

Please help.


Answer 1:


They have three features that they can use to cluster: age, annual income, and spending score. k-means normally takes the Euclidean distance over all the selected features to measure how far each point is from each cluster centre.
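As a rough illustration of the Euclidean distance mentioned above, here is a minimal sketch using only the Python standard library. The two customers and their values are made up for the example and are not taken from Mall_Customers.csv.

```python
import math

# Two hypothetical customers: (age, annual income in k$, spending score).
# These values are invented purely for illustration.
a = (25.0, 40.0, 60.0)
b = (28.0, 44.0, 60.0)

# Distance using only income and spending score (two features).
dist_2d = math.dist(a[1:], b[1:])

# Distance using all three features, age included.
dist_3d = math.dist(a, b)

print(dist_2d, dist_3d)
```

Adding a feature can only keep the distance the same or make it larger, which is why the choice of columns changes which points end up close together.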

This is very easy to visualize in two dimensions: the distance between two points is the hypotenuse of a right triangle. In three dimensions, it's a little harder to visualize. The author is simply using two dimensions so she can plot the result later. However, to use all three dimensions you would simply modify the code to:

X = dataset.iloc[:, 2:5].values  # columns 2, 3, 4, assuming the order CustomerID, Gender, Age, Annual Income, Spending Score

That will use age, income, and spending score in the algorithm.
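A minimal sketch of the idea, assuming pandas and scikit-learn are installed. The small DataFrame below is a made-up stand-in that only mimics the assumed column order of Mall_Customers.csv; the number of clusters is also chosen arbitrarily for the toy data.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for Mall_Customers.csv (values invented for illustration).
dataset = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5, 6],
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Age": [19, 21, 20, 60, 62, 64],
    "Annual Income (k$)": [15, 16, 17, 80, 82, 85],
    "Spending Score (1-100)": [80, 82, 78, 20, 18, 22],
})

# Columns 2, 3, 4: age, annual income, spending score.
X = dataset.iloc[:, 2:5].values

# Fit k-means on all three features; n_clusters=2 suits this toy data only.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# For a 2-D scatter plot, pick any two of the features afterwards,
# for example income and spending score (columns 3 and 4).
X_plot = dataset.iloc[:, [3, 4]].values

print(labels)
```

Clustering on three columns and plotting on two are separate choices: the cluster labels come from all three features, while the plot only shows a two-dimensional view of them.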



Source: https://stackoverflow.com/questions/59123264/how-to-select-which-columns-are-good-for-visualisation-in-k-means-clustering-alg
