Question
I am trying to understand how to decide which columns of a CSV file should be used when applying k-means. In the notebook linked below, only the annual income and spending score columns of Mall_Customers.csv are used for the visualisation, and age is left out. https://www.kaggle.com/shrutimechlearn/step-by-step-kmeans-explained-in-detail
Please help.
Answer 1:
The dataset has 3 numeric features that can be used for clustering: age, annual income and spending score. K-means usually takes the Euclidean distance over all the selected features and uses it to assign each point to its nearest cluster centroid.
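Here is a small sketch of that distance and the assignment step, using only the standard library. The helper names `euclidean` and `nearest_centroid` and the sample points are hypothetical, chosen only for illustration. The same formula works for 2 or 3 features, because each extra feature just adds one more squared term under the square root.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points with any number of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# 2-D: the distance is the hypotenuse of a right triangle with legs 3 and 4.
print(euclidean((0, 0), (3, 4)))        # 5.0
# 3-D: one more squared term under the root, same formula.
print(euclidean((0, 0, 0), (1, 2, 2)))  # 3.0

def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (the k-means assignment step)."""
    return min(range(len(centroids)), key=lambda j: euclidean(point, centroids[j]))

# Hypothetical (income, spending score) point and three made-up centroids.
print(nearest_centroid((20, 80), [(25, 20), (25, 75), (90, 80)]))  # 1
```

On Python 3.8+ the built-in `math.dist(p, q)` computes the same Euclidean distance.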
This is very easy to visualize in two dimensions: take two points, and the distance between them is the hypotenuse of a right triangle. In three dimensions it is harder to picture. The author simply uses 2 dimensions so that she can plot the clusters later. To use all three dimensions, you would change the column selection to:
X = dataset.iloc[:, 2:5].values  # assumes column order CustomerID, Genre, Age, Annual Income (k$), Spending Score (1-100)
and that will use age, income and spending score in the algorithm.
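A minimal sketch of the 2-feature versus 3-feature selection, followed by clustering on all three. It assumes the column layout of Mall_Customers.csv is CustomerID, Genre, Age, Annual Income (k$), Spending Score (1-100); the six rows below are made-up values that only mimic that header, not real data from the file. To keep it self-contained, a hand-rolled Lloyd's iteration (the hypothetical `kmeans` helper) stands in for a library implementation such as scikit-learn's `KMeans`.

```python
import numpy as np
import pandas as pd

# Hypothetical rows mimicking the assumed Mall_Customers.csv header (not real data).
dataset = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5, 6],
    "Genre": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Age": [19, 21, 20, 64, 67, 60],
    "Annual Income (k$)": [15, 16, 17, 120, 125, 118],
    "Spending Score (1-100)": [39, 81, 77, 10, 8, 12],
})

# Two features (income and spending score), easy to draw as a 2-D scatter plot.
X2 = dataset.iloc[:, [3, 4]].values
# Three features (age, income, spending score) at positions 2, 3 and 4.
X3 = dataset.iloc[:, 2:5].values
# Selecting by name avoids depending on the column order.
X3_by_name = dataset[["Age", "Annual Income (k$)", "Spending Score (1-100)"]].values

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with Euclidean distance (illustrative only)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

labels, centroids = kmeans(X3, k=2)
print(X3.shape)  # (6, 3)
print(labels)
```

Nothing in the algorithm changes between `X2` and `X3`; only the number of columns (dimensions) in the distance calculation does. The 2-feature version is simply the one that can be plotted directly.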
Source: https://stackoverflow.com/questions/59123264/how-to-select-which-columns-are-good-for-visualisation-in-k-means-clustering-alg