sklearn: calculating accuracy score of k-means on the test data set

逝去的感伤 2020-12-19 08:01

I am doing k-means clustering on a set of 30 samples with 2 clusters (I already know there are two classes). I divide my data into training and test sets and try to calcula

1 Answer
  • On evaluating accuracy: k-means is a clustering algorithm, not a classifier, so accuracy is not a natural metric for it. You can compute it, but that is not what k-means is designed for. It looks for a grouping that minimizes the within-cluster sum of squared distances to the centroids, which tends to produce compact, well-separated clusters, and it never uses your labels during fitting. As a result, the cluster ids it assigns are arbitrary and need not match your class ids. Clusterings are therefore usually evaluated with clustering metrics such as the Rand index. If the goal is to maximize accuracy, fit an actual classifier instead, such as kNN, logistic regression, or an SVM.
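
To make the point about metrics concrete, here is a minimal sketch. The data is synthetic (two well-separated blobs of 15 points each, an assumption standing in for the 30 samples in the question), and the choice of the adjusted Rand index and logistic regression is only one reasonable option among those named above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's data: 30 samples, 2 known classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(15, 2)),
               rng.normal(5.0, 0.5, size=(15, 2))])
y = np.array([0] * 15 + [1] * 15)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# k-means never sees y_train; it only groups the points.
k_means = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)
test_clusters = k_means.predict(X_test)

# Cluster ids are arbitrary: cluster 0 may correspond to class 1.
# The adjusted Rand index compares the groupings and ignores the naming,
# so swapping the two cluster ids gives the same score.
ari = adjusted_rand_score(y_test, test_clusters)
flipped_ari = adjusted_rand_score(y_test, 1 - test_clusters)
print("ARI:", ari, "ARI with cluster ids swapped:", flipped_ari)

# If accuracy is the goal, fit a supervised model on the labels instead.
clf = LogisticRegression().fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print("classifier accuracy:", accuracy)
```

A plain accuracy score on test_clusters could come out near 0 or near 1 for the same clustering, depending only on which id k-means happened to give each cluster. That is why a label-invariant metric is the safer choice here.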

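    On the code itself: k_means.predict(X_test) returns the predicted cluster labels. It does not update the internal labels_ attribute, which only holds the labels of the data passed to fit. To see the test-set predictions, print the return value: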

    print(k_means.predict(X_test))
    

    Furthermore, in Python you do not have to (and should not) use [:] to print an array. Just do:

    print(k_means.labels_)
    print(y_test)
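
The difference between predict and labels_ can be checked directly. The sketch below uses hypothetical toy arrays in place of the X_train and X_test from the question; only the KMeans calls themselves are taken from the discussion above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data standing in for X_train / X_test from the question.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
X_test = np.array([[0.05, 0.1], [5.05, 5.0]])

k_means = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)
labels_before = k_means.labels_.copy()

# predict returns a new array with one cluster id per test sample.
test_pred = k_means.predict(X_test)

# labels_ still describes the training samples only and has not changed.
labels_unchanged = np.array_equal(labels_before, k_means.labels_)
print(len(k_means.labels_), len(test_pred), labels_unchanged)
```

So comparing k_means.labels_ against y_test compares training-set clusters with test-set classes, which is not meaningful. Any comparison with y_test should use the array returned by predict.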
    