Evaluating K-means accuracy


猫巷女王i 2020-12-10 00:10

I created a 3-dimensional random data set with 4 defined patterns/classes in MATLAB. I applied the K-means algorithm to the data to see how well K-means can classify my sa

2 answers

既然无缘 (OP)

2020-12-10 00:51
In addition to purity scores, consider using the following clustering metrics: Normalized Mutual Information (NMI), Variation of Information (VI) and Adjusted Rand Index (ARI). Given the predicted label assignments X and the ground truth labels Y, the NMI is defined as:
```
NMI(X;Y) = I(X;Y) / ((H(X) + H(Y)) / 2)
```
where H(X) is the entropy of X and I(X;Y) is the mutual information between X and Y. As the overlap between X and Y increases, the NMI approaches 1. See MATLAB implementation here. Variation of Information is defined as:
```
VI(X;Y) = H(X)+H(Y)-2I(X;Y) = H(X|Y) + H(Y|X)
```
Thus, VI decreases toward 0 as the overlap between label assignments X and Y increases. See MATLAB implementation here. Finally, the adjusted Rand index is defined as:
```
ARI = (RI - E[RI]) / (max RI - E[RI])
RI = (TP + TN) / (TP + FP + FN + TN)
```
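Here RI is the Rand index, computed over pairs of points: TP and TN count the pairs on which the two labelings agree (same cluster in both, or different clusters in both), while FP and FN count the pairs on which they disagree. Thus, ARI approaches 1 for cluster assignments that are similar to each other. See Python implementation here.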
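
If the linked implementations are not at hand, the three metrics above can be computed from a single contingency table. The following is only a minimal NumPy sketch under a few assumptions: the function names (`contingency`, `entropy`, `clustering_scores`) are made up for illustration, entropies use the natural logarithm, there are at least two points, and the ARI is written in its usual pair-counting form, where the count of TP pairs stands in for RI (the adjusted value works out the same).

```python
import numpy as np

def contingency(x, y):
    """C[i, j] = number of points with predicted label i and true label j."""
    _, xi = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, yi = np.unique(np.asarray(y).ravel(), return_inverse=True)
    c = np.zeros((xi.max() + 1, yi.max() + 1), dtype=np.int64)
    np.add.at(c, (xi, yi), 1)  # accumulate counts, handling repeated index pairs
    return c

def entropy(p):
    """Shannon entropy (natural log) of a probability vector; zero entries are skipped."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def clustering_scores(x, y):
    """Return NMI, VI, RI and ARI between predicted labels x and true labels y."""
    c = contingency(x, y)
    n = c.sum()
    pxy = c / n
    hx = entropy(pxy.sum(axis=1))      # H(X)
    hy = entropy(pxy.sum(axis=0))      # H(Y)
    mi = hx + hy - entropy(pxy.ravel())  # I(X;Y) = H(X) + H(Y) - H(X,Y)

    # NMI(X;Y) = I(X;Y) / ((H(X) + H(Y)) / 2); defined as 1 if both entropies are 0
    nmi = mi / ((hx + hy) / 2) if (hx + hy) > 0 else 1.0
    # VI(X;Y) = H(X) + H(Y) - 2 I(X;Y)
    vi = hx + hy - 2 * mi

    # Pair counting: comb2(a) is the number of unordered pairs among a items.
    comb2 = lambda a: a * (a - 1) / 2.0
    total = comb2(n)
    tp = comb2(c).sum()                    # same cluster, same class
    fp = comb2(c.sum(axis=1)).sum() - tp   # same cluster, different class
    fn = comb2(c.sum(axis=0)).sum() - tp   # different cluster, same class
    tn = total - tp - fp - fn              # different cluster, different class
    ri = (tp + tn) / total

    expected = (tp + fp) * (tp + fn) / total   # expected TP count under random labeling
    max_index = ((tp + fp) + (tp + fn)) / 2
    ari = (tp - expected) / (max_index - expected) if max_index != expected else 1.0

    return {"nmi": float(nmi), "vi": float(vi), "ri": float(ri), "ari": float(ari)}

if __name__ == "__main__":
    predicted = [0, 0, 1, 1, 2, 2]
    truth = [5, 5, 7, 7, 9, 9]   # same partition, different label names
    print(clustering_scores(predicted, truth))
```

Note that none of these scores depends on which integer K-means happens to give each cluster, so there is no need to match cluster IDs to class IDs before comparing.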

If you are interested in choosing the number of clusters K automatically based on data, consider using Dirichlet Process (DP) K-means. See paper and code for more information.
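
The linked paper and code are not reproduced here, so the following is only a rough sketch of the DP-means idea as it is commonly described, not the referenced implementation. It runs like K-means, except that a point whose squared Euclidean distance to every existing centroid exceeds a penalty starts a new cluster. The function name `dp_means` and the parameters `lam` (the penalty) and `max_iter` are assumptions made for illustration.

```python
import numpy as np

def dp_means(data, lam, max_iter=100):
    """K-means-like clustering where K grows when a point is farther than lam
    (in squared Euclidean distance) from every current centroid."""
    data = np.asarray(data, dtype=float)
    centers = [data.mean(axis=0)]            # start with one cluster at the global mean
    labels = np.zeros(len(data), dtype=int)
    for _ in range(max_iter):
        changed = False
        for i, point in enumerate(data):
            d2 = np.array([np.sum((point - c) ** 2) for c in centers])
            k = int(d2.argmin())
            if d2[k] > lam:                  # too far from everything: open a new cluster
                centers.append(point.copy())
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Recompute centroids, dropping clusters that ended up empty,
        # and renumber the labels to 0..K-1 accordingly.
        used = np.unique(labels)
        centers = [data[labels == k].mean(axis=0) for k in used]
        labels = np.searchsorted(used, labels)
        if not changed:
            break
    return labels, np.array(centers)

if __name__ == "__main__":
    points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    labels, centers = dp_means(points, lam=9.0)
    print(labels, len(centers))
```

A larger `lam` yields fewer clusters and a smaller one yields more, so the penalty takes over the role that K plays in ordinary K-means and still has to be chosen with some care.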