Computing F-measure for clustering

点点圈 提交于 2019-12-03 16:28:29

The term f-measure itself is underspecified. It's the harmonic mean, usually of precision and recall. Actually you should even say F1-score if you mean the unweighted version, because you can put different weight on the two input values. But without saying which two values are averaged (not in the sense of the arithmetic mean!) this doesn't say much.

https://en.wikipedia.org/wiki/F1_score

Note that the values must be in the 0-1 value range. Otherwise, you have an error earlier on.

In cluster analysis, the common approach is to apply the F1-Measure to the precision and recall of pairs, often referred to as "pair counting f-measure". But you could compute the same mean on other values, too.

Pair-counting has the nice property that it doesn't directly compare clusters, so the result is well defined when one result has m cluster, the other has n clusters. However, pair counting needs strict partitions. When elements are not clustered or assigned to more than one cluster, the pair-counting measures can easily go out of the range 0-1.

Discusses some of these metrics (including Rand index and such) and gives a simple explanation of the "pair counting F-measure".

The N in your formula, F(C,K) = ∑ | ci | / N * max {F(ci,kj)}, is the sum of the |ci| over all i i.e. it is the total number of elements. You are perhaps mistaking it to be the number of clusters and therefore are getting an answer greater than one. If you make the change, your answer will be between 1 and 0.

The Example provided by mahesh cs is correct and should help you (and others) to understand how pair counting f-measure works.

The provided example comes from the paper "Characterization and evaluation of similarity measures for pairs of clusterings" by Darius Pfitzner, Richard Leibbrandt & David Powers, and contains a lot of useful information regarding this subject.

mahesh cs

So for example given the set,

           D = {1, 2, 3, 4, 5, 6}

and the partitions,

           P = {1, 2, 3}, {4, 5}, {6}, and
           Q = {1, 2, 4}, {3, 5, 6}

where P is set created by our algorithm and Q is set created by standard algorithm we known

           PairsP = {(1, 2), (1, 3), (2, 3), (4, 5)},
           PairsQ = {(1, 2), (1, 4), (2, 4), (3, 5), (3, 6), (5, 6)}, and
           PairsD = {(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 3), (2, 4),
                      (2, 5), (2, 6), (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)}

so,

           a = | PairsP intersection PairsQ | = |(1, 2)| = 1
           b = | PairsP- PairsQ | = |(1, 3)(2, 3)(4, 5)| = 3
           c = | PairsQ- PairsP  | = |(1, 4)(2, 4)(3, 5)(3, 6)(5, 6)| = 5
         
     F-measure= 2a/(2a+b+c)
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!