Why don't we include 0 matches while calculating Jaccard distance between binary numbers?

Submitted by 馋奶兔 on 2019-12-24 09:26:02

Question


I am working on a program based on Jaccard Distance, and I need to calculate the Jaccard Distance between two binary bit vectors. I came across the following on the net:

 If p1 = 10111 and p2 = 10011,

 The total number of attributes for each combination of values in p1 and p2:

 M11 = total number of attributes where p1 & p2 have a value 1,
 M01 = total number of attributes where p1 has a value 0 & p2 has a value 1,
 M10 = total number of attributes where p1 has a value 1 & p2 has a value 0,
 M00 = total number of attributes where p1 & p2 have a value 0.
 Jaccard similarity coefficient = J = 
 intersection/union = M11/(M01 + M10 + M11) 
 = 3 / (0 + 1 + 3) = 3/4,

 Jaccard distance = J' = 1 - J = 1 - 3/4 = 1/4, 
 Or J' = 1 - (M11/(M01 + M10 + M11)) = (M01 + M10)/(M01 + M10 + M11)
 = (0 + 1)/(0 + 1 + 3) = 1/4

Now, while calculating the coefficient, why was "M00" not included in the denominator? Can anyone please explain?


Answer 1:


The Jaccard coefficient is a similarity measure for asymmetric binary attributes, i.e., attributes where the presence of an item is more important than its absence.

Since M00 counts only the attributes that are absent from both vectors, it is left out when calculating the Jaccard coefficient.

For example, while checking for the presence/absence of a disease, the presence of the disease is the more significant outcome.
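To make the disease example concrete, here is a minimal Python sketch. The two patient vectors are made up for illustration, and the helper names (`match_counts`, `jaccard_similarity`, `simple_matching`) are hypothetical, not from any library. It compares the Jaccard coefficient with a measure that does count M00 (the simple matching coefficient), assuming the inputs are equal-length strings of `0` and `1`:

```python
from fractions import Fraction


def match_counts(p1, p2):
    """Count M11, M10, M01 and M00 for two equal-length bit strings."""
    if len(p1) != len(p2) or not p1:
        raise ValueError("bit vectors must be non-empty and of equal length")
    counts = {"M11": 0, "M10": 0, "M01": 0, "M00": 0}
    for a, b in zip(p1, p2):
        counts["M" + a + b] += 1  # assumes every character is '0' or '1'
    return counts


def jaccard_similarity(p1, p2):
    """M11 / (M01 + M10 + M11); shared absences (M00) are ignored."""
    m = match_counts(p1, p2)
    denom = m["M11"] + m["M10"] + m["M01"]
    if denom == 0:
        # Both vectors are all zeros; raising here is a choice made for this sketch.
        raise ValueError("Jaccard is undefined when both vectors are all zeros")
    return Fraction(m["M11"], denom)


def simple_matching(p1, p2):
    """(M11 + M00) / total; shared absences count as agreement."""
    m = match_counts(p1, p2)
    return Fraction(m["M11"] + m["M00"], len(p1))


# Hypothetical data: 20 diseases, each patient has exactly one, and not the same one.
patient_a = "1" + "0" * 19
patient_b = "0" * 19 + "1"

print(jaccard_similarity(patient_a, patient_b))  # 0
print(simple_matching(patient_a, patient_b))     # 9/10
```

The two patients share no disease, yet a measure that includes M00 rates them 90% similar just because both lack the same 18 diseases. Leaving M00 out avoids that effect when absence carries little information.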

Hope it helps!




Answer 2:


The Jaccard index of A and B is |A∩B|/|A∪B| = |A∩B|/(|A| + |B| - |A∩B|).

We have: |A∩B| = M11, |A| = M11 + M10, |B| = M11 + M01.

So |A∩B|/(|A| + |B| - |A∩B|) = M11 / (M11 + M10 + M11 + M01 - M11) = M11 / (M10 + M01 + M11).
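The derivation can be checked numerically with a short Python sketch. It assumes each bit vector is read as the set of positions holding a 1 and reuses p1 and p2 from the question; the variable names are illustrative only:

```python
from fractions import Fraction

p1, p2 = "10111", "10011"

# Treat each vector as the set of positions that hold a 1.
A = {i for i, bit in enumerate(p1) if bit == "1"}
B = {i for i, bit in enumerate(p2) if bit == "1"}

m11 = len(A & B)  # 1 in both vectors
m10 = len(A - B)  # 1 in p1 only
m01 = len(B - A)  # 1 in p2 only

set_form = Fraction(len(A & B), len(A | B))
inclusion_exclusion = Fraction(len(A & B), len(A) + len(B) - len(A & B))
count_form = Fraction(m11, m10 + m01 + m11)

jaccard_distance = 1 - count_form

print(set_form, inclusion_exclusion, count_form)  # 3/4 3/4 3/4
print(jaccard_distance)                           # 1/4
```

Positions where both vectors are 0 never enter A or B, so they cannot appear in the union. That is why M00 is absent from the denominator.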

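A Venn diagram may help: M10 is the part of A outside B, M01 is the part of B outside A, M11 is the overlap, and M00 lies outside both circles, so it is not part of the union.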



Source: https://stackoverflow.com/questions/43518507/why-dont-we-include-0-matches-while-calculating-jaccard-distance-between-binary
