Question
I am working on a program based on Jaccard Distance, and I need to calculate the Jaccard Distance between two binary bit vectors. I came across the following on the net:
If p1 = 10111 and p2 = 10011,
The counts for each combination of attribute values in p1 and p2 are:
M11 = total number of attributes where p1 & p2 have a value 1,
M01 = total number of attributes where p1 has a value 0 & p2 has a value 1,
M10 = total number of attributes where p1 has a value 1 & p2 has a value 0,
M00 = total number of attributes where p1 & p2 have a value 0.
Jaccard similarity coefficient: J = intersection/union = M11/(M01 + M10 + M11) = 3/(0 + 1 + 3) = 3/4,
Jaccard distance: J' = 1 - J = 1 - 3/4 = 1/4,
or J' = 1 - M11/(M01 + M10 + M11) = (M01 + M10)/(M01 + M10 + M11) = (0 + 1)/(0 + 1 + 3) = 1/4.
Now, while calculating the coefficient, why was "M00" not included in the denominator? Can anyone please explain?
Answer 1:
The Jaccard coefficient is a similarity measure for asymmetric binary attributes, i.e., attributes where the presence of an item is more important than its absence.
Since M00 counts only the attributes that are absent from both vectors, we do not consider it when calculating the Jaccard coefficient.
For example, while checking for the presence/absence of a disease, the presence of the disease is the more significant outcome.
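To make the effect of M00 concrete, here is a minimal Python sketch. It compares the Jaccard coefficient with a measure that does count M00, the simple matching coefficient (M11 + M00)/(M11 + M10 + M01 + M00). The simple matching coefficient is not mentioned in the question; it is brought in here only for contrast. The function names and the 1000-attribute "patients" are made up for illustration.

```python
def counts(p1, p2):
    """Return (M11, M10, M01, M00) for two equal-length bit strings."""
    if len(p1) != len(p2):
        raise ValueError("vectors must have the same length")
    pairs = list(zip(p1, p2))
    m11 = pairs.count(("1", "1"))
    m10 = pairs.count(("1", "0"))
    m01 = pairs.count(("0", "1"))
    m00 = pairs.count(("0", "0"))
    return m11, m10, m01, m00


def jaccard(p1, p2):
    """Jaccard coefficient: ignores joint absences (M00)."""
    m11, m10, m01, _ = counts(p1, p2)
    denom = m11 + m10 + m01
    if denom == 0:
        raise ValueError("undefined when neither vector contains a 1")
    return m11 / denom


def simple_matching(p1, p2):
    """Simple matching coefficient: counts joint absences as agreement."""
    m11, m10, m01, m00 = counts(p1, p2)
    total = m11 + m10 + m01 + m00
    if total == 0:
        raise ValueError("undefined for empty vectors")
    return (m11 + m00) / total


# Hypothetical data: 1000 possible diseases, each patient has exactly one,
# and they are different diseases.
a = "1" + "0" * 999
b = "01" + "0" * 998

print(jaccard(a, b))          # 0.0
print(simple_matching(a, b))  # 0.998
```

The two patients share nothing, yet counting M00 makes them look 99.8% similar, simply because most diseases are absent from both. Leaving M00 out avoids that.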
Hope it helps!
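Answer 2:
The Jaccard index of A and B is |A∩B|/|A∪B| = |A∩B|/(|A| + |B| - |A∩B|), where A is the set of attributes for which p1 has a value 1 and B is the set of attributes for which p2 has a value 1.
We have: |A∩B| = M11, |A| = M11 + M10, |B| = M11 + M01.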
So |A∩B|/(|A| + |B| - |A∩B|) = M11 / (M11 + M10 + M11 + M01 - M11) = M11 / (M10 + M01 + M11).
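As a quick check of this identity, the following Python sketch computes the index both ways for the vectors from the question, once from sets of positions and once from the M counts. The helper names are illustrative only, and `Fraction` is used just to keep the results exact.

```python
from fractions import Fraction


def support(bits):
    """Set of positions where the bit string has a 1."""
    return {i for i, bit in enumerate(bits) if bit == "1"}


def jaccard_from_sets(p1, p2):
    """|A ∩ B| / |A ∪ B| on the sets of 1-positions."""
    a, b = support(p1), support(p2)
    union = a | b
    if not union:
        raise ValueError("undefined when neither vector contains a 1")
    return Fraction(len(a & b), len(union))


def jaccard_from_counts(p1, p2):
    """M11 / (M01 + M10 + M11) from the pairwise counts."""
    pairs = list(zip(p1, p2))
    m11 = pairs.count(("1", "1"))
    m10 = pairs.count(("1", "0"))
    m01 = pairs.count(("0", "1"))
    denom = m01 + m10 + m11
    if denom == 0:
        raise ValueError("undefined when neither vector contains a 1")
    return Fraction(m11, denom)


p1, p2 = "10111", "10011"
j_sets = jaccard_from_sets(p1, p2)
j_counts = jaccard_from_counts(p1, p2)

print(j_sets, j_counts)  # 3/4 3/4
print(1 - j_sets)        # 1/4  (Jaccard distance)
```

A position where both vectors are 0 belongs to neither A nor B, so it never enters the union. That is why M00 has no place in the denominator.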
This Venn diagram may help:
Source: https://stackoverflow.com/questions/43518507/why-dont-we-include-0-matches-while-calculating-jaccard-distance-between-binary