Cosine Similarity

我在风中等你 2020-12-13 05:14

I calculated the tf-idf values for two documents. These are the values:

1.txt
0.0
0.5
2.txt
0.0
0.5

The documents are like:

3 Answers
  •  粉色の甜心
    2020-12-13 05:52

    A simple Java implementation:

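      import java.util.HashSet;
      import java.util.Map;
      import java.util.Set;

      // Cosine similarity of two sparse vectors stored as term -> tf-idf weight.
      static double cosine_similarity(Map<String, Double> v1, Map<String, Double> v2) {
          // Only terms present in both vectors contribute to the dot product.
          Set<String> both = new HashSet<>(v1.keySet());
          both.retainAll(v2.keySet());
          double dot = 0, norm1 = 0, norm2 = 0;
          for (String k : both) dot += v1.get(k) * v2.get(k);
          for (String k : v1.keySet()) norm1 += v1.get(k) * v1.get(k);
          for (String k : v2.keySet()) norm2 += v2.get(k) * v2.get(k);
          // Note: yields NaN (0/0) if either vector has only zero weights.
          return dot / Math.sqrt(norm1 * norm2);
      }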
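
    To check the method against the numbers in the question, here is a minimal, self-contained sketch. The class name `CosineDemo` and the term names `termA` and `termB` are hypothetical (the question does not show which terms the weights belong to), and returning 0 for an all-zero vector is an assumption of this sketch, not part of the method above. It needs Java 9 or later for `Map.of`.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class CosineDemo {
    // Cosine similarity of two sparse vectors stored as term -> weight.
    static double cosineSimilarity(Map<String, Double> v1, Map<String, Double> v2) {
        // Terms missing from either vector contribute nothing to the dot product.
        Set<String> both = new HashSet<>(v1.keySet());
        both.retainAll(v2.keySet());

        double dot = 0, norm1 = 0, norm2 = 0;
        for (String k : both) dot += v1.get(k) * v2.get(k);
        for (double w : v1.values()) norm1 += w * w;
        for (double w : v2.values()) norm2 += w * w;

        // Assumption: report 0 for an all-zero vector instead of NaN (0/0).
        if (norm1 == 0 || norm2 == 0) return 0.0;
        return dot / Math.sqrt(norm1 * norm2);
    }

    public static void main(String[] args) {
        // Hypothetical term names; the weights are the ones listed in the question.
        Map<String, Double> doc1 = Map.of("termA", 0.0, "termB", 0.5);
        Map<String, Double> doc2 = Map.of("termA", 0.0, "termB", 0.5);
        System.out.println(cosineSimilarity(doc1, doc2)); // prints 1.0
    }
}
```

    With identical weight vectors the dot product is 0.25 and both squared norms are 0.25, so the result is 0.25 / sqrt(0.25 * 0.25) = 1.0, i.e. the two vectors point in the same direction.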
    
