Get cosine similarity between two documents in Lucene

野性不改 2020-11-27 03:13

I have built an index in Lucene. Without specifying a query, I want to get a score (cosine similarity or another distance measure) between two documents in the index.

7 answers
  •  臣服心动
    2020-11-27 04:02

    When indexing, there's an option to store term frequency vectors.

    At runtime, look up the term frequency vectors for both documents using IndexReader.getTermFreqVector(), and look up the document frequency of each term using IndexReader.docFreq(). That gives you all the components needed to calculate the cosine similarity between the two docs.
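
    To make the arithmetic concrete, here is a minimal sketch of the tf-idf cosine step in plain Java. It does not call Lucene at all: the term -> frequency maps and the `docFreqs` map are hypothetical stand-ins for whatever the term vector lookup and IndexReader.docFreq() return in your Lucene version (getTermFreqVector() belongs to older releases, and the term vector API changed in later ones). The class name, the method names and the smoothed idf formula are assumptions made for illustration, not something Lucene prescribes.

    ```java
    import java.util.HashMap;
    import java.util.Map;

    final class TfIdfCosine {
        // Smoothed inverse document frequency. This exact formula is an
        // assumption; any monotone idf variant can be substituted.
        static double idf(int docFreq, int numDocs) {
            return Math.log((numDocs + 1.0) / (docFreq + 1.0)) + 1.0;
        }

        // Turn raw term frequencies into tf-idf weights.
        // docFreqs maps term -> number of documents containing the term.
        static Map<String, Double> weigh(Map<String, Integer> termFreqs,
                                         Map<String, Integer> docFreqs,
                                         int numDocs) {
            Map<String, Double> weights = new HashMap<>();
            for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
                int df = docFreqs.getOrDefault(e.getKey(), 0);
                weights.put(e.getKey(), e.getValue() * idf(df, numDocs));
            }
            return weights;
        }

        // Euclidean length of a sparse vector.
        static double norm(Map<String, Double> v) {
            double sum = 0.0;
            for (double w : v.values()) {
                sum += w * w;
            }
            return Math.sqrt(sum);
        }

        // Cosine similarity of two sparse vectors; 0.0 if either is empty.
        static double cosine(Map<String, Double> a, Map<String, Double> b) {
            double dot = 0.0;
            for (Map.Entry<String, Double> e : a.entrySet()) {
                Double other = b.get(e.getKey());
                if (other != null) {
                    dot += e.getValue() * other;
                }
            }
            double normA = norm(a);
            double normB = norm(b);
            return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
        }
    }
    ```

    With the two documents' term frequencies in hand, the score would be `TfIdfCosine.cosine(TfIdfCosine.weigh(tfA, df, n), TfIdfCosine.weigh(tfB, df, n))`, where `n` is the number of documents in the index.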

    An easier way might be to submit doc A as a query (adding all words to the query as OR terms, boosting each by term frequency) and look for doc B in the result set.
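
    As a rough sketch of that second approach, the helper below turns doc A's term frequencies into a query string in the classic Lucene query parser syntax (`field:term^boost` clauses joined with OR). The class name, the `build` method and the field name are hypothetical, and the sketch assumes the terms are plain tokens with no query syntax characters; real terms would need escaping before parsing. Note also that the score Lucene returns for doc B depends on the configured similarity, so it is a ranking score rather than an exact cosine.

    ```java
    import java.util.Map;
    import java.util.StringJoiner;
    import java.util.TreeMap;

    final class BoostedOrQuery {
        // Build "field:term^tf OR field:term^tf ..." from a term -> frequency map.
        // Terms are sorted so the output is deterministic.
        static String build(String field, Map<String, Integer> termFreqs) {
            StringJoiner query = new StringJoiner(" OR ");
            for (Map.Entry<String, Integer> e : new TreeMap<>(termFreqs).entrySet()) {
                // Boost each term by its frequency in doc A.
                query.add(field + ":" + e.getKey() + "^" + e.getValue());
            }
            return query.toString();
        }
    }
    ```

    The resulting string would then be handed to a query parser and searched; doc B's position and score in the result set give the similarity estimate.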
