How to find similar documents

你说的曾经没有我的故事 提交于 2020-01-01 00:45:13

问题


How do you find a similar documents of a given document in Lucene. I do not know what the text is i only know what the document is. Is there a way to find similar documents in lucene. I am a newbie so I may need some hand holding.


回答1:


you may want to check the MoreLikeThis feature of lucene.

MoreLikeThis constructs a lucene query based on terms within a document to find other similar documents in the index.

http://lucene.apache.org/java/3_0_1/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html

Sample code example (java reference) -

MoreLikeThis mlt = new MoreLikeThis(reader); // Pass the index reader
mlt.setFieldNames(new String[] {"title", "author"}); // specify the fields for similiarity

Query query = mlt.like(docID); // Pass the doc id 
TopDocs similarDocs = searcher.search(query, 10); // Use the searcher
if (similarDocs.totalHits == 0)
    // Do handling
}


来源:https://stackoverflow.com/questions/7657673/how-to-find-similar-documents

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!