Lucene: How to perform search on several independent index sets and merge the result?

Posted by 独自空忆成欢 on 2019-12-01 11:30:58

Question


I have several Lucene index sets (I call them shards), each indexing a different document set. They are independent, meaning I can search any one of them without reading the others. When a query request arrives, I want to run it against every index set and combine the results into the final top documents.

I know that when scoring documents, Lucene needs the idf of every term, and different index sets will give different idf values for the same term (because they hold different document sets). So, to my understanding, I cannot directly compare document scores from different index sets. How should I generate the final result?

An obvious solution would be to first merge the indexes and then search the single big index. However, this is too time-consuming for me and thus unacceptable. Does anyone have a better solution?

P.S.: I don't want to use any packages or software (like Katta) other than Lucene and Hadoop.


Answer 1:


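I think MultiReader is what you are looking for. It presents several IndexReaders as one logical index without physically merging them, so term statistics such as document frequency are combined across all sub-readers and the resulting scores are comparable. If you have multiple IndexReaders, say reader1 and reader2: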

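// Wrap the per-shard readers in one composite reader (no index merge is performed)
MultiReader multiReader = new MultiReader(reader1, reader2);
// Search all shards through a single searcher; hits are ranked together
IndexSearcher searcher = new IndexSearcher(multiReader);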
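
To illustrate why searching through one combined reader addresses the idf concern, here is a minimal sketch in plain Java (no Lucene dependency). It assumes a classic TF-IDF style formula, idf = 1 + ln(numDocs / (docFreq + 1)); the exact formula depends on the Lucene version and the Similarity in use. The class name ShardIdfDemo and the shard statistics are hypothetical, chosen only for illustration.

```java
class ShardIdfDemo {
    // Classic TF-IDF style idf (an assumption; Lucene's actual formula
    // varies by version and by the configured Similarity).
    public static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Global idf: sum docFreq and numDocs over all shards first,
    // then apply the formula once. This mirrors the idea of a composite
    // reader that aggregates statistics from its sub-readers.
    public static double globalIdf(long[] docFreqs, long[] numDocs) {
        long totalDocFreq = 0;
        long totalDocs = 0;
        for (int i = 0; i < docFreqs.length; i++) {
            totalDocFreq += docFreqs[i];
            totalDocs += numDocs[i];
        }
        return idf(totalDocFreq, totalDocs);
    }

    public static void main(String[] args) {
        // Hypothetical statistics for one term in two shards:
        // rare in shard 0, common in shard 1.
        long[] docFreqs = {9, 499};
        long[] numDocs = {1000, 1000};

        for (int i = 0; i < docFreqs.length; i++) {
            System.out.println("shard " + i + " idf = " + idf(docFreqs[i], numDocs[i]));
        }
        System.out.println("global idf = " + globalIdf(docFreqs, numDocs));
    }
}
```

With per-shard statistics the same term gets a different weight in each shard, so scores from separate searches are not on the same scale. With the summed statistics every document is scored against a single idf value, which is what makes a combined ranking meaningful.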


Source: https://stackoverflow.com/questions/16789618/lucene-how-to-perform-search-on-several-independent-index-sets-and-merge-the-re
