How to store multiple distinct types of documents in Lucene

自作多情 submitted on 2019-12-05 13:57:49

I would definitely reject the third option because the type index has low selectivity. The type field will contain only 2 distinct values, each matching millions of documents. Lucene would need to merge this huge posting list with the short posting list from the idN index, which can still be very fast but is wasteful.
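
To make the selectivity point concrete, here is a rough sketch of a toy inverted index in plain Python. It is not Lucene's API or its real merge algorithm; the field names `type` and `id`, the corpus size, and the helper `build_index` are all assumptions made up for illustration.

```python
# Toy model of an inverted index; this is NOT Lucene's API.
from collections import defaultdict

def build_index(docs):
    """Map (field, term) -> list of doc ids in increasing order (a posting list)."""
    index = defaultdict(list)
    for doc_id, fields in enumerate(docs):
        for field, term in fields.items():
            index[(field, term)].append(doc_id)
    return index

# Hypothetical corpus: two document types, 1000 documents of each type.
docs = [{"type": "A" if i % 2 == 0 else "B", "id": str(i // 2)}
        for i in range(2000)]
index = build_index(docs)

type_postings = index[("type", "A")]  # covers half of the whole index
id_postings = index[("id", "7")]      # one document per type

# Conjunction "type:A AND id:7": the long list has to be consulted
# even though the short list already narrows the result to two candidates.
hits = sorted(set(type_postings) & set(id_postings))

print(len(type_postings), len(id_postings), hits)
```

In this toy corpus the type clause contributes a posting list covering half of all documents, only to discard one of the two candidates that the id clause had already found.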

The first two options are effectively the same at query time, because each type of document gets its own terms and posting lists. The difference is in the indexing phase. Managing several independent indexes requires a bit more coordination and makes the code a little more complex. Yet it may be a good idea if you plan to use the indexes in different contexts. For example, the indexes may differ in:

  • physical location;
  • backup strategies;
  • availability requirements;
  • time-to-index requirements (the time from when a document changes on the client side until it is visible in the index).

Otherwise, I would go with the first option, as it is simpler and more manageable.
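
As a rough illustration of why the first two options behave alike at query time, the sketch below uses the same kind of toy inverted index (plain Python, not Lucene's API). It assumes that the first option means one index with type-specific field names and the second means one index per type; the field names `id1` and `id2` and the helper `build_index` are hypothetical.

```python
# Toy model of an inverted index; this is NOT Lucene's API.
from collections import defaultdict

def build_index(docs):
    """Map (field, term) -> list of doc ids in increasing order (a posting list)."""
    index = defaultdict(list)
    for doc_id, fields in enumerate(docs):
        for field, term in fields.items():
            index[(field, term)].append(doc_id)
    return index

# Hypothetical documents: each type uses its own id field.
type_a = [{"id1": str(n)} for n in range(3)]
type_b = [{"id2": str(n)} for n in range(3)]

# Assumed option 1: a single index holding both types.
combined = build_index(type_a + type_b)

# Assumed option 2: one independent index per type.
index_a = build_index(type_a)
index_b = build_index(type_b)

# The same lookup against either layout reads one short posting list
# and needs no extra type filter.
combined_hits = combined[("id2", "1")]
separate_hits = index_b[("id2", "1")]

print(combined_hits, separate_hits)
```

The doc ids differ between the two layouts, but in both cases the lookup touches a single short posting list, so the choice comes down to how the indexes are built and operated.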
