How do I remove logically deleted documents from a Solr index?

坚强是说给别人听的谎言 submitted on 2019-12-07 14:39:12

Question


I am implementing Solr to provide free-text search for a project in which the searchable records will be added and deleted on a large scale every day.

Because of that scale, I need to make sure the index stays an appropriate size.

On my test installation of Solr, I index a set of 10 documents. Then I change one of the documents and replace it in the index with the new version, which has the same ID. This works correctly, and searches behave as expected.

I am using this code to update the document:

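// Remove the old version of the document by its ID...
getSolrServer().deleteById(document.getIndexId());
// ...add the updated version...
getSolrServer().add(document.getSolrInputDocument());
// ...and commit so the change becomes visible to searches.
getSolrServer().commit();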

What I noticed, though, is that the figures on the Solr server's stats page are not what I expect.

After the initial indexing, numDocs and maxDoc both equal 10, as expected. When I update the document, however, numDocs is still 10 (expected) but maxDoc is 11 (unexpected).

Reading the documentation, I found this:

maxDoc may be larger as the maxDoc count includes logically deleted documents that have not yet been removed from the index.

So the question is, how do I remove logically deleted documents from the index?

If these documents remain in the index, do I risk a performance penalty when this runs with a very large volume of documents?

Thanks :)


Answer 1:


You have to optimize your index.
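To make the numbers from the question concrete, here is a toy model in Java of why maxDoc grows after an update and what optimizing does to it. It is only an illustration under simplifying assumptions, not how Lucene actually stores segments: `ToyIndex` and its methods are hypothetical, and `optimize()` here simply drops the flagged slots, standing in for the segment rewrite that a real optimize performs.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Lucene's real data structures): each physical document slot
// can be flagged as deleted; flagged slots still count toward maxDoc.
public class ToyIndex {
    private static final class Slot {
        final String id;
        boolean deleted;
        Slot(String id) { this.id = id; }
    }

    private final List<Slot> slots = new ArrayList<>();

    // Logical delete: flag every live copy with this ID but keep the slot.
    public void deleteById(String id) {
        for (Slot s : slots) {
            if (!s.deleted && s.id.equals(id)) {
                s.deleted = true;
            }
        }
    }

    // Add with overwrite semantics: any live copy with the same ID is
    // logically deleted, and the new version gets a fresh slot at the end.
    public void add(String id) {
        deleteById(id);
        slots.add(new Slot(id));
    }

    // Live documents only, i.e. what searches can return.
    public int numDocs() {
        int live = 0;
        for (Slot s : slots) {
            if (!s.deleted) live++;
        }
        return live;
    }

    // Every slot, including logically deleted ones.
    public int maxDoc() {
        return slots.size();
    }

    // Stand-in for optimize: rewrite the index without the flagged slots.
    public void optimize() {
        slots.removeIf(s -> s.deleted);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        for (int i = 0; i < 10; i++) index.add("doc" + i);
        System.out.println(index.numDocs() + " / " + index.maxDoc()); // 10 / 10

        // The update from the question: delete by ID, then re-add.
        index.deleteById("doc3");
        index.add("doc3");
        System.out.println(index.numDocs() + " / " + index.maxDoc()); // 10 / 11

        index.optimize();
        System.out.println(index.numDocs() + " / " + index.maxDoc()); // 10 / 10
    }
}
```

The point of the model is that an update never edits a document in place: the old copy is only flagged, so it keeps occupying space (and counting toward maxDoc) until the index is rewritten.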

Note that an optimize is expensive, so you probably should not run it more than once a day.
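A minimal sketch of the "at most once a day" advice, assuming you trigger optimizes from your own application code. `OptimizeThrottle` is a hypothetical helper, not part of SolrJ; the `Runnable` you pass in would wrap the actual optimize request (for example a call to `optimize()` on the server object returned by the question's `getSolrServer()`, inside a try/catch because it throws checked exceptions).

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: lets an expensive optimize run at most once per
// minInterval (e.g. one day). The real optimize call is supplied as a Runnable.
public class OptimizeThrottle {
    private final Duration minInterval;
    private Instant lastRun; // null until the first optimize has run

    public OptimizeThrottle(Duration minInterval) {
        this.minInterval = minInterval;
    }

    // Runs the task and returns true only if minInterval has elapsed since
    // the last run; otherwise skips it and returns false.
    public synchronized boolean maybeOptimize(Instant now, Runnable optimizeTask) {
        if (lastRun != null && Duration.between(lastRun, now).compareTo(minInterval) < 0) {
            return false; // too soon since the last optimize
        }
        optimizeTask.run();
        lastRun = now;
        return true;
    }

    public static void main(String[] args) {
        OptimizeThrottle throttle = new OptimizeThrottle(Duration.ofDays(1));
        Instant t0 = Instant.parse("2024-01-01T02:00:00Z");
        Runnable optimize = () -> System.out.println("optimizing");

        System.out.println(throttle.maybeOptimize(t0, optimize));                            // runs: true
        System.out.println(throttle.maybeOptimize(t0.plus(Duration.ofHours(6)), optimize));  // skipped: false
        System.out.println(throttle.maybeOptimize(t0.plus(Duration.ofDays(1)), optimize));   // runs: true
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside the method, keeps the policy easy to test. A nightly scheduled job that calls optimize once would achieve the same effect with even less code.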

Here is some more info on optimize:

http://www.lucidimagination.com/search/document/CDRG_ch06_6.3.1.3

http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations



Source: https://stackoverflow.com/questions/3053425/how-do-i-remove-logically-deleted-documents-from-a-solr-index
