Should I keep Lucene IndexWriter open for entire indexing or close after each document addition?

℡╲_俬逩灬. submitted on 2019-12-12 05:05:32

Question


Does closing the Lucene IndexWriter after each document addition slow down my indexing process?

I imagine that closing and reopening the IndexWriter will slow down indexing, or is that not true for Lucene?

Basically, I have a Lucene indexer step in a Spring Batch job, and I create the indices in an ItemProcessor. The indexer step is a partitioned step; I create the IndexWriter when the ItemProcessor is created and keep it open until the step completes.

    @Bean
    @StepScope
    public ItemProcessor<InputVO,OutputVO> luceneIndexProcessor(@Value("#{stepExecutionContext[field1]}") String str) throws Exception{
        boolean exists = IndexUtils.checkIndexDir(str);
        String indexDir = IndexUtils.createAndGetIndexPath(str, exists);
        IndexWriterUtils indexWriterUtils = new IndexWriterUtils(indexDir, exists);
        IndexWriter indexWriter = indexWriterUtils.createIndexWriter();
        return new LuceneIndexProcessor(indexWriter);
    }

Is there a way to close this IndexWriter after the step completes?

I was also running into issues because I search within this step to find duplicate documents, but I fixed that by calling writer.commit(); before opening the reader and searching.

Do I need to close and reopen the writer after each document addition, or can I keep it open throughout? And how do I close it in StepExecutionListenerSupport's afterStep?

Initially, I was closing and reopening the writer for each document, but indexing was very slow, so I suspected that was the reason.


Answer 1:


In development the index directory is small, so we may not see much gain, but for large index directories we should avoid unnecessarily creating and closing the IndexWriter as well as the IndexReader.
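As a rough illustration of keeping one writer open for the whole run, here is a minimal sketch outside Spring Batch. It assumes Lucene 6 or newer on the classpath; the class name KeepWriterOpenSketch, the method indexAll and the field name "id" are made up for this example. The duplicate check uses a near-real-time reader opened from the writer, which is an alternative to the commit-before-search fix mentioned in the question, not what the original code does.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class KeepWriterOpenSketch {

    /** Indexes the ids with ONE writer, skipping duplicates; returns the final document count. */
    static int indexAll(Path indexDir, List<String> ids) throws Exception {
        // Resources are closed in reverse order: writer first, then directory, then analyzer.
        try (StandardAnalyzer analyzer = new StandardAnalyzer();
             Directory dir = FSDirectory.open(indexDir);
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {

            for (String id : ids) {
                // Near-real-time reader: sees documents added through this writer
                // even though they have not been committed yet.
                try (DirectoryReader reader = DirectoryReader.open(writer)) {
                    IndexSearcher searcher = new IndexSearcher(reader);
                    if (searcher.count(new TermQuery(new Term("id", id))) > 0) {
                        continue; // duplicate: skip it
                    }
                }
                Document doc = new Document();
                doc.add(new StringField("id", id, Field.Store.YES));
                writer.addDocument(doc); // the writer stays open between documents
            }

            writer.commit(); // a single commit at the end of the run
            try (DirectoryReader reader = DirectoryReader.open(writer)) {
                return reader.numDocs();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempDirectory("lucene-sketch");
        System.out.println(indexAll(tmp, List.of("a", "b", "a")));
    }
}
```

Opening a fresh reader for every document is still not free; in a real job one would probably reuse a reader and refresh it with DirectoryReader.openIfChanged. The sketch only shows that neither a close nor a commit per document is needed.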

In Spring Batch, I accomplished this with the following steps:

1. As pointed out in my other question, we first need to address the serialization problem so that the object can be put in the ExecutionContext.

2. In the partitioner, create an instance of a composite serializable object and put it in the ExecutionContext.

3. Pass the value from the ExecutionContext to your step's reader, processor or writer in the configuration:

    @Bean
    @StepScope
    public ItemProcessor<InputVO,OutputVO> luceneIndexProcessor(@Value("#{stepExecutionContext[field1]}") String field1,@Value("#{stepExecutionContext[luceneObjects]}") SerializableLuceneObjects luceneObjects) throws Exception{
        LuceneIndexProcessor indexProcessor =new LuceneIndexProcessor(luceneObjects);
        return indexProcessor;
    }

4. Use the instance passed to the processor wherever you need it, and use a getter method to get the index reader or writer:

    public IndexWriter getLuceneIndexWriter() { return luceneIndexWriter; }

5. Finally, in StepExecutionListenerSupport's afterStep(StepExecution stepExecution), close the writer and reader by getting them from the ExecutionContext:

ExecutionContext executionContext = stepExecution.getExecutionContext();
SerializableLuceneObjects slObjects = (SerializableLuceneObjects)executionContext.get("luceneObjects");
IndexWriter luceneIndexWriter = slObjects.getLuceneIndexWriter();
IndexReader luceneIndexReader = slObjects.getLuceneIndexReader();
if(luceneIndexWriter !=null ) luceneIndexWriter.close();
if(luceneIndexReader != null) luceneIndexReader.close();
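The answer does not show the holder class itself. Below is a minimal sketch of one way such a holder could be made serializable, assuming the Lucene fields are marked transient; this is a guess and may differ from what the linked question actually did. ResourceHolder is a hypothetical stand-in for SerializableLuceneObjects, and java.io.Closeable stands in for IndexWriter and IndexReader (both implement Closeable) so that the sketch runs without Lucene on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientHolderSketch {

    /** Hypothetical stand-in for SerializableLuceneObjects. */
    static class ResourceHolder implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: skipped by Java serialization, so holding a
        // non-serializable resource does not make serialization fail.
        private transient Closeable writer;
        private transient Closeable reader;

        ResourceHolder(Closeable writer, Closeable reader) {
            this.writer = writer;
            this.reader = reader;
        }

        boolean hasWriter() {
            return writer != null;
        }

        /** Closes both resources; the reader is still closed if closing the writer throws. */
        void closeAll() throws IOException {
            try {
                if (writer != null) writer.close();
            } finally {
                if (reader != null) reader.close();
            }
        }
    }

    /** Serializes and deserializes the holder in memory. */
    static ResourceHolder roundTrip(ResourceHolder holder) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(holder);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (ResourceHolder) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ResourceHolder live = new ResourceHolder(
                () -> System.out.println("writer closed"),
                () -> System.out.println("reader closed"));
        ResourceHolder copy = roundTrip(live);
        System.out.println("live has writer: " + live.hasWriter());
        System.out.println("copy has writer: " + copy.hasWriter());
        copy.closeAll(); // nothing to close: transient fields are null after deserialization
        live.closeAll(); // closes the writer, then the reader
    }
}
```

With transient fields, a deserialized copy of the holder carries no writer or reader, which is one reason the null checks in the listener matter. The try/finally in closeAll also differs slightly from the two sequential if statements above: the reader is closed even when closing the writer throws.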


Source: https://stackoverflow.com/questions/39701195/should-i-keep-lucene-indexwriter-open-for-entire-indexing-or-close-after-each-do
