Why does Lucene cause OOM when indexing large files?

一向 2021-01-13 05:01

I’m working with Lucene 2.4.0 and the JVM (JDK 1.6.0_07). I’m consistently receiving OutOfMemoryError: Java heap space when trying to index large text files.

5 Answers
  •  渐次进展
    2021-01-13 05:57

    In response to Gandalf's comment:

    I can see you are setting the mergeFactor to 1000.

    The API docs say:

    setMergeFactor

    public void setMergeFactor(int mergeFactor)

    Determines how often segment indices are merged by addDocument(). With smaller values, less RAM is used while indexing, and searches on unoptimized indices are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches on unoptimized indices are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.

    This is a convenience method; it uses more RAM as you increase the mergeFactor.

    What I would suggest is to set it to something like 15 or so (on a trial-and-error basis), complemented with setRAMBufferSizeMB. Also call commit(), then optimize(), and then close() on the IndexWriter object (you could probably make a JavaBean and put all these calls in one method), and call that method when you are closing the index. See the sketch below.
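
    A minimal sketch of what that could look like against the Lucene 2.4 API (the index path, analyzer choice, and the exact mergeFactor/RAM buffer values below are illustrative assumptions, not settings from your post):

        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.index.IndexWriter;
        import org.apache.lucene.store.FSDirectory;

        public class IndexHelper {

            public static IndexWriter openWriter(String indexPath) throws Exception {
                IndexWriter writer = new IndexWriter(
                        FSDirectory.getDirectory(indexPath),    // assumed index location
                        new StandardAnalyzer(),
                        true,                                   // create a new index
                        IndexWriter.MaxFieldLength.UNLIMITED);
                writer.setMergeFactor(15);        // modest merge factor instead of 1000
                writer.setRAMBufferSizeMB(32.0);  // flush once ~32 MB of RAM is buffered
                return writer;
            }

            // One place that commits, optimizes and closes, as suggested above.
            public static void closeWriter(IndexWriter writer) throws Exception {
                writer.commit();    // flush any buffered documents to the index
                writer.optimize();  // merge segments (optional, can be expensive on large indexes)
                writer.close();     // release the writer and its file handles
            }
        }

    Keeping the merge factor small and letting setRAMBufferSizeMB bound the in-memory buffer tends to make heap usage during indexing much more predictable.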

    Post back with your results/feedback =]
