Why does Lucene cause OOM when indexing large files?


一向 2021-01-13 05:01

I'm working with Lucene 2.4.0 on JDK 1.6.0_07. I consistently get OutOfMemoryError: Java heap space when trying to index large text files.

5 answers

渐次进展 (original poster)

2021-01-13 05:57

This is a response to Gandalf's comment.

I can see you are setting the merge factor to 1000 with setMergeFactor.

The API documentation says:

setMergeFactor

public void setMergeFactor(int mergeFactor)

Determines how often segment indices are merged by addDocument(). With smaller values, less RAM is used while indexing, and searches on unoptimized indices are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches on unoptimized indices are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.
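To make the "how often segments are merged" part concrete, here is a toy model of a logarithmic merge scheme. It is only a sketch under simplifying assumptions: every flush is assumed to produce one equal-sized segment, and whenever mergeFactor segments pile up at one level they are merged into a single segment at the next level. The class name MergeFactorSketch and the method simulate are made up for this illustration and are not part of Lucene, and the model counts segments and merges, not RAM.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Lucene's merge policy code, just the counting idea behind it.
final class MergeFactorSketch {

    /**
     * Simulates `flushes` segment flushes with the given merge factor.
     * Returns {total merges performed, peak number of live segments}.
     */
    static int[] simulate(int mergeFactor, int flushes) {
        // levels.get(i) = number of live segments currently at level i
        List<Integer> levels = new ArrayList<Integer>();
        int merges = 0;
        int peakSegments = 0;

        for (int f = 0; f < flushes; f++) {
            if (levels.isEmpty()) {
                levels.add(0);
            }
            // A flush creates one new level-0 segment.
            levels.set(0, levels.get(0) + 1);

            // Cascade: a full level is merged into one segment of the next level.
            for (int lvl = 0; lvl < levels.size(); lvl++) {
                if (levels.get(lvl) < mergeFactor) {
                    break; // higher levels cannot have changed
                }
                levels.set(lvl, 0);
                if (lvl + 1 == levels.size()) {
                    levels.add(0);
                }
                levels.set(lvl + 1, levels.get(lvl + 1) + 1);
                merges++;
            }

            // Count live segments once the cascade has settled.
            int live = 0;
            for (int count : levels) {
                live += count;
            }
            peakSegments = Math.max(peakSegments, live);
        }
        return new int[] { merges, peakSegments };
    }

    public static void main(String[] args) {
        for (int mergeFactor : new int[] { 10, 15, 1000 }) {
            int[] result = simulate(mergeFactor, 1000);
            System.out.println("mergeFactor=" + mergeFactor
                    + " merges=" + result[0]
                    + " peakSegments=" + result[1]);
        }
    }
}
```

In this model a small merge factor merges often and keeps few segments around, while a merge factor of 1000 lets hundreds of segments accumulate before the first merge happens.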

This is a convenience method, and RAM usage grows as you increase the mergeFactor.

What I would suggest is setting it to something like 15 (tune it by trial and error), complemented with setRAMBufferSizeMB. Also call commit(), then optimize(), and then close() on the IndexWriter object. You could put all of these calls in one method (for example in a JavaBean) and call that method when you are closing the index.
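A rough sketch of what that could look like against the Lucene 2.4 API is below. The class name LargeFileIndexer, the index path, the field name "contents", and the values 15 and 32 MB are placeholders I picked for illustration, not recommendations from the docs, so tune them for your data. It needs the Lucene 2.4 core jar on the classpath. It also assumes the file is fed to Lucene through a Reader instead of being loaded into a String first.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical helper class for illustration, written against Lucene 2.4.
public class LargeFileIndexer {

    private final IndexWriter writer;

    public LargeFileIndexer(String indexPath) throws IOException {
        Directory dir = FSDirectory.getDirectory(indexPath);
        // LIMITED indexes only the first 10,000 terms of each field.
        writer = new IndexWriter(dir, new StandardAnalyzer(), true,
                IndexWriter.MaxFieldLength.LIMITED);
        writer.setMergeFactor(15);       // placeholder value, tune by trial and error
        writer.setRAMBufferSizeMB(32.0); // placeholder value, flush once ~32 MB is buffered
    }

    public void indexFile(File file) throws IOException {
        Reader reader = new BufferedReader(new FileReader(file));
        try {
            Document doc = new Document();
            // A Reader-valued field is tokenized as a stream and is not stored.
            doc.add(new Field("contents", reader));
            writer.addDocument(doc);
        } finally {
            reader.close();
        }
    }

    // The single "closing" method suggested above: commit, optimize, close.
    public void finish() throws IOException {
        writer.commit();
        writer.optimize();
        writer.close();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LargeFileIndexer <indexDir> <file> [<file> ...]
        LargeFileIndexer indexer = new LargeFileIndexer(args[0]);
        for (int i = 1; i < args.length; i++) {
            indexer.indexFile(new File(args[i]));
        }
        indexer.finish();
    }
}
```

If the heap error still shows up with settings like these, that would suggest the memory is going somewhere other than the merge settings, which is worth mentioning when you post your results.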

Post back with your results and feedback =]
