I’m working with Lucene 2.4.0 and the JVM (JDK 1.6.0_07). I’m consistently receiving OutOfMemoryError: Java heap space, when trying to index large text files.<
In response as a comment to Gandalf
I can see you are setting the setMergeFactor to 1000
the API says
setMergeFactor
public void setMergeFactor(int mergeFactor)
Determines how often segment indices are merged by addDocument(). With smaller values, less RAM is used while indexing, and searches on unoptimized indices are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches on unoptimized indices are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.
This method is a convenience method, it uses the RAM as you increase the mergeFactor
What i would suggest is set it to something like 15 or so on.; (on a trial and error basis) complemented with setRAMBufferSizeMB, also call Commit(). then optimise() and then close() the indexwriter object.(probably make a JavaBean and put all these methods in one method) call this method when you are closing the index.
post with your result, feedback =]