Question
Last summer, I made a Java application that parses some PDF files and stores the information they contain in a SQLite database.
Everything was fine and I kept adding new files to the database every week or so without any problems.
Now, I'm trying to improve my application's speed and I wanted to see how it would fare if I parsed all the files I have from the last two years in a new database. That's when I started getting this error: OutOfMemoryError: Java Heap Space. I didn't get it before because I was only parsing about 25 new files per week, but it seems like parsing 1000+ files one after the other is a lot more demanding.
I partially solved the problem: I made sure to close my connection after every call to the database and the error went away, but at a huge cost. Parsing the files is now unbearably slow. As for my ResultSets and Statements / PreparedStatements, I'm already closing them after every call.
I guess there's something I don't understand about when I should close my connection and when I should keep re-using the same one. I thought that since auto-commit is on, it commits after every transaction (select, update, insert, etc.) and the connection releases the extra memory it was using. I'm probably wrong since when I parse too many files, I end up getting the error I'm mentioning.
An easy solution would be to close it after every x calls, but then again I won't understand why and I'm probably going to get the same error later on. Can anyone explain when I should be closing my connections (if at all except when I'm done)? If I'm only supposed to do it when I'm done, then can someone explain how I'm supposed to avoid this error?
By the way, I didn't tag this as SQLite because I got the same error when I tried running my program on my online MySQL database.
Edit: As Deco and Mavrav pointed out, maybe the problem isn't my Connection. Maybe it's the files, so here is the code I use to call the function that parses the files one by one:
    public static void visitAllDirsAndFiles(File dir) {
        if (dir.isDirectory()) {
            // list() returns null if the directory cannot be read
            String[] children = dir.list();
            if (children == null) {
                return;
            }
            for (int i = 0; i < children.length; i++) {
                visitAllDirsAndFiles(new File(dir, children[i]));
            }
        } else {
            try {
                // System.out.println("File: " + dir);
                BowlingFilesReader.readFile(dir, playersDatabase);
            } catch (Exception exc) {
                System.out.println("Other exception in file: " + dir);
            }
        }
    }
So if I call the method using a directory, it recursively calls the function again using the File object I just created. My method then detects that it's a file and calls BowlingFilesReader.readFile(dir, playersDatabase);
The memory should be released when the method is done I think?
Answer 1:
Your first instinct on open resultsets and connections was good, though maybe not entirely the cause. Let's start with your database connection first.
Database
Try using a database connection pooling library, such as the Apache Commons DBCP (BasicDataSource is a good place to start): http://commons.apache.org/dbcp/ You will still need to close your database objects, but this will keep things running smoothly on the database front.
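Whether the connection comes from a pool or is a single long-lived one, the statements created from it still need closing. Here is a minimal sketch of reusing one Connection for many inserts while letting try-with-resources close the PreparedStatement. The class name ScoreWriter, the table scores and its columns are hypothetical, since the question does not show the real schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

final class ScoreWriter {
    // Hypothetical table and column names; adjust to the real schema.
    private static final String INSERT_SQL =
            "INSERT INTO scores (player, score) VALUES (?, ?)";

    /**
     * Inserts every entry over the caller's connection and returns how many
     * rows were batched. The connection is NOT closed here, so the caller
     * can keep reusing it for the next file.
     */
    static int insertScores(Connection conn, Map<String, Integer> scores)
            throws SQLException {
        // try-with-resources closes the statement even if an insert fails.
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (Map.Entry<String, Integer> entry : scores.entrySet()) {
                ps.setString(1, entry.getKey());
                ps.setInt(2, entry.getValue());
                ps.addBatch();
            }
            return ps.executeBatch().length;
        }
    }
}
```

The caller would open the connection once before walking the directory, pass it to each call, and close it once at the very end.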
JVM Memory
Increase the maximum heap size you give to the JVM by passing -Xmx followed by an amount, such as:
- -Xmx64m <- this caps the JVM heap at 64 MB
- -Xmx512m <- 512 MB
Be careful with your numbers, though: throwing more memory at the JVM will not fix memory leaks. You may use something like JConsole or JVisualVM (included in your JDK's bin/ folder) to observe how much memory you are using.
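If attaching a monitoring tool is inconvenient, a rough alternative is to log heap usage from inside the program, for example after every file. The sketch below uses only java.lang.Runtime; the class name HeapReport and the label text are made up for illustration.

```java
final class HeapReport {
    private static final long MB = 1024L * 1024L;

    /** Heap currently in use, in bytes (allocated heap minus its free part). */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** One-line summary of used heap against the maximum the JVM will attempt to use. */
    static String describe(String label) {
        long max = Runtime.getRuntime().maxMemory();
        return String.format("%s: %d MB used of %d MB max",
                label, usedBytes() / MB, max / MB);
    }
}
```

Calling System.out.println(HeapReport.describe("after " + dir)) right after each readFile call should show whether usage keeps climbing from file to file, which would point to a leak rather than a heap that is simply too small.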
Threading
You may increase the speed of your operations by threading them out, assuming the operation you are performing to parse these records is threadable. But more information might be necessary to answer that question.
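As a rough sketch of what threading it out could look like, assuming the per-file parsing is safe to run concurrently: a fixed-size thread pool from java.util.concurrent takes one task per file. ParallelParser and its parser argument are hypothetical stand-ins for whatever wraps BowlingFilesReader.readFile.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class ParallelParser {
    /**
     * Runs the parser on each file using a fixed number of worker threads
     * and returns how many files were parsed without throwing.
     */
    static int parseAll(List<File> files, Consumer<File> parser, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (File file : files) {
                pending.add(pool.submit(() -> parser.accept(file)));
            }
            int succeeded = 0;
            for (Future<?> result : pending) {
                try {
                    result.get(); // blocks until this task has finished
                    succeeded++;
                } catch (ExecutionException e) {
                    System.err.println("Parse failed: " + e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag
                    break;
                }
            }
            return succeeded;
        } finally {
            pool.shutdown(); // stop accepting tasks; workers exit when idle
        }
    }
}
```

Whether this actually helps depends on where the time goes. If the parser writes to the database, each worker would need its own connection or the writes would need to be serialized, and more files in flight at once also means more heap in use at once.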
Hope this helps.
Answer 2:
Because of how garbage collection works, I don't think the memory is reclaimed immediately for subsequent processes and threads, so we can't put all our eggs in that basket. To begin with, put all the files in one directory rather than in child directories of the parent. Then load the files one by one by iterating like this:
    File f = null;
    for (int i = 0; i < children.length; i++) {
        f = new File(dir, children[i]);
        BowlingFilesReader.readFile(f, playersDatabase);
        f = null;
    }
This clears the reference so that the File object can be released and picked up by a subsequent GC. To find the limit, test with an increasing number of files (start with 100, then 200, and so on) to see at what point the OutOfMemoryError is thrown. Hope this helps.
Source: https://stackoverflow.com/questions/9561828/database-connections-and-outofmemoryerror-java-heap-space