Fastest way to iterate through large table using JDBC

I'm trying to create a Java program to clean up and merge rows in my table. The table is large, about 500k rows, and my current solution is running very slowly. The first thi

3 Answers

陌清茗

2020-12-07 23:40

One thing that helped me was calling Statement.setFetchSize(Integer.MIN_VALUE), an idea I got from Jason's blog. It cut execution time by more than half, and memory consumption dropped dramatically because only one row is read at a time.

I couldn't get this trick to work with PreparedStatement, though.
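
A minimal sketch of the streaming read described above, assuming MySQL Connector/J, where a forward-only, read-only statement with a fetch size of Integer.MIN_VALUE asks the driver to stream rows instead of buffering the whole result set. The names StreamingScan, RowHandler, and scan are hypothetical and exist only for illustration; other drivers may treat this fetch size differently or reject it.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class StreamingScan {
    /** Hypothetical callback type for this sketch: called once per row. */
    interface RowHandler {
        void handle(ResultSet row) throws SQLException;
    }

    /** Runs the query in streaming mode and returns the number of rows visited. */
    static long scan(Connection con, String sql, RowHandler handler) throws SQLException {
        try (Statement stmt = con.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            // Assumption: MySQL Connector/J, which reads this value as "stream row by row".
            stmt.setFetchSize(Integer.MIN_VALUE);
            long count = 0;
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    handler.handle(rs); // read the current row's columns here
                    count++;
                }
            }
            return count;
        }
    }
}
```

With Connector/J, the connection cannot run other statements while a streaming result set is still open, so any updates made during the scan would need a second connection or have to wait until the scan finishes.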

爱一瞬间的悲伤

2020-12-07 23:46
First of all, are you sure you need the whole table in memory? If possible, consider selecting only the rows you want to update or merge. If you really do need the whole table, you could use a scrollable ResultSet, which you can create like this:
```
// make sure autocommit is off (postgres)
con.setAutoCommit(false);

Statement stmt = con.createStatement(
                   ResultSet.TYPE_SCROLL_INSENSITIVE, //or ResultSet.TYPE_FORWARD_ONLY
                   ResultSet.CONCUR_READ_ONLY);
ResultSet srs = stmt.executeQuery("select * from ...");
```
It lets you move to any row you want using the ResultSet methods 'absolute' and 'relative'.
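
To illustrate the two positioning calls, here is a small sketch. ScrollDemo and jump are hypothetical names, and the ResultSet passed in is assumed to be scrollable, as when created with TYPE_SCROLL_INSENSITIVE. Drivers commonly implement scroll-insensitive result sets by caching the rows on the client, so this is more of a navigation convenience than a memory saving; check your driver's documentation.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

final class ScrollDemo {
    /**
     * Moves the cursor to the 1-based row number, then steps offset rows from there.
     * Returns true only if the cursor ends up on a valid row.
     */
    static boolean jump(ResultSet srs, int row, int offset) throws SQLException {
        if (!srs.absolute(row)) {
            return false; // the requested row is outside the result set
        }
        return srs.relative(offset); // a negative offset moves backwards
    }
}
```

For example, jump(srs, 3, 4) would leave the cursor on row 7 if the result set has at least seven rows.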
忘掉有多难

2020-12-07 23:56
Although it's probably not optimal, your solution seems like it ought to be fine for a one-off database cleanup routine. It shouldn't take that long to run a query like that and get the results (I'm assuming that, since it's a one-off, a couple of seconds would be fine). Possible problems:
- Is your network (or at least your connection to MySQL) very slow? If so, you could try running the process locally on the MySQL box, or on something better connected.
- Is there something in the table structure that's causing it? Are you pulling down 10k of data for every row? 200 fields? Calculating the id values to fetch based on a non-indexed column? You could try finding a more db-friendly way of pulling the data (e.g. just the columns you need, having the db aggregate values, etc.).
If you're not getting through the second increment, something is really wrong. Efficient or not, you shouldn't have any problem dumping 2,000 or 20,000 rows into memory on a running JVM. Maybe you're storing the data redundantly or extremely inefficiently?
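
One db-friendly way of pulling the data, sketched under assumptions: select only the columns you need and walk the table in primary-key order one increment at a time (keyset pagination). The table my_table, its columns id and name, and the ChunkedScan class are hypothetical. The sketch also assumes id is a positive, unique, indexed key and that the database accepts LIMIT with a bound parameter, as MySQL does.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class ChunkedScan {
    // Hypothetical table and column names; only the needed columns are selected.
    static final String PAGE_SQL =
        "SELECT id, name FROM my_table WHERE id > ? ORDER BY id LIMIT ?";

    /** Visits every row in pages of pageSize and returns the total row count. */
    static long scan(Connection con, int pageSize) throws SQLException {
        long lastId = 0; // assumption: all ids are greater than 0
        long total = 0;
        try (PreparedStatement ps = con.prepareStatement(PAGE_SQL)) {
            while (true) {
                ps.setLong(1, lastId);
                ps.setInt(2, pageSize);
                int rowsInPage = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        lastId = rs.getLong("id"); // remember where this page ended
                        // clean up or merge the current row here
                        rowsInPage++;
                    }
                }
                total += rowsInPage;
                if (rowsInPage < pageSize) {
                    return total; // a short page means the table is exhausted
                }
            }
        }
    }
}
```

Because each page starts from the last id seen rather than from an OFFSET, every query can use the index on id, and only one page of rows is held in memory at a time.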