Fastest way to iterate through large table using JDBC

名媛妹妹 2020-12-07 23:07

I'm trying to create a Java program to clean up and merge rows in my table. The table is large, about 500k rows, and my current solution is running very slowly. The first thi…

3 Answers
  • 2020-12-07 23:40

    One thing that helped me was Statement.setFetchSize(Integer.MIN_VALUE). I got this idea from Jason's blog. This cut execution time down by more than half, and memory consumption went down dramatically (since only one row is read at a time).

    This trick doesn't work for PreparedStatement, though.
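
    As a rough sketch, this is how the trick is usually wired up with the MySQL Connector/J driver (the query, table, and column names below are placeholders, and con is assumed to be an already-open Connection):

    // MySQL Connector/J only streams results row by row when the statement is
    // forward-only, read-only, and the fetch size is Integer.MIN_VALUE.
    Statement stmt = con.createStatement(
                       ResultSet.TYPE_FORWARD_ONLY,
                       ResultSet.CONCUR_READ_ONLY);
    stmt.setFetchSize(Integer.MIN_VALUE);
    ResultSet rs = stmt.executeQuery("select id, name from my_table");
    while (rs.next()) {
        // Only the current row is held in memory.
        long id = rs.getLong("id");
        String name = rs.getString("name");
        // ... cleanup/merge logic here ...
    }
    rs.close();
    stmt.close();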

  • 2020-12-07 23:46

    First of all, are you sure you need the whole table in memory? Maybe you should consider (if possible) selecting only the rows you want to update/merge. If you really do need the whole table, you could consider using a scrollable ResultSet. You can create it like this:

    // make sure autocommit is off (PostgreSQL requires this for cursor-based fetching)
    con.setAutoCommit(false);
    
    Statement stmt = con.createStatement(
                       ResultSet.TYPE_SCROLL_INSENSITIVE, // or ResultSet.TYPE_FORWARD_ONLY
                       ResultSet.CONCUR_READ_ONLY);
    ResultSet srs = stmt.executeQuery("select * from ...");
    

    It enables you to move to any row you want using the absolute(int) and relative(int) methods of ResultSet.
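
    For instance, continuing from the snippet above (the row positions here are arbitrary examples):

    // Jump straight to row 1000 (positions are 1-based).
    srs.absolute(1000);
    // Move 50 rows forward from the current position.
    srs.relative(50);
    // Move 10 rows back.
    srs.relative(-10);
    // Then iterate forward as usual.
    while (srs.next()) {
        // ... process the current row ...
    }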

  • 2020-12-07 23:56

    Although it's probably not optimal, your solution seems like it ought to be fine for a one-off database cleanup routine. It shouldn't take that long to run a query like that and get the results (I'm assuming that, since it's a one-off, a couple of seconds would be fine). Possible problems:

    • Is your network (or at least your connection to MySQL) very slow? If so, you could try running the process locally on the MySQL box, or from somewhere better connected.

    • Is there something in the table structure that's causing it? Pulling down 10 KB of data for every row? 200 fields? Calculating the id values to fetch based on a non-indexed column? You could try finding a more DB-friendly way of pulling the data (e.g. select just the columns you need, have the database aggregate values, etc.), as sketched below.
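
    To illustrate that last point, here is a hedged sketch of the "DB-friendly" idea: instead of select * over 500k rows, pull only the columns the merge logic needs and let the database do the aggregation (the table and column names are invented for this example, and con is assumed to be an open Connection):

    String sql = "select customer_id, count(*) as dup_count "
               + "from my_table "
               + "group by customer_id "
               + "having count(*) > 1";
    try (PreparedStatement ps = con.prepareStatement(sql);
         ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // Only the aggregated rows cross the wire, not the full table.
            long customerId = rs.getLong("customer_id");
            int dupCount = rs.getInt("dup_count");
            // ... decide which rows to clean up or merge ...
        }
    }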

    If you're not getting through the second increment, something is really wrong: efficient or not, you shouldn't have any problem holding 2,000 or 20,000 rows in memory in a running JVM. Maybe you're storing the data redundantly or extremely inefficiently?
