ResultSet: Retrieving column values by index versus retrieving by label

|▌冷眼眸甩不掉的悲伤 提交于 2019-12-27 23:39:50

问题


When using JDBC, I often come across constructs like

ResultSet rs = ps.executeQuery();
while (rs.next()) {
    int id = rs.getInt(1);
    // Some other actions
}

I asked myself (and authors of code too) why not to use labels for retrieving column values:

int id = rs.getInt("CUSTOMER_ID");

The best explanation I've heard is something concerning performance. But actually, does it make processing extremely fast? I don't believe so, though I have never performed measurements. Even if retrieving by label would be a bit slower, nevertheless, it provide better readability and flexibility, in my opinion.
So could someone give me good explanation of avoiding to retrieve column values by column index instead of column label? What are pros and cons of both approaches (maybe, concerning certain DBMS)?


回答1:


You should use string labels by default.

Pros:

  • Independence of column order
  • Better readability/maintainability

Cons:

  • You have no control over the column names (access via stored procedures)

Which would you prefer?

ints?

int i = 1;
customerId = resultSet.getInt(i++);
customerName = resultSet.getString(i++);
customerAddress = resultSet.getString(i++);

or Strings?

customerId = resultSet.getInt("customer_id");
customerName = resultSet.getString("customer_name");
customerAddress = resultSet.getString("customer_address");

And what if there is a new column inserted at position 1? Which code would you prefer? Or if the order of the columns is changed, which code version would you need to change at all?

That's why you should use string labels by default.




回答2:


Warning: I'm going to get bombastic here, because this drives me crazy.

99%* of the time, it's a ridiculous micro-optimization that people have some vague idea makes things 'better'. This completely ignores the fact that, unless you're in an extremely tight and busy loop over millions of SQL results all the time, which is hopefully rare, you'll never notice it. For everyone who's not doing that, the developer time cost of maintaing, updating, and fixing bugs in the column indexing are far greater than the incremental cost of hardware for your infinitesimally-worse-performing application.

Don't code optimizations like this in. Code for the person maintaining it. Then observe, measure, analyse, and optimize. Observe again, measure again, analyse again, and optimize again.

Optimization is pretty much the last step in development, not the first.

* Figure is made up.




回答3:


The answer has been accepted, none-the-less, here is some additional information and personal experience that I have not seen put forward yet.

Use column names (constants and not literals is preferred) in general and if possible. This is both clearer, is easier to maintain, and future changes are less likely to break the code.

There is, however, a use for column indexes. In some cases these are faster, but not sufficiently that this should override the above reasons for names*. These are very valuable when developing tools and general methods dealing with ResultSets. Finally, an index may be required because the column does not have a name (such as an unnamed aggregate) or there are duplicate names so there is no easy way to reference both.

*Note that I have written some JDBC drivers and looked inside some open sources one and internally these use column indexes to reference the result columns. In all cases I have worked with, the internal driver first maps a column name to an index. Thus, you can easily see that the column name, in all those cases, would always take longer. This may not be true for all drivers though.




回答4:


From the java documentation:

The ResultSet interface provides getter methods (getBoolean, getLong, and so on) for retrieving column values from the current row. Values can be retrieved using either the index number of the column or the name of the column. In general, using the column index will be more efficient. Columns are numbered from 1. For maximum portability, result set columns within each row should be read in left-to-right order, and each column should be read only once.

Of course each method (named or indexed) has its place. I agree that named columns should be the default. However, in cases where a huge number of loops are required, and where the SELECT statement is defined and maintained in the same section of code (or class), indexes should be ok - it is advisable to list the columns being selected, not just "SELECT * FROM...", since any table change will break the code.




回答5:


Sure, using column names increases readibility and makes maintenance easy. But using column names has a flipside. As you know, SQL allows multiple column names with same name, there's no guarantee that the column name you typed in the getter method of resultSet actually points to the column name you intend to access. In theory, using index numbers instead of column names is preffered, but it reduces the readability...

Thanks




回答6:


I don't think using the labels impacts performance by much. But there is another reason not to use Strings. Or ints, for that matter.

Consider using constants. Using an int constant makes the code more readably, but also less likely to have errors.

Besides being more readable, the constant also prevents you from making typo's in the label names - the compiler will throw an error if you do. And any IDE worth anything will pick it up. This is not the case if you use Strings or ints.




回答7:


I did some performance profiling on this exact subject on an Oracle database. In our code we have a ResultSet with numerous colums and a huge number of rows. Of the 20 seconds (!) the request takes to execute method oracle.jdbc.driver.ScrollableResultSet.findColumn(String name) takes about 4 seconds.

Obviously there's something wrong with the overall design, but using indexes instead of the column names would probably take this 4 seconds away.




回答8:


You can have the best of both! The speed of using indexes with the maintainability and security of using column names.

First - unless you are looping thru a result set just use column names.

  1. Define a set of integer variables, one for each column you will access. The names of the variables can include the name of the column: e.g. iLast_Name.

  2. Before the result set loop iterate thru the column metadata and set the value of each integer variable to the column index of the corresponding column name. If the index of the 'Last_Name' column is 3 then set the value of 'iLast_Name' to 3.

  3. In the result set loop use the integer variable names in the GET/SET methods. The variable name is a visual clue to the developer/maintainer as to the actual column name being accessed but the value is the column index and will give the best performance.

NOTE: the initial mapping (i.e. column name to index mapping) is only done once before the loop rather than for every record and column in the loop.




回答9:


The JDBC driver takes care for the column to index look-up. So if you extract values by column name each time the driver makes a look-up (usually in hash map) to check the corresponding index for the column name.




回答10:


I agree with previous answers that performance is not something that can force us to select either of the approaches. It would be good to consider the following things instead:

  • Code readability: for every developer reading your code labels have much more sense than indexes.
  • Maintenance: think of the SQL query and the way it is maintained. What is more likely to happen in your case after fixing/improving/refactoring SQL query: changing the order of the columns extracted or changing result column names. It seems for me that changing the order of the columns extracted (as the results of adding/deleting new columns in result set) has greater probability to happen.
  • Encapsulation: in spite of the way you choose try to isolate the code where you run SQL query and parse result set in the same component and make only this component aware about the column names and their mapping to the indexes (if you decided to use them).



回答11:


Using the index is an attempt at optimization.

The time saved by this is wasted by the extra effort it takes the developer to look up the necessary data to check if their code will work properly after the changes.

I think it's our built-in instinct to use numbers instead of text.




回答12:


Besides the look up in Map for labels it also leads to an extra String creation. Though it will happens on stack but still it caries a cost with it.

It all depends on the individual choice and till date I have used only indexes :-)




回答13:


As it is pointed out by other posters, I would stick to column names unless you have a really powerful reason not to do so. The impact in performance is negligible compared to, for example, query optimization. In this case, maintenance is much more important than an small optmization.



来源:https://stackoverflow.com/questions/186799/resultset-retrieving-column-values-by-index-versus-retrieving-by-label

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!