I have a SQL table and I would like to select multiple rows by ID. For example, I would like to get the rows with IDs 1, 5, and 9 from my table.
I have been doing this:
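Something like the following (simplified; the table name is a placeholder):

```sql
SELECT *
FROM my_table
WHERE id IN (1, 5, 9);
```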
First, I think it is a stretch to claim that your data is suggestive of `O(n log(n))`. (It is great that you did the performance test, by the way.) Here is the time per value:
| n (values) | time per value |
|---|---|
| 1000 | 0.046 |
| 2000 | 0.047 |
| 3000 | 0.083 |
| 4000 | 0.079 |
| 5000 | 0.078 |
| 6000 | 0.078 |
| 7000 | 0.079 |
| 8000 | 0.081 |
| 9000 | 0.083 |
| 10000 | 0.085 |
Although there is a slight increase as `n` goes up, the jump from 2000 to 3000 is much, much more prominent. If this is reproducible, the question to me is why there is such a discontinuity.
To me, this is more suggestive of `O(n)` than `O(n log(n))`. But empirical measurements are a poor way to pin down theoretical limits, so the exact complexity class is not so important.
I would expect performance to be `O(n)` (where `n` is the actual number of values, not the bit length as in some estimates). My understanding is that `in` behaves like a giant set of `or`s. Most records fail the test, so all the comparisons have to be done for them. Hence the `O(n)`.
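To illustrate (a sketch; `my_table` and `id` are placeholder names), the `in` list is evaluated much like an equivalent chain of `or` comparisons:

```sql
-- These two queries express the same filter. With no index, each row
-- is compared against the listed values until one matches or the list
-- is exhausted -- roughly n comparisons per non-matching row.
SELECT * FROM my_table WHERE id IN (1, 5, 9);

SELECT * FROM my_table WHERE id = 1 OR id = 5 OR id = 9;
```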
The next question is whether you have an index on the id field. In that case, you can get the set of matching ids in `O(n log(n))` time (`log(n)` for traversing the index, done once for each of the `n` values). This seems worse, but we have left out the factor for the size of the original table. This should be a big win.
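For example (a sketch; the index name is made up):

```sql
-- With this index, each id lookup is a B-tree traversal instead of a
-- scan of the whole table, so the table size only enters through the
-- (logarithmic) depth of the tree.
CREATE INDEX idx_my_table_id ON my_table (id);
```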
As Andre suggests, you can load the ids into a temporary table and do a join. I would leave the index off the temporary table, because you are probably better off using the index on the larger table. This should get you `O(n log(n))` -- with no (significant) dependency on the size of the original table. Or, with no usable index at all, you have `O(n * m)`, where `m` is the size of the original table. I think an index built on the temporary table instead gets you back to `O(n log(n))` performance (assuming the data is not presorted).
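A sketch of that approach (table and column names are placeholders, and exact temporary-table syntax varies by database):

```sql
CREATE TEMPORARY TABLE wanted_ids (id INT);

INSERT INTO wanted_ids (id)
VALUES (1), (5), (9);

-- The join lets the optimizer drive lookups through the index on
-- my_table(id): one index traversal per id in wanted_ids.
SELECT t.*
FROM my_table t
JOIN wanted_ids w
    ON t.id = w.id;
```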
Placing all the values in the query itself has a similar, unstated cost -- parsing the query. Parsing takes longer as the string gets longer.
In short, I commend you for doing performance measurements, but not for drawing conclusions about algorithmic complexity from them. I don't think your data supports your conclusion. Also, the handling of queries is a bit more complicated than you suggest, and you have left out the size of the larger table -- which can have a dominant effect. And I'm quite curious what is happening between 2000 and 3000 rows.