SELECT only a certain set of rows at a time

不羁岁月 提交于 2019-12-24 04:17:15

问题


I need to select data from one table and insert it into another table. Currently the SQL looks something like this:

   INSERT INTO A (x, y, z)
   SELECT x, y, z
   FROM B b
   WHERE ...

However, the SELECT is huge, resulting in over 2 millions rows and we think it is taking up too much memory. Informix, the db in this case, runs out of virtual memory when the query is run.

How would I go about selecting and inserting a set of rows (say 2000)? Given that I don't think there are any row ids etc.


回答1:


You can do SELECT FIRST n * from Table. Where n is the amount of rows you want, say 2000. Also, in the WHERE clause do an embedded select that checks the table you are inserting in to for rows already existing. So that the next time the statement is ran, it will not include already inserted data.




回答2:


I assume that you have some script that this is executed from? You can just loop and limit as long as you order the values returned from the nested select. Here is some pseudo code.

total = SELECT COUNT(x) FROM B WHERE ...
while (total > 0) 
  INSERT INTO A (x, y, z) SELECT x, y, z FROM B b WHERE ... ORDER BY x LIMIT 2000
  total = total - 2000
end



回答3:


I'm almost certain that IDS only lets you use the FIRST clause where the data is returned to the client1, and that is something you want to avoid if at all possible.

You say you get an out of memory error (rather than, say, a long transaction aborted error)? Have you looked at the configuration of your server to ensure it has a reasonable amount of memory?

It depends in part on how big your data set is, and what the constraints are - why you are doing the load across tables. But I would normally aim to determine a way of partitioning the data into loadable subsets and run those sequentially in a loop. For example, if the sequence numbers are between 1 and 10,000,000, I might run the loop ten times, with condition on the sequence number for AND seqnum >= 0 AND seqnum < 1000000' and thenAND seqnum >= 1000000 AND seqnum < 2000000', etc. Preferably in a language with the ability to substitute the range via variables.

This is a bit nuisancy, and you want to err on the conservative side in terms of range size (more smaller partitions rather than fewer bigger ones - to reduce the risk of running out of memory).


1 Over-simplifying slightly. A stored procedure would have to count as 'the client', for example, and the communication cost in a stored procedure is (a lot) less than the cost of going to the genuine client.



来源:https://stackoverflow.com/questions/2429126/select-only-a-certain-set-of-rows-at-a-time

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!