Python in Windows: large number of inserts using pyodbc causes memory leak


Question


I am trying to populate an MS SQL 2005 database using Python on Windows. I am inserting millions of rows, and by the 7 millionth row I am using almost a gigabyte of memory. The test below eats up about 4 MB of RAM for every 100k rows inserted:

import pyodbc

connection = pyodbc.connect('DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x')
cursor = connection.cursor()
connection.autocommit = True

# Insert the same row forever; memory usage grows steadily as this runs.
while 1:
    cursor.execute("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", 1, 2, 3, 4, 5, 6)

connection.close()  # unreachable: the loop above never terminates

Hack solution: I ended up spawning a new process with the multiprocessing module so that the memory is returned when the process exits (a sketch is below). I'm still confused about why inserting rows this way consumes so much memory. Any ideas?
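
A minimal sketch of that hack, assuming the data arrives in fixed-size batches (the connection string, table, and placeholder rows are the same dummies as in the test above): each batch is inserted by a short-lived child process, so whatever pyodbc leaks is handed back to the OS when the child exits.

import pyodbc
from multiprocessing import Process

def insert_batch(rows):
    # All pyodbc work happens in the child; leaked memory is released on exit.
    connection = pyodbc.connect('DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x')
    connection.autocommit = True
    cursor = connection.cursor()
    for row in rows:
        cursor.execute("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", *row)
    connection.close()

if __name__ == '__main__':
    for _ in range(70):                           # 70 batches of 100k ~ 7 million rows
        batch = [(1, 2, 3, 4, 5, 6)] * 100000     # placeholder data
        worker = Process(target=insert_batch, args=(batch,))
        worker.start()
        worker.join()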


Answer 1:


I had the same problem, and it looks like a pyodbc issue with parameterized inserts: http://code.google.com/p/pyodbc/issues/detail?id=145

As a temporary workaround, switching to a static insert with the VALUES clause populated inline eliminates the leak, until I can try a build from the current source.
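
Roughly, the difference looks like this, reusing the cursor from the question's snippet (numeric placeholder values and no escaping, so this only suits trusted data; the real code would build the string from each row):

# Parameterized insert that exhibits the leak:
cursor.execute("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", 1, 2, 3, 4, 5, 6)

# Static insert with the VALUES clause populated in the SQL text itself:
row = (1, 2, 3, 4, 5, 6)
cursor.execute("insert into x (a,b,c,d,e,f) VALUES (%d,%d,%d,%d,%d,%d)" % row)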




Answer 2:


I faced the same problem.

I had to read more than 50 XML files, each about 300 MB, and load them into SQL Server 2005.

I tried the following:

Reusing the same cursor variable, so the previous cursor is dereferenced.

Closing/reopening the connection.

Setting the connection to None.

Finally, I ended up bootstrapping each XML file load in a separate process using the multiprocessing Process module.

I have since replaced that process with IronPython, using System.Data.SqlClient.

This gives better performance and a nicer interface.
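
A rough IronPython sketch of that approach (the connection string, table, columns, and placeholder rows below are illustrative, not taken from the original setup):

import clr
clr.AddReference('System.Data')
from System.Data.SqlClient import SqlConnection, SqlCommand

rows = [(1, 2, 3, 4, 5, 6)] * 1000   # placeholder data from the XML parsing step

conn = SqlConnection('Server=x;Database=x;User Id=x;Password=x;')
conn.Open()
cmd = SqlCommand("insert into x (a,b,c,d,e,f) VALUES (@a,@b,@c,@d,@e,@f)", conn)
for row in rows:
    cmd.Parameters.Clear()
    for name, value in zip(('@a', '@b', '@c', '@d', '@e', '@f'), row):
        cmd.Parameters.AddWithValue(name, value)
    cmd.ExecuteNonQuery()
conn.Close()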




Answer 3:


Maybe close & re-open the connection every million rows or so?

Sure, it doesn't solve anything, but if you only have to do this once you could get on with life!
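
In code, that could look something like this (same placeholder connection string and dummy row as in the question; the 7-million-row workload and the one-million-row interval are arbitrary):

import pyodbc

conn_str = 'DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x'
connection = pyodbc.connect(conn_str)
connection.autocommit = True
cursor = connection.cursor()

count = 0
while count < 7000000:                       # placeholder workload
    cursor.execute("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", 1, 2, 3, 4, 5, 6)
    count += 1
    if count % 1000000 == 0:
        # Throw the whole connection away every million rows to hand memory back.
        connection.close()
        connection = pyodbc.connect(conn_str)
        connection.autocommit = True
        cursor = connection.cursor()

connection.close()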




Answer 4:


Try creating a separate cursor for each insert. Reuse the cursor variable each time through the loop to implicitly dereference the previous cursor. Add a connection.commit after each insert.

You may only need something as simple as a time.sleep(0) at the bottom of each loop to allow the garbage collector to run.
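
Put together, those suggestions look roughly like this (connection string and rows are placeholders):

import time
import pyodbc

connection = pyodbc.connect('DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x')
rows = [(1, 2, 3, 4, 5, 6)] * 100000   # placeholder data

for row in rows:
    cursor = connection.cursor()   # fresh cursor each time; the old one is dereferenced
    cursor.execute("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", *row)
    connection.commit()            # commit after each insert
    time.sleep(0)                  # yield so the garbage collector gets a chance to run

connection.close()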




Answer 5:


You could also try forcing a garbage collection every once in a while with gc.collect() after importing the gc module.

Another option might be to use cursor.executemany() and see if that clears up the problem. The nasty thing about executemany(), though, is that it takes a sequence rather than an iterator (so you can't pass it a generator). I'd try the garbage collector first.
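
A sketch of both suggestions (connection string, data, and the 100k-row collection interval are placeholders):

import gc
import pyodbc

connection = pyodbc.connect('DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x')
connection.autocommit = True
cursor = connection.cursor()
rows = [(1, 2, 3, 4, 5, 6)] * 500000   # placeholder data

# Suggestion 1: force a collection every so often.
for i, row in enumerate(rows):
    cursor.execute("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", *row)
    if i % 100000 == 0:
        gc.collect()

# Suggestion 2: hand pyodbc a whole batch at once. executemany() wants a real
# sequence, so a generator would have to be materialised with list() first.
cursor.executemany("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", rows)

connection.close()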

EDIT: I just tested the code you posted, and I am not seeing the same issue. Are you using an old version of pyodbc?



Source: https://stackoverflow.com/questions/4088931/python-in-windows-large-number-of-inserts-using-pyodbc-causes-memory-leak
