Python in Windows: large number of inserts using pyodbc causes memory leak

I am trying to populate a MS SQL 2005 database using python on windows. I am inserting millions of rows, and by 7 million I am using almost a gigabyte of memory. The test below eats up 4 megs of RAM for each 100k rows inserted:

import pyodbc
connection=pyodbc.connect('DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x')
cursor=connection.cursor()
connection.autocommit=True
while 1:
    cursor.execute("insert into x (a,b,c,d, e,f) VALUES (?,?,?,?,?,?)",1,2,3,4,5,6)
mdbconn.close()

Hack solution: I ended up spawning a new process using the multiprocessing module to return memory. Still confused about why inserting rows in this way consumes so much memory. Any ideas?

I had the same issue, and it looks like a pyodbc issue with parameterized inserts: http://code.google.com/p/pyodbc/issues/detail?id=145

Temporarily switching to a static insert with the VALUES clause populated eliminates the leak, until I try a build from the current source.

Even I had faced the same problem.

I had to read more than 50 XML files each about 300 MB and load them into SQL Server 2005.

I tried the following :

Using the same cursor by dereferencing.

Closing /opening the connection

Setting the connection to None.

Finally ended up bootstrapping each XML file load using Process module.

Now I have replaced the process using IronPython - System.Data.SqlClient.

This give a better performance and also better interface.

Maybe close & re-open the connection every million rows or so?

Sure it doesn't solve anything, but if you only have to do this once you could get on with life!

Try creating a separate cursor for each insert. Reuse the cursor variable each time through the loop to implicitly dereference the previous cursor. Add a connection.commit after each insert.

You may only need something as simple as a time.sleep(0) at the bottom of each loop to allow the garbage collector to run.

You could also try forcing a garbage collection every once in a while with gc.collect() after importing the gc module.

Another option might be to use cursor.executemany() and see if that clears up the problem. The nasty thing about executemany(), though, is that it takes a sequence rather than an iterator (so you can't pass it a generator). I'd try the garbage collector first.

EDIT: I just tested the code you posted, and I am not seeing the same issue. Are you using an old version of pyodbc?

来源：https://stackoverflow.com/questions/4088931/python-in-windows-large-number-of-inserts-using-pyodbc-causes-memory-leak

标签

python

memory-leaks

pyodbc