Python: How to use a generator to avoid sql memory issue

前提是你 提交于 2019-12-03 03:50:11

With MySQLdb, the default cursor loads the entire result set into a Python list when the call to cursor.execute(..) is made. For a large query that may cause a MemoryError whether or not you use a generator.

Instead, use an SSCursor or SSDictCursor. These will keep the result set on the server side, and allow you to interate through the items in the result set on the client side:

import MySQLdb  
import MySQLdb.cursors as cursors

def getUNames(self):
    # You may of course want to define `self.gdbCursor` somewhere else...
    conn = MySQLdb.connect(..., cursorclass=cursors.SSCursor)
    #                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    #                       Set the cursor class to SSCursor here
    self.gdbCursor = conn.cursor()

    globalUserQuery = ur'''SELECT gu_name FROM globaluser WHERE gu_locked = 0'''
    try:
        self.gdbCursor.execute(globalUserQuery)
        for row in self.gdbCursor:
            uName = unicode(row['gu_name'], 'utf-8')
            yield uName
    except Exception, e:
        traceback.print_exc()

There isn't much documentation on the difference between the default Cursor and the SSCursor. The best source I know is the docstrings of the Cursor Mixin classes themselves:

The default cursor uses a CursorStoreResultMixIn:

In [2]: import MySQLdb.cursors as cursors
In [8]: print(cursors.CursorStoreResultMixIn.__doc__)
This is a MixIn class which causes the entire result set to be
    stored on the client side, i.e. it uses mysql_store_result(). If the
    result set can be very large, consider adding a LIMIT clause to your
    query, or using CursorUseResultMixIn instead.

and the SSCursor uses a CursorUseResultMixIn:

In [9]: print(cursors.CursorUseResultMixIn.__doc__)
This is a MixIn class which causes the result set to be stored
    in the server and sent row-by-row to client side, i.e. it uses
    mysql_use_result(). You MUST retrieve the entire result set and
    close() the cursor before additional queries can be peformed on
    the connection.

Since I changed getUNames into a generator, it would be used like this:

for row in self.getUnames():
    ...
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!