Python is slow when iterating over a large list

眉间皱痕 提交于 2019-12-01 03:16:32

This should not be slow with Python native lists - but maybe ODBC's driver is returning a "lazy" object that tries to be smart but just gets slow. Try just doing

allIDRows = list(clientItemsCursor.fetchall())

in your code and post further benchmarks.

(Python lists can get slow if you start inserting things in its middle, but just iterating over a large list should be fast)

It's probably slow because you load all result in memory first and performing the iteration over a list. Try iterating the cursor instead.

And no, scripts shouldn't be that slow.

clientItemsCursor.execute("Select ids from largetable where year =?", year);
for clientItemrow in clientItemsCursor:
    aID = str(clientItemrow[0])
    count = count + 1

More investigation is needed here... consider the following script:

bigList = range(500000)
doSomething = ""
arrayList = [[x] for x in bigList]  # takes a few seconds
for x in arrayList:
    doSomething += str(x[0])
    count+=1

This is pretty much the same as your script, minus the database stuff, and takes a few seconds to run on my not-terribly-fast machine.

When you connect to your database directly (I mean you get an SQL prompt), how many secods runs this query?

When query ends, you get a message like this:

NNNNN rows in set (0.01 sec)

So, if that time is so big, and your query is slow as "native", may be you have to create an index on that table.

This is slow because you are

  1. Getting all the results
  2. Allocating memory and assigning the values to that memory to create the list allIDRows
  3. Iterating over that list and counting.

If execute gives you back a cursor then use the cursor to it's advantage and start counting as you get stuff back and save time on the mem allocation.

clientItemsCursor.execute("Select ids from largetable where year =?", year);
for clientItemrow in clientItemsCursor:
   count +=1

Other hints:

  • create an index on year
  • use 'select count(*) from ... to get the count for the year' this will probably be optimised on the db.
  • Remove the aID line if not needed this is converting the first item of the row to a string even though its not used.
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!