Question
I need to fetch a huge amount of data from Oracle (using cx_Oracle) in Python 2.6 and produce a CSV file.
The data size is about 400k records x 200 columns x 100 chars each.
What is the best way to do that?
Now, using the following code...
ctemp = connection.cursor()
ctemp.execute(sql)
ctemp.arraysize = 256
for row in ctemp:
    file.write(row[1])
...
... the script remains in the loop for hours and nothing is written to the file... (is there a way to print a message for every record extracted?)
Note: I don't have any issue with Oracle, and running the query in SqlDeveloper is super fast.
Thank you, gian
Answer 1:
You should use cur.fetchmany() instead.
It will fetch chunks of rows, with the chunk size defined by arraysize (256).
Python code:
def chunks(cur):
    # cur.arraysize (e.g. 256) controls how many rows fetchmany() returns
    while True:
        rows = cur.fetchmany()
        if not rows:
            break
        yield rows
Then do your processing in a for loop:
for i, chunk in enumerate(chunks(cur)):
    for row in chunk:
        pass  # process your rows here
That is exactly how I do it in my TableHunter for Oracle.
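For completeness, here is a minimal sketch of how the chunked fetch could feed the CSV output asked for in the question; 'connection' and 'sql' are assumed to exist as in the question, and 'out.csv' is a placeholder output path:
import csv

# Sketch only: 'connection' and 'sql' come from the question,
# 'out.csv' is a placeholder output path.
cur = connection.cursor()
cur.arraysize = 256                 # fetchmany() returns up to 256 rows per call
cur.execute(sql)

with open('out.csv', 'wb') as f:    # binary mode for the csv module on Python 2
    writer = csv.writer(f)
    for i, chunk in enumerate(chunks(cur)):
        writer.writerows(chunk)     # write the whole chunk in one call
        print 'wrote chunk %d (%d rows)' % (i, len(chunk))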
Answer 2:
- add print statements after each line
- add a counter to your loop that reports progress after every N rows (see the sketch after this list)
- look into a module like 'progressbar' for displaying a progress indicator
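For example, a rough sketch of the counter idea, reusing the cursor and file object from the question (the 10,000-row interval is an arbitrary choice):
count = 0
for row in ctemp:
    file.write(row[1])
    count += 1
    if count % 10000 == 0:          # report progress every 10,000 rows
        print '%d rows processed so far' % count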
Answer 3:
I think your code is asking the database for the data one row at a time, which might explain the slowness.
Try:
ctemp = connection.cursor()
ctemp.execute(sql)
results = ctemp.fetchall()
for row in results:
    file.write(row[1])
Source: https://stackoverflow.com/questions/19243571/fetching-huge-data-from-oracle-in-python