loading huge XLS data into Oracle using python

别说谁变了你拦得住时间么 提交于 2019-12-11 09:09:29

问题


I have a 3+ million record XLS file which i need to dump in Oracle 12C DB (direct dump) using a python 2.7.

I am using Cx_Oracle python package to establish connectivity to Oracle , but reading and dumping the XLS (using openpyxl pckg) is extremely slow and performance degrades for thousands/million records.

From a scripting stand point used two ways-

  1. I've tried bulk load , by reading all the values in array and then dumping it using cursor prepare (with bind variables) and cursor fetchmany.This doesn't work well with huge data.

  2. Iterative loading of the data as it is being fetched.Even this way has performance issues.

What options and techniques/packages can i deploy as a best practise to load this volume of data from XLS to Oracle DB ?Is it advisable to load this volume of data via scripting or should i necessarily use an ETL tool ? As of now i only have option via python scripting so please do answer the former


回答1:


If is possible to export your excel fila as a CSV, then all you need is to use sqlldr to load the file in db




回答2:


Excel also comes with ODBC support so you could pump straight from Excel to Oracle assuming you have the drivers. That said, anything that involves transforming a large amount of data in memory (from whatever Excel is using internally) and then passing it to the DB is likely to be less performant than a specialised bulk operation which can be optimised to use less memory. Going through Python just adds another layer to the task (Excel to Python to Oracle), though it might be possible to set this up to use streams.




回答3:


Basically for high volume data any language going to be stressed on I/O, except C. Best way is to use native tools/utilities provided by the DB vendor. For oracle the correct fit is SQL Loader.

Refer this link for quick tutorial http://www.thegeekstuff.com/2012/06/oracle-sqlldr/

Here you go… Sample code that runs SQL Loader and gets you back with return code, output & error

sql_ld_command = ['sqlldr ',  'uid/passwd', 'CONTROL=', 
'your_ctrl_file_path', 'DATA=', 'your_data_file_path']   

sql_ldr_proc  = subprocess.Popen(sql_ld_command, stdin=subprocess.PIPE,stdout=subprocess.PIPE, stderr=subprocess.PIPE)   

out, err  = sql_ldr_proc.communicate()  
retn_code = sql_ldr_proc.wait()


来源:https://stackoverflow.com/questions/30324370/loading-huge-xls-data-into-oracle-using-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!