Import Data to SQL using Python


Question


I need to import 30k rows of data from a CSV file into a Vertica database. The code I've tried is taking more than an hour to do so. I'm wondering if there's a faster way to do it? I've tried importing with the csv module and also by looping through a DataFrame and inserting row by row, but it just isn't fast enough. In fact, it's way too slow. Could you please help me?

# Insert one row at a time -- this is the slow part.
rownum = df.shape[0]
for x in range(0, rownum):
    a = df['AccountName'].values[x]
    b = df['ID'].values[x]
    ss = "INSERT INTO Table (AccountName, ID) VALUES (%s, %s)"
    val = (a, b)
    cur.execute(ss, val)

connection.commit()

Answer 1:


You want to use Vertica's COPY command.

COPY Table FROM '/path/to/csv/file.csv' DELIMITER ',';

This is much faster than inserting each row at a time.

Since you are using Python, I would recommend the vertica_python module, as it has a very convenient copy method on its cursor object (see the vertica-python GitHub page).

The syntax for using COPY with vertica-python is as follows:

# Read the CSV into memory and stream it to Vertica via STDIN.
with open('file.csv', 'r') as file:
    csv_file = file.read()
    copy_cmd = "COPY Table FROM STDIN DELIMITER ','"
    cur.copy(copy_cmd, csv_file)
    connection.commit()
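
For a large file you don't have to read the whole CSV into memory first; cur.copy also accepts an open file object and streams it in chunks. A minimal sketch, assuming the same table and file name as above (the buffer_size value is just an example):

with open('file.csv', 'rb') as fs:
    # vertica_python reads the file in chunks instead of loading it all at once
    cur.copy("COPY Table FROM STDIN DELIMITER ','", fs, buffer_size=65536)
connection.commit()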

Another thing you can do to speed up the process is to compress the CSV file. Vertica can read gzip, bzip and lzo compressed files.

with open('file.csv.gz', 'rb') as file:  # binary mode, since the contents are compressed
    gzipped_csv_file = file.read()
    copy_cmd = "COPY Table FROM STDIN GZIP DELIMITER ','"
    cur.copy(copy_cmd, gzipped_csv_file)
    connection.commit()

Copying compressed files reduces network time, so you have to determine whether the extra time it takes to compress the CSV file is made up for by the time saved copying the compressed file. In most cases I've dealt with, it is worth it to compress the file.
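
If you want to produce the gzipped file from Python itself, the standard library's gzip and shutil modules are enough. A minimal sketch, assuming the file names used above:

import gzip
import shutil

# Compress file.csv into file.csv.gz before loading it with COPY ... GZIP
with open('file.csv', 'rb') as src, gzip.open('file.csv.gz', 'wb') as dst:
    shutil.copyfileobj(src, dst)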



Source: https://stackoverflow.com/questions/54406744/import-data-to-sql-using-python
