Import Data to SQL using Python


Question


I need to import 30k rows of data from a CSV file into a Vertica database. The code I've tried is taking more than an hour to do so. I'm wondering if there's a faster way to do it? I've tried importing with the csv module and also by looping through a DataFrame and inserting row by row, but it just isn't fast enough. In fact, it's way too slow. Could you please help me?

# Insert one row at a time -- this is the slow part.
rownum = df.shape[0]
for x in range(0, rownum):
    a = df['AccountName'].values[x]
    b = df['ID'].values[x]
    ss = "INSERT INTO Table (AccountName, ID) VALUES (%s, %s)"
    val = (a, b)
    cur.execute(ss, val)

connection.commit()

Answer 1:


You want to use Vertica's COPY command.

COPY Table FROM '/path/to/csv/file.csv' DELIMITER ',';

This is much faster than inserting each row at a time.

Since you are using Python, I would recommend the vertica_python module, as it has a very convenient copy method on its cursor object (see the vertica-python GitHub page).

The syntax for using COPY with vertica-python is as follows:

# Read the CSV into memory and stream it to Vertica via STDIN.
with open('file.csv', 'r') as file:
    csv_file = file.read()
    copy_cmd = "COPY Table FROM STDIN DELIMITER ','"
    cur.copy(copy_cmd, csv_file)
    connection.commit()
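
For a large file you don't have to read the whole CSV into memory first; cur.copy also accepts an open file object and streams it in chunks. A minimal sketch, assuming the same table and file name as above (the buffer_size value is just an example):

with open('file.csv', 'rb') as fs:
    # vertica_python reads the file in chunks instead of loading it all at once
    cur.copy("COPY Table FROM STDIN DELIMITER ','", fs, buffer_size=65536)
connection.commit()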

Another thing you can do to speed up the process is to compress the CSV file. Vertica can read gzip, bzip and lzo compressed files.

with open('file.csv.gz', 'rb') as file:  # binary mode, since the contents are compressed
    gzipped_csv_file = file.read()
    copy_cmd = "COPY Table FROM STDIN GZIP DELIMITER ','"
    cur.copy(copy_cmd, gzipped_csv_file)
    connection.commit()

Copying compressed files reduces network time, so you have to determine whether the extra time it takes to compress the CSV file is made up for by the time saved copying the compressed file. In most cases I've dealt with, it is worth it to compress the file.
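
If you want to produce the gzipped file from Python itself, the standard library's gzip and shutil modules are enough. A minimal sketch, assuming the file names used above:

import gzip
import shutil

# Compress file.csv into file.csv.gz before loading it with COPY ... GZIP
with open('file.csv', 'rb') as src, gzip.open('file.csv.gz', 'wb') as dst:
    shutil.copyfileobj(src, dst)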



Source: https://stackoverflow.com/questions/54406744/import-data-to-sql-using-python
