Using Python to upload large csv files to Postgres RDS in AWS


Question


What's the easiest way to load a large csv file into a Postgres RDS database in AWS using Python?

To transfer data to a local postgres instance, I have previously used a psycopg2 connection to run SQL statements like:

COPY my_table FROM 'my_10gb_file.csv' DELIMITER ',' CSV HEADER;
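
For context, a minimal sketch of how that server-side COPY might be issued through psycopg2 in the local case (the connection parameters here are placeholders, not from the original post):

import psycopg2

# Local-instance case only: a server-side COPY reads the file on the
# database server itself, which is why it works when Postgres runs locally.
conn = psycopg2.connect(host="localhost", dbname="my_db",
                        user="my_user", password="my_password")
with conn, conn.cursor() as cur:
    cur.execute("COPY my_table FROM 'my_10gb_file.csv' DELIMITER ',' CSV HEADER")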

However, executing this against a remote AWS RDS database fails, because the .csv file is on my local machine rather than on the database server:

ERROR: must be superuser to COPY to or from a file
SQL state: 42501
Hint: Anyone can COPY to stdout or from stdin. psql's \copy command also works for anyone.

This answer explains why this doesn't work.
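
The route that hint describes, streaming the file over the connection instead of reading it on the server, can be driven from psycopg2 with copy_expert; roughly like this (a sketch, with placeholder credentials):

import psycopg2

conn = psycopg2.connect(host="YOUR_HOST", dbname="YOUR_DBNAME",
                        user="YOUR_USERNAME", password="YOUR_PASSWORD")
with conn, conn.cursor() as cur, open("my_10gb_file.csv") as f:
    # COPY ... FROM STDIN streams the client-side file over the connection,
    # so no superuser rights are needed on the RDS instance.
    cur.copy_expert(
        "COPY my_table FROM STDIN WITH (FORMAT csv, HEADER true, DELIMITER ',')",
        f,
    )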

I'm now looking for the Python syntax to automate this with psql: I have a large number of .csv files to upload, so I need a script that can run them all.


Answer 1:


First, create the table definitions in the RDS Postgres database as usual, using CREATE TABLE statements.

Then run a psql command like this:

psql -p 5432 --host YOUR_HOST --username YOUR_USERNAME --dbname YOUR_DBNAME --command "\copy my_table FROM 'my_10gb_file.csv' DELIMITER ',' CSV HEADER"

In Python, we can set this up and execute it as follows:

host = "YOUR_HOST"
username = "YOUR_USERNAME"
dbname = "YOUR_DBNAME"

table_name = "my_table"
file_name = "my_10gb_file.csv"
command = "\copy {} FROM '{}' DELIMITER ',' CSV HEADER".format(table_name, file_name)

psql_template = 'psql -p 5432 --host {} --username {} --dbname {} --command "{}"'

bash_command = psql_template.format(host, username, dbname, command.strip())

process = subprocess.Popen(bash_command, stdout=subprocess.PIPE, shell=True) 

output, error = process.communicate()
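
Since the question mentions a large number of .csv files, a hypothetical extension of the same script loops over a directory and issues one \copy per file; deriving the table name from the file name is an assumption for illustration:

import glob
import os

# Reuses host, username, dbname and psql_template defined above.
for file_name in sorted(glob.glob("data/*.csv")):
    # Assumed convention: data/my_table.csv loads into table my_table.
    table_name = os.path.splitext(os.path.basename(file_name))[0]
    command = "\\copy {} FROM '{}' DELIMITER ',' CSV HEADER".format(table_name, file_name)
    bash_command = psql_template.format(host, username, dbname, command)
    subprocess.run(bash_command, shell=True, check=True)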


Source: https://stackoverflow.com/questions/46969474/using-python-to-upload-large-csv-files-to-postgres-rds-in-aws
