Question
I want to update a table in AWS Redshift on a daily basis. My plan is to first delete the existing rows in a public table using Python and psycopg2, then insert a pandas dataframe into that table.
import psycopg2
import pandas as pd
con=psycopg2.connect(dbname= My_Credential.....)
cur = con.cursor()
sql = """
DELETE FROM tableA
"""
cur.execute(sql)
con.commit()
The code above handles the delete, but I don't know how to write the Python code to insert My_Dataframe into tableA. TableA has roughly 1 to 5 million rows. Please advise.
Answer 1:
I agree with what @mdem7 has suggested in the comment: inserting 1-5 million rows through a dataframe is not a good idea at all and you will face performance issues.
It is better to use the S3-to-Redshift load approach. Here is the code to do both the TRUNCATE and the COPY command.
import psycopg2

def redshift():
    conn = psycopg2.connect(dbname='database_name', host='888888888888****.u.****.redshift.amazonaws.com', port='5439', user='username', password='********')
    cur = conn.cursor()
    # Empty the target table first
    cur.execute("truncate table example;")
    # Begin your transaction
    cur.execute("begin;")
    # Load the staged CSV from S3 into the table
    cur.execute("copy example from 's3://examble-bucket/example.csv' credentials 'aws_access_key_id=ID;aws_secret_access_key=KEY/KEY/pL/KEY' csv;")
    # Commit your transaction
    cur.execute("commit;")
    print("Copy executed fine!")

redshift()
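The code above assumes the CSV is already sitting in S3. One way to get your dataframe there first is to serialize it in memory and upload it with boto3; this is a minimal sketch under that assumption, and the bucket name and object key are hypothetical placeholders chosen to match the COPY statement above, not part of the original answer.

import io
import boto3
import pandas as pd

def upload_dataframe_to_s3(df: pd.DataFrame, bucket: str, key: str) -> None:
    # Serialize the dataframe to CSV in memory (no header, to match the plain "csv" COPY above)
    buffer = io.StringIO()
    df.to_csv(buffer, index=False, header=False)
    # Upload the CSV text as a single S3 object
    boto3.client('s3').put_object(Bucket=bucket, Key=key, Body=buffer.getvalue())

# e.g. upload_dataframe_to_s3(My_Dataframe, 'examble-bucket', 'example.csv') before calling redshift()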
There are even more ways to make COPY faster, for example the MANIFEST option, so that Redshift can load the data in parallel; see the sketch below.
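As a hedged sketch of that option (the manifest key and the split file names are hypothetical, not from the original answer): the manifest is a small JSON file in S3 listing the files to load, and COPY is pointed at it with the manifest keyword.

# Contents of s3://examble-bucket/example.manifest (hypothetical):
# {
#   "entries": [
#     {"url": "s3://examble-bucket/example_part_00.csv", "mandatory": true},
#     {"url": "s3://examble-bucket/example_part_01.csv", "mandatory": true}
#   ]
# }
cur.execute(
    "copy example "
    "from 's3://examble-bucket/example.manifest' "
    "credentials 'aws_access_key_id=ID;aws_secret_access_key=KEY' "
    "csv manifest;"
)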
Hope this gives you some idea of how to move forward.
Answer 2:
Any suggestions on how to pass a connection string in place of the individual connection details in the code below?
conn = psycopg2.connect(dbname='', host='', ...)
I am looking to pass it like this:
conn = psycopg2.connect('Connection_String')
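A minimal sketch, assuming the libpq key/value DSN form that psycopg2.connect accepts as a single string; the host and credentials below are just the placeholders from Answer 1, not real values.

conn = psycopg2.connect(
    "dbname=database_name "
    "host=888888888888****.u.****.redshift.amazonaws.com "
    "port=5439 user=username password=********"
)

psycopg2 also accepts a URI-style string, e.g. psycopg2.connect('postgresql://username:password@host:5439/database_name').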
Source: https://stackoverflow.com/questions/53891593/python-write-dateframe-to-aws-redshift-using-psycopg2