I have a dataframe in Python. Can I write this data to Redshift as a new table? I have successfully created a db connection to Redshift and am able to execute simple sql queries.
For the purpose of this conversation, Postgres = Redshift. You have two options:
Option 1:
From Pandas: http://pandas.pydata.org/pandas-docs/stable/io.html#io-sql
The pandas.io.sql module provides a collection of query wrappers to both facilitate data retrieval and to reduce dependency on DB-specific API. Database abstraction is provided by SQLAlchemy if installed. In addition you will need a driver library for your database. Examples of such drivers are psycopg2 for PostgreSQL or pymysql for MySQL.
Writing DataFrames
Assuming the following data is in a DataFrame data, we can insert it into the database using to_sql().
id Date Col_1 Col_2 Col_3
26 2012-10-18 X 25.7 True
42 2012-10-19 Y -12.4 False
63 2012-10-20 Z 5.73 True
In [437]: data.to_sql('data', engine)
With some databases, writing large DataFrames can result in errors due to packet size limitations being exceeded. This can be avoided by setting the chunksize parameter when calling to_sql. For example, the following writes data to the database in batches of 1000 rows at a time:
In [438]: data.to_sql('data_chunked', engine, chunksize=1000)
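Putting that together for Redshift, here is a minimal sketch of Option 1. It assumes SQLAlchemy and psycopg2 are installed; the cluster host, port, credentials, database name, and the sample DataFrame are placeholders you would replace with your own.

import pandas as pd
from sqlalchemy import create_engine

# Redshift speaks the Postgres wire protocol, so a plain postgresql+psycopg2
# URL works. Host, port (5439 is the Redshift default), user, password and
# database name below are placeholders.
engine = create_engine(
    'postgresql+psycopg2://user:password@my-cluster.example.redshift.amazonaws.com:5439/mydb'
)

# Example data matching the table shown above.
data = pd.DataFrame({
    'id': [26, 42, 63],
    'Date': pd.to_datetime(['2012-10-18', '2012-10-19', '2012-10-20']),
    'Col_1': ['X', 'Y', 'Z'],
    'Col_2': [25.7, -12.4, 5.73],
    'Col_3': [True, False, True],
})

# Create (or replace) the table and write it in batches of 1000 rows.
data.to_sql('data', engine, index=False, if_exists='replace', chunksize=1000)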
Option 2
Or you can roll your own. If you have a DataFrame called data, simply loop over it using iterrows():
for index, row in data.iterrows():
and add each row to your database. However, I would use COPY instead of an INSERT per row, as it will be much faster (see the sketch below).
http://initd.org/psycopg/docs/usage.html#using-copy-to-and-copy-from
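Here is a rough sketch of that COPY approach, following the psycopg2 usage linked above. It assumes an open psycopg2 connection conn, the data DataFrame from Option 1, and a target table named data that already exists with matching columns; the column names in the tuple are placeholders. Note that Redshift's own COPY command is designed to load from S3, so whether COPY FROM STDIN via copy_from works against your cluster depends on your setup; treat this as a sketch, not a guarantee.

import io

# Dump the dataframe as tab-separated text (no index, no header) into memory.
buffer = io.StringIO()
data.to_csv(buffer, sep='\t', header=False, index=False)
buffer.seek(0)

with conn.cursor() as cur:
    # copy_from streams the whole buffer in one round trip instead of issuing
    # one INSERT per row, which is why it is much faster than looping.
    cur.copy_from(buffer, 'data', sep='\t',
                  columns=('id', 'date', 'col_1', 'col_2', 'col_3'))
conn.commit()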