I have a dataframe in Python. Can I write this data to Redshift as a new table? I have successfully created a db connection to Redshift and am able to execute simple sql queries.
For the purpose of this conversation, Postgres = Redshift. You have two options:
Option 1:
From Pandas: http://pandas.pydata.org/pandas-docs/stable/io.html#io-sql
The pandas.io.sql module provides a collection of query wrappers to both facilitate data retrieval and to reduce dependency on DB-specific API. Database abstraction is provided by SQLAlchemy if installed. In addition you will need a driver library for your database. Examples of such drivers are psycopg2 for PostgreSQL or pymysql for MySQL.
Writing DataFrames
Assuming the following data is in a DataFrame data, we can insert it into the database using to_sql().
id Date Col_1 Col_2 Col_3
26 2012-10-18 X 25.7 True
42 2012-10-19 Y -12.4 False
63 2012-10-20 Z 5.73 True
In [437]: data.to_sql('data', engine)
With some databases, writing large DataFrames can result in errors due to packet size limitations being exceeded. This can be avoided by setting the chunksize parameter when calling to_sql. For example, the following writes data to the database in batches of 1000 rows at a time:
In [438]: data.to_sql('data_chunked', engine, chunksize=1000)
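Putting that together for Redshift, here is a minimal sketch of Option 1. It assumes SQLAlchemy and psycopg2 are installed; the cluster host, port, credentials, database name, and the sample DataFrame are placeholders you would replace with your own.

import pandas as pd
from sqlalchemy import create_engine

# Redshift speaks the Postgres wire protocol, so a plain postgresql+psycopg2
# URL works. Host, port (5439 is the Redshift default), user, password and
# database name below are placeholders.
engine = create_engine(
    'postgresql+psycopg2://user:password@my-cluster.example.redshift.amazonaws.com:5439/mydb'
)

# Example data matching the table shown above.
data = pd.DataFrame({
    'id': [26, 42, 63],
    'Date': pd.to_datetime(['2012-10-18', '2012-10-19', '2012-10-20']),
    'Col_1': ['X', 'Y', 'Z'],
    'Col_2': [25.7, -12.4, 5.73],
    'Col_3': [True, False, True],
})

# Create (or replace) the table and write it in batches of 1000 rows.
data.to_sql('data', engine, index=False, if_exists='replace', chunksize=1000)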
Option 2
Or you can roll your own. If you have a DataFrame called data, simply loop over it using iterrows():
for index, row in data.iterrows():
and add each row to your database. However, I would use COPY instead of an INSERT per row, as it will be much faster (see the sketch below).
http://initd.org/psycopg/docs/usage.html#using-copy-to-and-copy-from
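Here is a rough sketch of that COPY approach, following the psycopg2 usage linked above. It assumes an open psycopg2 connection conn, the data DataFrame from Option 1, and a target table named data that already exists with matching columns; the column names in the tuple are placeholders. Note that Redshift's own COPY command is designed to load from S3, so whether COPY FROM STDIN via copy_from works against your cluster depends on your setup; treat this as a sketch, not a guarantee.

import io

# Dump the dataframe as tab-separated text (no index, no header) into memory.
buffer = io.StringIO()
data.to_csv(buffer, sep='\t', header=False, index=False)
buffer.seek(0)

with conn.cursor() as cur:
    # copy_from streams the whole buffer in one round trip instead of issuing
    # one INSERT per row, which is why it is much faster than looping.
    cur.copy_from(buffer, 'data', sep='\t',
                  columns=('id', 'date', 'col_1', 'col_2', 'col_3'))
conn.commit()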