How to write data to Redshift that is a result of a dataframe created in Python?

谎友^ 2020-12-14 08:27

I have a dataframe in Python. Can I write this data to Redshift as a new table? I have successfully created a db connection to Redshift and am able to execute simple sql queries.

6 Answers
  •  感动是毒
    2020-12-14 08:45

    For the purposes of this conversation, Postgres = Redshift. You have two options:

    Option 1:

    From Pandas: http://pandas.pydata.org/pandas-docs/stable/io.html#io-sql

    The pandas.io.sql module provides a collection of query wrappers to both facilitate data retrieval and to reduce dependency on DB-specific API. Database abstraction is provided by SQLAlchemy if installed. In addition you will need a driver library for your database. Examples of such drivers are psycopg2 for PostgreSQL or pymysql for MySQL.

    Writing DataFrames

    Assuming the following data is in a DataFrame data, we can insert it into the database using to_sql().

    id  Date    Col_1   Col_2   Col_3
    26  2012-10-18  X   25.7    True
    42  2012-10-19  Y   -12.4   False
    63  2012-10-20  Z   5.73    True
    
    In [437]: data.to_sql('data', engine)
    

    With some databases, writing large DataFrames can result in errors due to packet size limitations being exceeded. This can be avoided by setting the chunksize parameter when calling to_sql. For example, the following writes data to the database in batches of 1000 rows at a time:

    In [438]: data.to_sql('data_chunked', engine, chunksize=1000)
    
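    Putting that together for Redshift: since Redshift speaks the Postgres wire protocol, a SQLAlchemy engine using the postgresql+psycopg2 dialect works with to_sql(). A minimal sketch (the host, credentials, and table name below are placeholders you must replace; it assumes sqlalchemy and psycopg2 are installed):

    from sqlalchemy import create_engine
    import pandas as pd

    # 5439 is Redshift's default port; the connection string is a placeholder.
    engine = create_engine(
        'postgresql+psycopg2://user:password@mycluster.example.redshift.amazonaws.com:5439/mydb'
    )

    data = pd.DataFrame({'Col_1': ['X', 'Y'], 'Col_2': [25.7, -12.4]})

    # if_exists='replace' drops and recreates the table; use 'append' to add rows.
    data.to_sql('my_table', engine, index=False, if_exists='replace', chunksize=1000)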

    Option 2

    Or you can simply roll your own. If you have a DataFrame called data, loop over it using iterrows():

    for idx, row in data.iterrows():  # iterrows() yields (index, Series) pairs
    

    then add each row to your database. I would use COPY rather than a separate INSERT for each row, as it will be much faster (see the sketch after the link below).

    http://initd.org/psycopg/docs/usage.html#using-copy-to-and-copy-from
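    A rough sketch of the copy_from pattern those docs describe, staging the DataFrame data from above in an in-memory CSV buffer (connection details and table name are placeholders, and the target table must already exist with matching columns; note that against an actual Redshift cluster the usual approach is to put the file in S3 and run Redshift's own COPY ... FROM 's3://...', since Redshift does not accept COPY FROM STDIN):

    import io
    import psycopg2

    conn = psycopg2.connect(
        host='mycluster.example.redshift.amazonaws.com', port=5439,
        dbname='mydb', user='user', password='password'
    )

    # Dump the DataFrame to an in-memory buffer instead of inserting row by row.
    buf = io.StringIO()
    data.to_csv(buf, index=False, header=False)
    buf.seek(0)

    with conn, conn.cursor() as cur:
        # copy_from streams the buffer to the table in a single COPY statement.
        cur.copy_from(buf, 'my_table', sep=',')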
