How to write data frame to Postgres table without using SQLAlchemy engine?

Asked 2021-01-28 02:36

I have a data frame that I want to write to a Postgres database. This functionality needs to be part of a Flask app.

For now, I'm runn…

1 Answer
  • 2021-01-28 02:51

    You can use plain psycopg2 connections and avoid SQLAlchemy. This is going to sound rather unintuitive, but COPY will be much faster than regular inserts (even if you were to drop the ORM and make a general query, e.g. with executemany). Inserts are slow even with raw queries; you'll see COPY mentioned several times in "How to speed up insertion performance in PostgreSQL". In this instance, my motivations for the approach below are:

    1. Use COPY instead of INSERT
    2. Don't trust Pandas to generate the correct SQL for this operation (although, as noted by Ilja Everilä, this approach was actually added to pandas in v0.24; see the sketch at the end of this answer)
    3. Don't write the data to disk to make an actual file object; keep it all in memory

    Suggested approach using cursor.copy_from():

    import csv
    import io

    import psycopg2
    from flask import current_app

    df = "<your_df_here>"

    # Drop any columns you don't want in the insert data here

    # First take the headers
    headers = list(df.columns)

    # Now get a nested list of values
    data = df.values.tolist()

    # Create an in-memory CSV "file" -- point 3: nothing touches the disk
    string_buffer = io.StringIO()
    csv_writer = csv.writer(string_buffer)
    csv_writer.writerows(data)

    # Rewind the buffer to the start so copy_from reads from the first line
    string_buffer.seek(0)

    # Open a connection to the db (which I think you already have available)
    with psycopg2.connect(dbname=current_app.config['POSTGRES_DB'],
                          user=current_app.config['POSTGRES_USER'],
                          password=current_app.config['POSTGRES_PW'],
                          host=current_app.config['POSTGRES_URL']) as conn:
        c = conn.cursor()

        # Now upload the data as though it was a file
        c.copy_from(string_buffer, 'the_table_name', sep=',', columns=headers)
        conn.commit()
    
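    One caveat worth noting: copy_from treats the input as plain delimited text rather than true CSV, so a value containing a comma or an embedded newline will break the load even though csv.writer quotes it. If your data might contain such values, here is a sketch of a CSV-safe variant using cursor.copy_expert, which lets Postgres parse the stream as real CSV (same df, buffer, and placeholder table name as above):

    quoted_columns = ', '.join('"{}"'.format(h) for h in headers)
    copy_sql = 'COPY the_table_name ({}) FROM STDIN WITH (FORMAT CSV)'.format(quoted_columns)

    with psycopg2.connect(dbname=current_app.config['POSTGRES_DB'],
                          user=current_app.config['POSTGRES_USER'],
                          password=current_app.config['POSTGRES_PW'],
                          host=current_app.config['POSTGRES_URL']) as conn:
        c = conn.cursor()
        # FORMAT CSV makes Postgres honour the quoting csv.writer produced
        c.copy_expert(copy_sql, string_buffer)
        conn.commit()
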

    This should be orders of magnitude faster than actually doing inserts.
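
    For completeness, the pandas >= 0.24 route mentioned in point 2 looks roughly like the sketch below (adapted from the "insertion method" example in the pandas docs). Note that it goes through DataFrame.to_sql, which needs an SQLAlchemy connectable, so it doesn't satisfy the "no SQLAlchemy engine" constraint of the question; the connection string is illustrative.

    import csv
    import io

    from sqlalchemy import create_engine

    def psql_insert_copy(table, conn, keys, data_iter):
        # Callable handed to to_sql(method=...); pandas calls it with the
        # target table object, a connection, the column names and the rows
        with conn.connection.cursor() as cur:
            buf = io.StringIO()
            csv.writer(buf).writerows(data_iter)
            buf.seek(0)
            columns = ', '.join('"{}"'.format(k) for k in keys)
            table_name = '{}.{}'.format(table.schema, table.name) if table.schema else table.name
            cur.copy_expert('COPY {} ({}) FROM STDIN WITH (FORMAT CSV)'.format(table_name, columns), buf)

    engine = create_engine('postgresql+psycopg2://user:password@host/dbname')  # illustrative DSN
    df.to_sql('the_table_name', engine, index=False, if_exists='append', method=psql_insert_copy)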
