Bulk Insert A Pandas DataFrame Using SQLAlchemy

Asked by 死守一世寂寞 · 2020-11-28 22:07

I have some rather large pandas DataFrames and I'd like to use the new bulk SQL mappings to upload them to a Microsoft SQL Server via SQLAlchemy. The pandas.to_sql method, …
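
Since the target in the question is SQL Server, below is a minimal sketch of one bulk path to_sql can take there, using pyodbc's fast_executemany option rather than the ORM bulk mappings the question mentions. The connection string, driver name, table name, and sample data are placeholder assumptions, not values from the post.

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical connection details; replace with a real server, database and ODBC driver.
    engine = create_engine(
        "mssql+pyodbc://user:password@server/database?driver=ODBC+Driver+17+for+SQL+Server",
        fast_executemany=True,  # let pyodbc send the parameterized INSERTs in large batches
    )

    # Placeholder DataFrame standing in for the large frames described in the question.
    df = pd.DataFrame({
        "Event": ["login", "purchase"],
        "Day": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    })

    # chunksize bounds how many rows go into each batch sent to the server.
    df.to_sql("events", engine, if_exists="append", index=False, chunksize=10_000)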

10 Answers
  •  谎友^
     2020-11-28 22:24

    I ran into a similar issue with pd.to_sql taking hours to upload data. The code below bulk-inserted the same data in a few seconds by streaming it through PostgreSQL's COPY command (via psycopg2) instead of issuing row-by-row INSERTs.

    from sqlalchemy import create_engine
    import psycopg2  # DBAPI driver behind the postgresql:// dialect and raw_connection()
    # in-memory text buffer used to stream the DataFrame as delimited text
    # (the original answer used Python 2's cStringIO; io.StringIO is the Python 3 equivalent)
    from io import StringIO
    
    # placeholder connection string: postgresql://user:password@host:port/database
    address = 'postgresql://:@:/'
    engine = create_engine(address)
    connection = engine.raw_connection()
    cursor = connection.cursor()
    
    #df is the dataframe containing an index and the columns "Event" and "Day"
    #create Index column to use as primary key
    df.reset_index(inplace=True)
    df.rename(columns={'index': 'Index'}, inplace=True)
    
    #create the table but first drop if it already exists
    command = '''DROP TABLE IF EXISTS localytics_app2;
    CREATE TABLE localytics_app2
    (
    "Index" serial primary key,
    "Event" text,
    "Day" timestamp without time zone
    );'''
    cursor.execute(command)
    connection.commit()
    
    # stream the DataFrame as tab-separated text into an in-memory buffer,
    # then load it with psycopg2's copy_from (PostgreSQL COPY)
    output = StringIO()
    # index=False: the old index is already a regular "Index" column after reset_index
    df.to_csv(output, sep='\t', header=False, index=False)
    # jump back to the start of the stream before handing it to copy_from
    output.seek(0)
    # empty fields in the stream are read back as NULL
    cursor.copy_from(output, 'localytics_app2', null="")
    connection.commit()
    cursor.close()
    connection.close()
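
    The same COPY idea can also be wired directly into pandas through to_sql's method argument. Below is a minimal sketch of such a callable for PostgreSQL, streaming each chunk through psycopg2's copy_expert; the table name and engine are the ones assumed above and are otherwise placeholders.

    import csv
    from io import StringIO

    def psql_insert_copy(table, conn, keys, data_iter):
        # pandas passes a SQLTable, a SQLAlchemy connection, the column names,
        # and an iterator over the row tuples of the current chunk
        dbapi_conn = conn.connection
        with dbapi_conn.cursor() as cur:
            buf = StringIO()
            csv.writer(buf).writerows(data_iter)
            buf.seek(0)
            columns = ', '.join('"{}"'.format(k) for k in keys)
            name = '{}.{}'.format(table.schema, table.name) if table.schema else table.name
            cur.copy_expert('COPY {} ({}) FROM STDIN WITH CSV'.format(name, columns), buf)

    # reuse the engine from above; 'localytics_app2' must already exist when appending
    df.to_sql('localytics_app2', engine, if_exists='append', index=False, method=psql_insert_copy)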
    
