How to set the primary key when writing a pandas DataFrame to a SQLite database table using df.to_sql

悲&欢浪女 2020-12-09 05:30

I have created a SQLite database using pandas df.to_sql; however, accessing it seems considerably slower than just reading in the 500 MB CSV file.

I need to:

6 answers
  •  野趣味 (OP)
    2020-12-09 06:00

    Building on Chris Guarino's answer: df.to_sql() offers no practical way to mark an already existing column as the primary key. And with a 500 MB CSV file and a large number of columns, creating a duplicate table just to add a key is impractical.

    A small workaround, however, is to add a new column as the primary key when the table is created. You can iterate over dataframe.columns to build the CREATE TABLE statement, declare the primary key there, and then append the dataframe to that table. No duplicate table is needed.

    I am adding a small code snippet for this.

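    import pandas as pd
    from sqlalchemy import create_engine, text

    # Load the source data (read_excel takes no 'sep' argument).
    df = pd.read_excel(r'C:\XXX\XXX\XXXX\XXX.xlsx')
    dataset = df.astype('float32')
    # Use a daily date range as the index; to_sql writes it to the "date" column.
    dataset['date'] = pd.date_range(start='1/1/2020', periods=len(dataset), freq='D')
    dataset = dataset.set_index('date')

    engine = create_engine('sqlite:///measurement.db')
    sqlite_table = "table1"

    # Create the table first so that "id" can be declared as the primary key.
    columns_sql = ", ".join('"%s" REAL' % x for x in dataset.columns)
    # engine.begin() opens a connection and commits when the block exits.
    with engine.begin() as sqlite_connection:
        sqlite_connection.execute(text(
            "CREATE TABLE table1 (id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "date TIMESTAMP, " + columns_sql + ")"))
        # Append into the existing table; SQLite fills in "id" automatically.
        dataset.to_sql(sqlite_table, sqlite_connection, if_exists='append')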
    
    Resulting table schema, in the format returned by PRAGMA table_info(table1):
    [(0, 'id', 'INTEGER', 0, None, 1),
    (1, 'date', 'TIMESTAMP', 0, None, 0),
    (2, 'time_stamp', 'REAL', 0, None, 0),
    (3, 'column_1', 'REAL', 0, None, 0),
    (4, 'column_2', 'REAL', 0, None, 0)]
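
The listing above has the shape of SQLite's PRAGMA table_info output, where the last field of each tuple flags the primary key column. Here is a minimal sketch of how one might check this using only the standard library. The in-memory database and the single data column are assumptions made for illustration, not the setup from the answer.

```python
import sqlite3

# In-memory database and the column "column_1" are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "date TIMESTAMP, "
    "column_1 REAL)"
)

# Each row is (cid, name, type, notnull, dflt_value, pk).
schema = conn.execute("PRAGMA table_info(table1)").fetchall()

# Keep the names of columns whose pk flag is non-zero.
pk_columns = [name for _, name, _, _, _, pk in schema if pk]
print(pk_columns)
conn.close()
```

If pk_columns contains only "id", the primary key was declared as intended.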
    

    This method relies on the dataframe index: to_sql writes the index as a column (here, date), so that column has to be declared explicitly in the CREATE TABLE statement.
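
The same idea can be tried end to end on a small scale. This is only a sketch under stated assumptions: the sample data, the column name column_1, and the in-memory database are hypothetical. It passes a plain sqlite3 connection to to_sql instead of a SQLAlchemy engine, and stores dates as text to keep the example simple.

```python
import sqlite3
import pandas as pd

# Hypothetical sample data; the index is named "date" so it maps to that column.
df = pd.DataFrame(
    {"column_1": [1.0, 2.0, 3.0]},
    index=pd.Index(["2020-01-01", "2020-01-02", "2020-01-03"], name="date"),
)

conn = sqlite3.connect(":memory:")
# Create the table up front so "id" can be declared as the primary key.
conn.execute(
    "CREATE TABLE table1 ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "date TEXT, "
    "column_1 REAL)"
)

# index=True writes the index into the column named by index_label.
# "id" is not supplied by the dataframe, so SQLite assigns it.
df.to_sql("table1", conn, if_exists="append", index=True, index_label="date")

rows = conn.execute(
    "SELECT id, date, column_1 FROM table1 ORDER BY id"
).fetchall()
print(rows)
conn.close()
```

Because the table already exists, if_exists="append" only inserts rows and leaves the primary key definition untouched.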

    Hope this helps when creating large databases.
