I have created a SQLite database using pandas df.to_sql(); however, accessing it seems considerably slower than just reading in the 500 MB CSV file.
I need to:
Building on Chris Guarino's answer, it is nearly impossible to assign a primary key to an already existing column using the df.to_sql() method. Likewise, with a 500 MB CSV file you cannot reasonably create a duplicate table when it has a huge number of columns.
However, there is a small workaround: add a new column as the primary key while writing the DataFrame to SQL. You can iterate over the DataFrame's columns to build the CREATE TABLE statement and define the primary key at creation time, so no duplicate table is needed.
Here is a small code snippet of it.
import pandas as pd
from sqlalchemy import create_engine, text

# Read the source data (the path is a placeholder)
df = pd.read_excel(r'C:\XXX\XXX\XXXX\XXX.xlsx')
X1 = df.iloc[0:, 0:]
dataset = X1.astype('float32')

# Add a date index; to_sql will write it as the "date" column
dataset['date'] = pd.date_range(start='1/1/2020', periods=len(dataset), freq='D')
dataset = dataset.set_index('date')

engine = create_engine('sqlite:///measurement.db')
sqlite_connection = engine.connect()
sqlite_table = "table1"

# Create the table up front with an autoincrementing primary key,
# building the column list from the DataFrame's columns
# (text() is needed on newer SQLAlchemy versions)
sqlite_connection.execute(text(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY AUTOINCREMENT, date TIMESTAMP, " +
    ", ".join("%s REAL" % col for col in dataset.columns) + ")"
))

# Append the DataFrame into the pre-created table
dataset.to_sql(sqlite_table, sqlite_connection, if_exists='append')
Resulting table schema (each tuple is cid, name, type, notnull, default, pk):
[(0, 'id', 'INTEGER', 0, None, 1),
(1, 'date', 'TIMESTAMP', 0, None, 0),
(2, 'time_stamp', 'REAL', 0, None, 0),
(3, 'column_1', 'REAL', 0, None, 0),
(4, 'column_2', 'REAL', 0, None, 0)]
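If you want to verify the schema yourself, a minimal sketch (assuming the measurement.db file created above) is to query PRAGMA table_info with the standard sqlite3 module:

import sqlite3

# Inspect the schema of the table created above (assumes measurement.db exists)
with sqlite3.connect('measurement.db') as conn:
    schema = conn.execute("PRAGMA table_info(table1)").fetchall()

# Each tuple is (cid, name, type, notnull, default_value, pk)
for row in schema:
    print(row)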
This method works only if the DataFrame has an index, and for the index to appear as a column in the table it must be declared explicitly in the CREATE TABLE statement (here the date column).
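Once the id primary key exists, individual rows can be looked up directly instead of scanning the whole table. A small illustrative sketch (the query itself is only an example) might look like:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite:///measurement.db')

# Fetch a single row by its primary key instead of reading the whole table;
# SQLite can use the id key for the lookup, so this stays fast on large tables
row = pd.read_sql("SELECT * FROM table1 WHERE id = 100", engine)
print(row)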
Hope this helps when creating huge databases.