I have created a SQLite database using pandas df.to_sql(); however, accessing it seems considerably slower than just reading in the 500 MB CSV file.
I need to:
Building on Chris Guarino's answer, it is nearly impossible to assign a primary key to an already existing column using the df.to_sql() method. Likewise, with a 500 MB CSV file you cannot reasonably create a duplicate table when it has a huge number of columns.
However, there is a small workaround: add a new column as the primary key while writing the DataFrame to SQL. You can iterate over the DataFrame's columns to build the CREATE TABLE statement and define the primary key at creation time, so no duplicate table is needed.
Here is a small code snippet of it.
import pandas as pd
from sqlalchemy import create_engine, text

# Read the source data (the path is a placeholder)
df = pd.read_excel(r'C:\XXX\XXX\XXXX\XXX.xlsx')
X1 = df.iloc[0:, 0:]
dataset = X1.astype('float32')

# Add a date index; to_sql will write it as the "date" column
dataset['date'] = pd.date_range(start='1/1/2020', periods=len(dataset), freq='D')
dataset = dataset.set_index('date')

engine = create_engine('sqlite:///measurement.db')
sqlite_connection = engine.connect()
sqlite_table = "table1"

# Create the table up front with an autoincrementing primary key,
# building the column list from the DataFrame's columns
# (text() is needed on newer SQLAlchemy versions)
sqlite_connection.execute(text(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY AUTOINCREMENT, date TIMESTAMP, " +
    ", ".join("%s REAL" % col for col in dataset.columns) + ")"
))

# Append the DataFrame into the pre-created table
dataset.to_sql(sqlite_table, sqlite_connection, if_exists='append')
Resulting table schema (each tuple is cid, name, type, notnull, default, pk):
[(0, 'id', 'INTEGER', 0, None, 1),
(1, 'date', 'TIMESTAMP', 0, None, 0),
(2, 'time_stamp', 'REAL', 0, None, 0),
(3, 'column_1', 'REAL', 0, None, 0),
(4, 'column_2', 'REAL', 0, None, 0)]
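If you want to verify the schema yourself, a minimal sketch (assuming the measurement.db file created above) is to query PRAGMA table_info with the standard sqlite3 module:

import sqlite3

# Inspect the schema of the table created above (assumes measurement.db exists)
with sqlite3.connect('measurement.db') as conn:
    schema = conn.execute("PRAGMA table_info(table1)").fetchall()

# Each tuple is (cid, name, type, notnull, default_value, pk)
for row in schema:
    print(row)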
This method works only if the DataFrame has an index, and for the index to appear as a column in the table it must be declared explicitly in the CREATE TABLE statement (here the date column).
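Once the id primary key exists, individual rows can be looked up directly instead of scanning the whole table. A small illustrative sketch (the query itself is only an example) might look like:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite:///measurement.db')

# Fetch a single row by its primary key instead of reading the whole table;
# SQLite can use the id key for the lookup, so this stays fast on large tables
row = pd.read_sql("SELECT * FROM table1 WHERE id = 100", engine)
print(row)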
Hope this helps when creating huge databases.