I have created a SQLite database using pandas df.to_sql, but accessing it seems considerably slower than just reading in the 500 MB CSV file.
I need to:
In SQLite, a PRIMARY KEY on a normal rowid table is equivalent to a UNIQUE index unless it is a single INTEGER column (see "ROWIDs and the INTEGER PRIMARY KEY" in the documentation), because the real primary key of a normal table is the rowid.
Notes from the documentation for rowid tables:
The PRIMARY KEY of a rowid table (if there is one) is usually not the true primary key for the table, in the sense that it is not the unique key used by the underlying B-tree storage engine. The exception to this rule is when the rowid table declares an INTEGER PRIMARY KEY. In the exception, the INTEGER PRIMARY KEY becomes an alias for the rowid.
The true primary key for a rowid table (the value that is used as the key to look up rows in the underlying B-tree storage engine) is the rowid.
The PRIMARY KEY constraint for a rowid table (as long as it is not the true primary key or INTEGER PRIMARY KEY) is really the same thing as a UNIQUE constraint. Because it is not a true primary key, columns of the PRIMARY KEY are allowed to be NULL, in violation of all SQL standards.
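The behaviour described in these notes can be checked with a small sketch using Python's built-in sqlite3 module. The table and column names here (with_int_pk, with_text_pk, code, name) are made up for illustration and do not come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# INTEGER PRIMARY KEY: the column becomes an alias for the rowid,
# so a value is assigned automatically when the column is left out.
conn.execute("CREATE TABLE with_int_pk (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO with_int_pk (name) VALUES ('a')")
int_pk_row = conn.execute("SELECT id, rowid FROM with_int_pk").fetchone()

# Non-integer PRIMARY KEY on a rowid table: acts like a UNIQUE constraint,
# which means NULLs are accepted (even more than one).
conn.execute("CREATE TABLE with_text_pk (code TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO with_text_pk VALUES (NULL, 'x')")
conn.execute("INSERT INTO with_text_pk VALUES (NULL, 'y')")
null_count = conn.execute(
    "SELECT COUNT(*) FROM with_text_pk WHERE code IS NULL"
).fetchone()[0]

# Duplicate non-NULL keys are still rejected.
conn.execute("INSERT INTO with_text_pk VALUES ('A1', 'first')")
try:
    conn.execute("INSERT INTO with_text_pk VALUES ('A1', 'dup')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(int_pk_row, null_count, duplicate_rejected)
```

Here id and rowid come back as the same value, both NULL rows are stored, and only the repeated non-NULL key raises an IntegrityError.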
So you can easily fake a primary key after creating the table with:
CREATE UNIQUE INDEX mytable_fake_pk ON mytable(pk_column)
Apart from allowing NULLs, this won't give you the benefits of an INTEGER PRIMARY KEY if your column is supposed to hold integers, such as taking up less space and auto-generating a value on insert when the column is left out, but it will otherwise work for most purposes.
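As a rough sketch of how this might look from Python, the snippet below builds a table with no declared key (roughly what you end up with when no primary key is declared at creation time), adds the unique index afterwards, and asks SQLite for its query plan. The value column and the sample rows are assumptions made up for the example; mytable, pk_column and mytable_fake_pk are the names used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A table without any declared primary key (hypothetical columns and data).
conn.execute("CREATE TABLE mytable (pk_column TEXT, value REAL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(f"k{i}", float(i)) for i in range(1000)],
)

# Add the "fake" primary key after the table has been created and filled.
conn.execute("CREATE UNIQUE INDEX mytable_fake_pk ON mytable(pk_column)")
conn.commit()

# Ask SQLite how it would run a lookup on the indexed column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT value FROM mytable WHERE pk_column = ?",
    ("k42",),
).fetchall()
plan_detail = " ".join(str(row[-1]) for row in plan)
print(plan_detail)

# The index also enforces uniqueness for non-NULL values.
try:
    conn.execute("INSERT INTO mytable VALUES ('k42', 0.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

If the plan mentions mytable_fake_pk, lookups on pk_column are using the index instead of scanning the whole table. Note that CREATE UNIQUE INDEX fails if the column already contains duplicate non-NULL values, so those would need cleaning up first.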