I have 5 million rows in a MySQL DB sitting over the (local) network (so a quick connection, not over the internet). The connection to the DB works fine, but if I try to pull all of the rows into a DataFrame in a single read call, it struggles.
query: the SQL query to run.
conn: the connection to your database (a DBAPI connection or SQLAlchemy engine; see the sketch below).
chunksize: the number of rows to fetch per batch. With chunksize set, read_sql_query returns a generator of DataFrames instead of a single DataFrame.
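For reference, here is a minimal sketch of setting up conn and query. It assumes SQLAlchemy with the pymysql driver; the connection string and table name are placeholders you would replace with your own:

from sqlalchemy import create_engine

# Placeholder credentials, host, and database name -- substitute your own.
engine = create_engine("mysql+pymysql://user:password@db-host/mydb")
conn = engine  # pd.read_sql_query accepts a SQLAlchemy engine directly
query = "SELECT * FROM my_table"  # hypothetical table name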
Try the code below to extract the data in chunks, then use the helper function to turn the generator into a single DataFrame.
import pandas as pd

# chunksize makes read_sql_query yield DataFrames of up to 50,000 rows each.
df_chunks = pd.read_sql_query(query, conn, chunksize=50000)

def chunks_to_df(gen):
    """Concatenate a generator of DataFrame chunks into one DataFrame."""
    chunks = []
    for df in gen:
        chunks.append(df)
    return pd.concat(chunks).reset_index(drop=True)

df = chunks_to_df(df_chunks)
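Since pd.concat accepts any iterable of DataFrames, the helper can also be skipped entirely; this one-liner gives the same result:

df = pd.concat(df_chunks, ignore_index=True)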
This reduces the load on the database server, because the rows arrive in batches instead of one enormous result set, while still giving you the full dataset for further analysis.
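If even the final concatenated DataFrame is too big for memory, you can process each chunk as it arrives and never hold more than one batch at a time. A minimal sketch of that pattern (the per-chunk work shown, a simple row count, is just an illustration):

# Stream the chunks and process each one immediately.
total_rows = 0
for chunk in pd.read_sql_query(query, conn, chunksize=50000):
    total_rows += len(chunk)  # replace with your real per-chunk work
print(f"Processed {total_rows} rows")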