Loading 5 million rows into Pandas from MySQL

栀梦 · 2021-02-06 08:36

I have 5 million rows in a MySQL DB sitting over the (local) network (so quick connection, not on the internet).

The connection to the DB works fine, but if I try to load all 5 million rows in a single read, it takes far too long.
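A minimal sketch of that one-shot load, assuming a SQLAlchemy engine (the connection string and table name are placeholders, not from the question):

    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mysql+pymysql://user:password@host/dbname")  # placeholder

    # Reads the entire 5-million-row result set into memory in one call.
    df = pd.read_sql("SELECT * FROM your_table", engine)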

3 Answers
  •  轻奢々 (OP) · 2021-02-06 08:57

    query: the SQL query to run.
    conn: an open database connection or SQLAlchemy engine.
    chunksize: the number of rows to fetch per batch; when set, read_sql_query returns a generator of DataFrames instead of a single DataFrame.

    Try the code below to extract the data in chunks, then use the helper function to concatenate the generator's chunks into a single DataFrame.

    import pandas as pd
    from sqlalchemy import create_engine

    # Placeholder connection string and query; substitute your own.
    engine = create_engine("mysql+pymysql://user:password@host/dbname")
    query = "SELECT * FROM your_table"

    # With chunksize set, read_sql_query returns a generator of DataFrames.
    df_chunks = pd.read_sql_query(query, engine, chunksize=50000)

    def chunks_to_df(gen):
        # Collect the chunks, then concatenate them with a fresh 0..n-1 index.
        chunks = []
        for df in gen:
            chunks.append(df)
        return pd.concat(chunks, ignore_index=True)

    df = chunks_to_df(df_chunks)
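
    A caveat on driver-side buffering (not from the original answer): by default the MySQL drivers buffer the whole result set client-side before pandas sees the first chunk. If memory is still tight, SQLAlchemy's stream_results option requests a server-side cursor; the connection string and table name below are placeholders.

    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mysql+pymysql://user:password@host/dbname")  # placeholder

    # stream_results=True asks for a server-side cursor, so rows are
    # fetched as each chunk is consumed rather than buffered up front.
    with engine.connect().execution_options(stream_results=True) as conn:
        chunks = pd.read_sql_query("SELECT * FROM your_table", conn, chunksize=50000)
        df = pd.concat(chunks, ignore_index=True)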
    

    Fetching in batches keeps the load on the database server manageable while still leaving you with a single DataFrame for your further analysis.
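
    If even the final concatenated DataFrame is too large for memory, a common variation (a sketch, not part of the original answer) is to process each chunk as it arrives and keep only the aggregates; 'amount' below is a hypothetical column name.

    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mysql+pymysql://user:password@host/dbname")  # placeholder

    total_rows = 0
    amount_sum = 0.0  # 'amount' is a hypothetical numeric column

    # Each chunk is processed and discarded, so peak memory stays near
    # one 50,000-row DataFrame instead of all 5 million rows.
    for chunk in pd.read_sql_query("SELECT * FROM your_table", engine, chunksize=50000):
        total_rows += len(chunk)
        amount_sum += chunk["amount"].sum()

    print(total_rows, amount_sum)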
