Python Pandas - Using to_sql to write large data frames in chunks

刺人心 2020-12-06 04:55

I'm using Pandas' to_sql function to write to MySQL, which is timing out due to the large frame size (1M rows, 20 columns).

http://pandas.pydata.org/panda
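
A minimal sketch of the kind of call that hits this, assuming SQLAlchemy with the pymysql driver; the connection string, table name, and generated frame are placeholders, not the original setup:

import numpy as np
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string; adjust driver and credentials as needed.
engine = create_engine('mysql+pymysql://user:password@localhost/mydb')

# A frame of roughly the size described: 1M rows, 20 columns.
frame = pd.DataFrame(np.random.randn(1_000_000, 20),
                     columns=[f'col{i}' for i in range(20)])

# Without chunking, the whole frame is sent in one shot, which can time out.
frame.to_sql('big_table', engine, if_exists='replace', index=False)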

2 Answers
  • 2020-12-06 05:57

    Update: this functionality has been merged in pandas master and will be released in 0.15 (probably end of September), thanks to @artemyk! See https://github.com/pydata/pandas/pull/8062

    So starting from 0.15, you can specify the chunksize argument and e.g. simply do:

    df.to_sql('table', engine, chunksize=20000)
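
    For completeness, a self-contained sketch of that call; the connection string and table name here are placeholder assumptions:

    import numpy as np
    import pandas as pd
    from sqlalchemy import create_engine

    # Placeholder connection string for illustration.
    engine = create_engine('mysql+pymysql://user:password@localhost/mydb')

    df = pd.DataFrame(np.random.randn(1_000_000, 20),
                      columns=[f'col{i}' for i in range(20)])

    # With chunksize set, the INSERTs go out in 20,000-row batches
    # instead of one statement for the entire frame.
    df.to_sql('table', engine, chunksize=20000)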
    
  • 2020-12-06 05:58

    There is a beautiful, idiomatic chunks function provided in an answer to this question.

    In your case, you can use that function like this:

    def chunks(frame, n):
        """Yield successive n-row chunks from a DataFrame."""
        for i in range(0, len(frame), n):
            yield frame.iloc[i:i + n]

    def write_to_db(engine, frame, table_name, chunk_size):
        for idx, chunk in enumerate(chunks(frame, chunk_size)):
            # Replace any existing table with the first chunk,
            # then append each subsequent chunk.
            if idx == 0:
                if_exists_param = 'replace'
            else:
                if_exists_param = 'append'
            chunk.to_sql(con=engine, name=table_name, if_exists=if_exists_param)

    The only drawback is that it doesn't support slicing a second index in iloc (it only chunks along the rows).
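
    For example, a usage sketch under the same placeholder connection assumptions as in the question above:

    import numpy as np
    import pandas as pd
    from sqlalchemy import create_engine

    # Placeholder connection string for illustration.
    engine = create_engine('mysql+pymysql://user:password@localhost/mydb')

    frame = pd.DataFrame(np.random.randn(1_000_000, 20),
                         columns=[f'col{i}' for i in range(20)])

    # 'replace' on the first 20,000-row chunk, 'append' on the rest,
    # so no single INSERT is large enough to hit the timeout.
    write_to_db(engine, frame, 'big_table', 20000)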
