Python Pandas - Using to_sql to write large data frames in chunks

刺人心 2020-12-06 04:55

I'm using Pandas' to_sql function to write to MySQL, which is timing out due to the large frame size (1M rows, 20 columns).

http://pandas.pydata.org/panda
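
A minimal sketch of the kind of call that hits this, assuming SQLAlchemy with the pymysql driver; the connection string, table name, and generated frame are placeholders, not the original setup:

import numpy as np
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string; adjust driver and credentials as needed.
engine = create_engine('mysql+pymysql://user:password@localhost/mydb')

# A frame of roughly the size described: 1M rows, 20 columns.
frame = pd.DataFrame(np.random.randn(1_000_000, 20),
                     columns=[f'col{i}' for i in range(20)])

# Without chunking, the whole frame is sent in one shot, which can time out.
frame.to_sql('big_table', engine, if_exists='replace', index=False)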

2 Answers
  • 2020-12-06 05:57

    Update: this functionality has been merged in pandas master and will be released in 0.15 (probably end of September), thanks to @artemyk! See https://github.com/pydata/pandas/pull/8062

    So starting from 0.15, you can specify the chunksize argument and e.g. simply do:

    df.to_sql('table', engine, chunksize=20000)
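
    For completeness, a self-contained sketch of that call; the connection string and table name here are placeholder assumptions:

    import numpy as np
    import pandas as pd
    from sqlalchemy import create_engine

    # Placeholder connection string for illustration.
    engine = create_engine('mysql+pymysql://user:password@localhost/mydb')

    df = pd.DataFrame(np.random.randn(1_000_000, 20),
                      columns=[f'col{i}' for i in range(20)])

    # With chunksize set, the INSERTs go out in 20,000-row batches
    # instead of one statement for the entire frame.
    df.to_sql('table', engine, chunksize=20000)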
    
  • 2020-12-06 05:58

    There is a beautiful, idiomatic chunks function provided in an answer to this question.

    In your case, you can use that function like this:

    def chunks(frame, n):
        """Yield successive n-row chunks from a DataFrame."""
        for i in range(0, len(frame), n):
            yield frame.iloc[i:i + n]

    def write_to_db(engine, frame, table_name, chunk_size):
        for idx, chunk in enumerate(chunks(frame, chunk_size)):
            # Replace any existing table with the first chunk,
            # then append each subsequent chunk.
            if idx == 0:
                if_exists_param = 'replace'
            else:
                if_exists_param = 'append'
            chunk.to_sql(con=engine, name=table_name, if_exists=if_exists_param)

    The only drawback is that it doesn't support slicing a second index in iloc (it only chunks along the rows).
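
    For example, a usage sketch under the same placeholder connection assumptions as in the question above:

    import numpy as np
    import pandas as pd
    from sqlalchemy import create_engine

    # Placeholder connection string for illustration.
    engine = create_engine('mysql+pymysql://user:password@localhost/mydb')

    frame = pd.DataFrame(np.random.randn(1_000_000, 20),
                         columns=[f'col{i}' for i in range(20)])

    # 'replace' on the first 20,000-row chunk, 'append' on the rest,
    # so no single INSERT is large enough to hit the timeout.
    write_to_db(engine, frame, 'big_table', 20000)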
