Python Pandas - Using to_sql to write large data frames in chunks

刺人心 · 2020-12-06 04:55

I'm using Pandas' to_sql function to write to MySQL, which is timing out due to the large frame size (1M rows, 20 columns).

http://pandas.pydata.org/panda

2 Answers
  •  旧巷少年郎
    2020-12-06 05:58

    There is a beautiful, idiomatic chunks function provided in an answer to this question.

    In your case, you can use that function like this:

    def chunks(l, n):
        """Yield successive n-sized chunks from DataFrame l."""
        for i in range(0, len(l), n):  # use xrange on Python 2
            yield l.iloc[i:i + n]

    def write_to_db(engine, frame, table_name, chunk_size):
        for idx, chunk in enumerate(chunks(frame, chunk_size)):
            # Replace the table on the first chunk, then append the rest
            if_exists_param = 'replace' if idx == 0 else 'append'
            chunk.to_sql(con=engine, name=table_name, if_exists=if_exists_param)
    

    The only drawback is that it doesn't support slicing the second axis with iloc.
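
    A minimal usage sketch, for illustration only: the connection string, table name, and stand-in DataFrame below are hypothetical, so substitute your own MySQL credentials and data.

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical connection string -- replace with your own credentials
    engine = create_engine('mysql+pymysql://user:password@localhost/mydb')

    # Stand-in for the 1M-row, 20-column frame from the question
    frame = pd.DataFrame({'col{}'.format(i): range(1000000) for i in range(20)})

    write_to_db(engine, frame, 'my_table', chunk_size=10000)

    Worth noting: recent pandas versions also accept a chunksize argument to to_sql itself, which batches the inserts without needing a helper like this.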
