What is the fastest way to output large DataFrame into a CSV file?

Unresolved · 4 answers · 1423 views

北海茫月 2020-12-01 14:19

For Python / pandas, I find that df.to_csv(fname) runs at a speed of roughly 1 million rows per minute. I can sometimes improve performance by a factor of 7, like this:

def         
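The function body above is truncated in the post. As a rough sketch only (not the asker's actual code), a hand-rolled writer of the kind being described might look like this, assuming plain scalar values with no embedded commas or quotes:

```python
# Hypothetical sketch of a hand-rolled CSV writer; NOT the original
# df_to_csv from the question. Assumes values need no quoting/escaping.
import pandas as pd

def df_to_csv(df: pd.DataFrame, fname: str) -> None:
    with open(fname, 'w') as f:
        # write the header row, then one comma-joined line per data row
        f.write(','.join(df.columns) + '\n')
        for row in df.itertuples(index=False):
            f.write(','.join(map(str, row)) + '\n')
```

The speedup, when it appears, comes from skipping the type dispatch and quoting logic that the general-purpose to_csv path has to perform.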


        
4 Answers
  •  生来不讨喜
    2020-12-01 14:40

    Your df_to_csv function is very nice, except that it makes a lot of assumptions and doesn't work in the general case.

    If it works for you, that's good, but be aware that it is not a general solution. CSV values can themselves contain commas, so what happens if this tuple is to be written: ('a,b', 'c')?

    The Python csv module would quote that value so that no confusion arises, and would escape quotes if any are present in the values. Of course, generating something that works in all cases is much slower. But I suppose you only have a bunch of numbers.
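    A minimal demonstration of that quoting behavior: when a field contains a comma, csv.writer wraps it in double quotes so the output stays unambiguous.

    ```python
    # Show how the csv module quotes a field containing a comma.
    import csv
    import io

    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(('a,b', 'c'))
    print(buf.getvalue())  # '"a,b",c\r\n'
    ```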

    You could try this and see if it is faster:

    # data is a tuple containing tuples of numbers
    
    with open(fname, 'w') as f:
        for row in data:
            for col in range(len(row)):
                f.write('%d' % row[col])
                if col < len(row) - 1:
                    f.write(',')
            f.write('\n')
    

    I don't know whether that would be faster. If it isn't, it's probably because too many system calls are being made, so you might write to a StringIO instead of the file directly and then dump it to the real file every once in a while.
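    The buffering idea above can be sketched like this (the function name and chunk size are illustrative, not from the original answer): accumulate rows in a StringIO and flush to the real file in chunks, so far fewer write() calls reach the file object.

    ```python
    # Sketch of the StringIO buffering idea: batch rows in memory and
    # flush to the destination file every chunk_rows rows.
    import io

    def write_rows_buffered(f, data, chunk_rows=10000):
        buf = io.StringIO()
        for i, row in enumerate(data, 1):
            buf.write(','.join('%d' % v for v in row) + '\n')
            if i % chunk_rows == 0:
                f.write(buf.getvalue())  # flush a full chunk
                buf = io.StringIO()      # start a fresh buffer
        f.write(buf.getvalue())          # flush the remainder
    ```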
