What's the fastest way to pickle a pandas DataFrame?

Submitted by 别等时光非礼了梦想 on 2019-12-06 19:01:07

Question


Which is better: using the pandas built-in method or pickle.dump?

The standard pickle method looks like this:

pickle.dump(my_dataframe, open('test_pickle.p', 'wb'))

The Pandas built-in method looks like this:

my_dataframe.to_pickle('test_pickle.p')

Answer 1:


Thanks to @qwwqwwq I discovered that pandas has a built-in to_pickle method for dataframes. I did a quick time test:

In [1]: %timeit pickle.dump(df, open('test_pickle.p', 'wb'))
10 loops, best of 3: 91.8 ms per loop

In [2]: %timeit df.to_pickle('testpickle.p')
10 loops, best of 3: 88 ms per loop

So the built-in appears to be only marginally faster (88 ms versus 91.8 ms per loop). To me this is useful, because it means it's probably not worth refactoring existing code just to use the built-in. Hope this helps someone!
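For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch using the standard timeit module. The DataFrame size and contents, the helper function names, and the file names are my own assumptions (the timing above does not say what df contained), so expect different absolute numbers.

```python
import os
import pickle
import tempfile
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: the original timing does not say what df held.
df = pd.DataFrame(np.random.rand(100_000, 10))


def dump_with_pickle(frame, path):
    # A context manager closes the file handle after every call.
    with open(path, "wb") as f:
        pickle.dump(frame, f)


def dump_with_pandas(frame, path):
    frame.to_pickle(path)


CALLS = 3  # calls per timing run

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "pickle_dump.p")
    path_b = os.path.join(tmp, "to_pickle.p")

    # Take the best of three runs and convert to seconds per call.
    t_pickle = min(timeit.repeat(lambda: dump_with_pickle(df, path_a),
                                 number=CALLS, repeat=3)) / CALLS
    t_pandas = min(timeit.repeat(lambda: dump_with_pandas(df, path_b),
                                 number=CALLS, repeat=3)) / CALLS

print(f"pickle.dump:  {t_pickle * 1e3:.1f} ms per call")
print(f"df.to_pickle: {t_pandas * 1e3:.1f} ms per call")
```

Note that, depending on your Python and pandas versions, the two calls may not use the same default pickle protocol, so a small gap in either direction is not surprising.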




Answer 2:


Easy benchmark, right?

No difference at all. In fact, I expect that pandas implements __getstate__ so that calling pickle.dump(df, file) is effectively the same as calling df.to_pickle().

If you search the pandas source code for __getstate__, for example, you will find that it is implemented on several objects.
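You can also check this from a Python session instead of reading the source. The following is only a sketch: the example DataFrame and the file name are made up for illustration, and it shows that both routes restore an equal DataFrame, not that the bytes on disk are identical.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical example frame, used only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Classes in the DataFrame's MRO that define __getstate__ themselves.
# On Python 3.11+ the base `object` class also shows up in this list.
definers = [cls.__name__ for cls in type(df).__mro__
            if "__getstate__" in vars(cls)]
print("__getstate__ defined on:", definers)

# Round trip through the standard library, entirely in memory.
via_pickle = pickle.loads(pickle.dumps(df))

# Round trip through the pandas helpers, using a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_pickle.p")
    df.to_pickle(path)
    via_pandas = pd.read_pickle(path)

print("pickle round trip equal:", via_pickle.equals(df))
print("pandas round trip equal:", via_pandas.equals(df))
```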



Source: https://stackoverflow.com/questions/28754658/whats-the-fastest-way-to-pickle-a-pandas-dataframe
