What's the best way to sum all values in a Pandas dataframe?

Submitted by 混江龙づ霸主 on 2020-06-10 02:12:26

Question


I figured out these two methods. Is there a better one?

>>> import pandas as pd
>>> df = pd.DataFrame({'A': [5, 6, 7], 'B': [7, 8, 9]})
>>> print(df.sum().sum())
42
>>> print(df.values.sum())
42

Just want to make sure I'm not missing something more obvious.


Answer 1:


Updated for Pandas 0.24+

df.to_numpy().sum()

Prior to Pandas 0.24

df.values

is the underlying NumPy array, so

df.values.sum()

calls NumPy's sum method, which is faster than df.sum().sum().
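One caveat worth checking before switching: the two approaches treat missing values differently. The sketch below is not from the original answer; the frame `df_nan` and the variable names are made up for illustration. It shows that pandas skips NaN by default while a plain NumPy sum propagates it, and that `np.nansum` is one way to get the pandas-like result on the array.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value, for illustration only
df_nan = pd.DataFrame({'A': [5.0, np.nan, 7.0], 'B': [7.0, 8.0, 9.0]})

# pandas reductions skip NaN by default (skipna=True)
pandas_total = df_nan.sum().sum()

# ndarray.sum() propagates NaN, so the result is nan
numpy_total = df_nan.to_numpy().sum()

# np.nansum treats NaN as zero, matching the pandas default here
nan_safe_total = np.nansum(df_nan.to_numpy())

print(pandas_total)    # 36.0
print(numpy_total)     # nan
print(nan_safe_total)  # 36.0
```

If the frame has no missing values and a single numeric dtype, both approaches give the same total.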




Answer 2:


Adding some numbers to support this:

import numpy as np, pandas as pd
import timeit
df = pd.DataFrame(np.arange(int(1e6)).reshape(500000, 2), columns=list("ab"))

def pandas_test():
    return df['a'].sum()

def numpy_test():
    return df['a'].to_numpy().sum()

timeit.timeit(numpy_test, number=1000)  # 0.5032469799989485
timeit.timeit(pandas_test, number=1000)  # 0.6035906639990571

So on my machine the pandas version takes about 20% longer than the NumPy version, and that is just for a Series summation!
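The same comparison can be run on the whole DataFrame, which is what the question asks about. This is only a sketch under assumptions: the names `frame`, `pandas_whole`, and `numpy_whole` are invented here, the repeat count is reduced to keep it quick, and no timings are quoted because they depend on the machine and the library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape as the Series benchmark; int64 avoids overflow on any platform
frame = pd.DataFrame(
    np.arange(1_000_000, dtype=np.int64).reshape(500_000, 2),
    columns=list("ab"),
)

def pandas_whole():
    # Sum each column with pandas, then sum the per-column totals
    return frame.sum().sum()

def numpy_whole():
    # Convert to a single NumPy array and sum every element at once
    return frame.to_numpy().sum()

# Both approaches should agree before their speed is compared
assert pandas_whole() == numpy_whole()

print("pandas:", timeit.timeit(pandas_whole, number=100))
print("numpy: ", timeit.timeit(numpy_whole, number=100))
```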



Source: https://stackoverflow.com/questions/38733477/whats-the-best-way-to-sum-all-values-in-a-pandas-dataframe
