What's the best way to sum all values in a Pandas dataframe?

心在旅途 2020-12-15 16:09

I figured out these two methods. Is there a better one?

>>> import pandas as pd
>>> df = pd.DataFrame({'A': [5, 6, 7], 'B': [7, 8, 9]})


        
2 Answers
  • 2020-12-15 16:32

    Updated for Pandas 0.24+:

    df.to_numpy().sum()
    

    Prior to Pandas 0.24:

    df.values.sum()
    

    df.values is the underlying NumPy array, so .sum() here is the NumPy sum method, which is faster than summing through pandas.
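As a minimal sketch of the above, using the frame from the question: the NumPy route is compared against `df.sum().sum()`, a common pure-pandas way to total every cell (an assumption for illustration; the asker's two original methods are not shown in the post).

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({"A": [5, 6, 7], "B": [7, 8, 9]})

# Sum every cell via the underlying NumPy array (pandas 0.24+).
total_numpy = df.to_numpy().sum()

# Pure-pandas alternative: sum each column, then sum the column totals.
total_pandas = df.sum().sum()

print(total_numpy, total_pandas)  # 42 42
```

Both give the same total here; the difference is only in how much pandas machinery sits between the call and the array.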

  • 2020-12-15 16:50

    Adding some numbers to support this:

    import numpy as np, pandas as pd
    import timeit
    df = pd.DataFrame(np.arange(int(1e6)).reshape(500000, 2), columns=list("ab"))
    
    def pandas_test():
        return df['a'].sum()
    
    def numpy_test():
        return df['a'].to_numpy().sum()
    
    timeit.timeit(numpy_test, number=1000)  # 0.5032469799989485
    timeit.timeit(pandas_test, number=1000)  # 0.6035906639990571
    

    So we get roughly a 20% speedup on my machine, just for Series summations!
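The timing above covers a single column. To check the whole-DataFrame case that the question asks about, a similar benchmark might look like the sketch below. The function names and the use of `df.sum().sum()` as the pandas baseline are assumptions for illustration, and the timings will vary by machine and pandas version, so none are quoted.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape as the benchmark above; int64 is set explicitly to avoid overflow.
df = pd.DataFrame(
    np.arange(1_000_000, dtype=np.int64).reshape(500_000, 2),
    columns=list("ab"),
)

def pandas_total():
    # Sum each column with pandas, then sum the column totals.
    return df.sum().sum()

def numpy_total():
    # Sum the underlying NumPy array in one call.
    return df.to_numpy().sum()

# Both approaches must agree before the timings mean anything.
assert pandas_total() == numpy_total()

pandas_time = timeit.timeit(pandas_total, number=100)
numpy_time = timeit.timeit(numpy_total, number=100)
print(f"pandas: {pandas_time:.4f}s  numpy: {numpy_time:.4f}s")
```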
