More Pythonic/Pandaic approach to looping over a pandas Series

你的背包 2021-01-02 17:56

This is most likely something very basic, but I can't figure it out. Suppose that I have a Series like this:

s1 = pd.Series([1, 1, 1, 2, 2, 2, 3, 3, 3, 4,         


        
4 Answers
  •  失恋的感觉
    2021-01-02 18:50

    You could also use np.add.reduceat: pass it the start index of every block of 3 elements, and it returns the sum of each block:

    >>> pd.Series(np.add.reduceat(s1.values, np.arange(0, s1.shape[0], 3)))
    0     3
    1     6
    2     9
    3    12
    dtype: int64
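
For anyone who wants to run this end to end, here is a minimal sketch. The definition of s1 is cut off in the question, so the input below (1 through 4, each repeated three times) is an assumption chosen to match the output shown above. The groupby line is only a cross-check and is not part of the answer.

```python
import numpy as np
import pandas as pd

# Assumed input: the question's s1 is truncated, so this reconstruction
# (1..4, each repeated three times) is a guess consistent with the output.
s1 = pd.Series(np.repeat(np.arange(1, 5), 3))

# Start index of every block of three elements: [0, 3, 6, 9]
starts = np.arange(0, len(s1), 3)

# reduceat sums values[starts[i]:starts[i + 1]] for each i;
# the last block runs from starts[-1] to the end of the array.
block_sums = pd.Series(np.add.reduceat(s1.to_numpy(), starts))

# Pure-pandas cross-check: label each position with its block number.
grouped = s1.groupby(np.arange(len(s1)) // 3).sum()

print(block_sums.tolist())
print(grouped.tolist())
```

Because the last block simply runs to the end of the array, reduceat also copes with a Series whose length is not a multiple of 3, whereas reshape(-1, 3) raises an error in that case.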
    

    Timings:

    arr = np.repeat(np.arange(10**5), 3)
    s = pd.Series(arr)
    s.shape
    (300000,)
    
    # @IanS soln
    %timeit s.rolling(3).sum()[2::3]        
    100 loops, best of 3: 15.6 ms per loop
    
    # @Divakar soln
    %timeit pd.Series(np.bincount(np.arange(s.size)//3, s))  
    100 loops, best of 3: 5.44 ms per loop
    
    # @Nikolas Rieble soln
    %timeit pd.Series(np.sum(np.array(s).reshape(len(s)//3, 3), axis=1))
    100 loops, best of 3: 2.17 ms per loop
    
    # @Nikolas Rieble modified soln
    %timeit pd.Series(np.sum(np.array(s).reshape(-1, 3), axis=1))  
    100 loops, best of 3: 2.15 ms per loop
    
    # @Divakar modified soln
    %timeit pd.Series(s.values.reshape(-1,3).sum(1))
    1000 loops, best of 3: 1.62 ms per loop
    
    # Proposed solution in post
    %timeit pd.Series(np.add.reduceat(s.values, np.arange(0, s.shape[0], 3)))
    1000 loops, best of 3: 1.45 ms per loop
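
The timings only mean something if every variant returns the same numbers, so here is a rough sanity check on the same benchmark data. The dictionary keys and the `expected` array are names made up for this sketch. The rolling and bincount variants return floats, which is why the comparison uses np.allclose rather than exact equality.

```python
import numpy as np
import pandas as pd

# Same benchmark data as in the timings: 0..99999, each repeated three times.
arr = np.repeat(np.arange(10**5), 3)
s = pd.Series(arr)

# One entry per approach that was timed (labels are illustrative only).
results = {
    "rolling": s.rolling(3).sum()[2::3].to_numpy(),
    "bincount": np.bincount(np.arange(s.size) // 3, weights=s.to_numpy()),
    "reshape": s.to_numpy().reshape(-1, 3).sum(axis=1),
    "reduceat": np.add.reduceat(s.to_numpy(), np.arange(0, s.shape[0], 3)),
}

# Each block holds the same value k three times, so its sum is 3 * k.
expected = 3 * np.arange(10**5)
all_match = all(np.allclose(v, expected) for v in results.values())
print(all_match)
```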
    
