NumPy version of “Exponential weighted moving average”, equivalent to pandas.ewm().mean()

一生所求 2020-11-27 12:30

How do I get the exponential weighted moving average in NumPy just like the following in pandas?

import pandas as pd
import pandas_datareader as pdr
from dat         


        
12 answers
  •  再見小時候
    2020-11-27 12:56

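    Given alpha and windowSize, here's an approach that simulates the corresponding behavior in NumPy. It truncates the exponential weights to the windowSize most recent samples, so it approximates pandas' default (adjust=True) result rather than reproducing it exactly -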

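    import numpy as np

    def numpy_ewm_alpha(a, alpha, windowSize):
        # Weights (1-alpha)**0 ... (1-alpha)**(windowSize-1), normalized to sum to 1
        wghts = (1-alpha)**np.arange(windowSize)
        wghts /= wghts.sum()
        # The first windowSize-1 outputs stay NaN, like min_periods=windowSize in pandas
        out = np.full(a.shape[0], np.nan)
        # np.convolve flips the kernel, so the newest sample gets the largest weight wghts[0]
        out[windowSize-1:] = np.convolve(a,wghts,'valid')
        return out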
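For comparison with the truncated weights above, pandas' default adjust=True average, which weights every earlier sample, can be computed exactly with a running-sum recursion. This is a minimal sketch, not pandas' own code: `ewm_mean_exact` is a hypothetical helper name, it assumes the input is 1-D with no NaNs (pandas' `ignore_na` handling is not reproduced), and its Python-level loop is much slower than the convolution approach.

```python
import numpy as np


def ewm_mean_exact(a, alpha, min_periods=0):
    """Hypothetical helper: EWM mean with adjust=True-style weighting.

    Assumes `a` is 1-D and contains no NaNs.
    """
    a = np.asarray(a, dtype=float)
    r = 1.0 - alpha
    out = np.empty_like(a)
    num = 0.0  # running sum of r**i * a[t-i]
    den = 0.0  # running sum of r**i
    for t, x in enumerate(a):
        num = x + r * num
        den = 1.0 + r * den
        out[t] = num / den
    # Mimic min_periods: outputs with fewer than min_periods observations are NaN
    out[:max(min_periods - 1, 0)] = np.nan
    return out
```

Assuming no missing values, `ewm_mean_exact(df.values.ravel(), alpha, min_periods=windowSize)` should agree with `df.ewm(alpha=alpha, min_periods=windowSize).mean()` up to floating-point rounding.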
    

    Sample runs for verification -

    In [54]: alpha = 0.55
        ...: windowSize = 20
        ...: 
    
    In [55]: df = pd.DataFrame(np.random.randint(2,9,(100)))
    
    In [56]: out0 = df.ewm(alpha = alpha, min_periods=windowSize).mean().to_numpy().ravel()
        ...: out1 = numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
        ...: print("Max. error : " + str(np.nanmax(np.abs(out0 - out1))))
        ...: 
    Max. error : 5.10531254605e-07
    
    In [57]: alpha = 0.75
        ...: windowSize = 30
        ...: 
    
    In [58]: out0 = df.ewm(alpha = alpha, min_periods=windowSize).mean().to_numpy().ravel()
        ...: out1 = numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
        ...: print("Max. error : " + str(np.nanmax(np.abs(out0 - out1))))
    
    Max. error : 8.881784197e-16
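The small nonzero errors are expected: pandas' default adjust=True weights every past sample, while numpy_ewm_alpha keeps only the last windowSize terms. Writing $r = 1-\alpha$, $w$ for windowSize and $x$ for the input (notation chosen here for illustration), the two outputs for $t \ge w-1$ and a bound on their gap are:

$$
y_t^{\mathrm{pandas}} = \frac{\sum_{i=0}^{t} r^i x_{t-i}}{\sum_{i=0}^{t} r^i},
\qquad
y_t^{\mathrm{window}} = \frac{\sum_{i=0}^{w-1} r^i x_{t-i}}{\sum_{i=0}^{w-1} r^i},
$$

$$
\left| y_t^{\mathrm{pandas}} - y_t^{\mathrm{window}} \right|
\le \frac{\sum_{i=w}^{t} r^i}{\sum_{i=0}^{t} r^i}\,\bigl(\max x - \min x\bigr)
\le r^{w}\,\bigl(\max x - \min x\bigr).
$$

With inputs drawn from 2..8 (range at most 6), this gives roughly $6 \cdot 0.45^{20} \approx 7 \times 10^{-7}$ for the first run and roughly $6 \cdot 0.25^{30} \approx 5 \times 10^{-18}$ for the second, so the second run's reported $8.9 \times 10^{-16}$ is at the level of double-precision rounding rather than truncation error.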
    

    Runtime test on a bigger dataset -

    In [61]: alpha = 0.55
        ...: windowSize = 20
        ...: 
    
    In [62]: df = pd.DataFrame(np.random.randint(2,9,(10000)))
    
    In [63]: %timeit df.ewm(alpha = alpha, min_periods=windowSize).mean()
    1000 loops, best of 3: 851 µs per loop
    
    In [64]: %timeit numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
    1000 loops, best of 3: 204 µs per loop
    

    Further boost

    For a further performance boost, we could skip the NaN initialization and instead reuse the array returned by np.convolve in its default 'full' mode, masking its head with NaNs and trimming its tail, like so -

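    def numpy_ewm_alpha_v2(a, alpha, windowSize):
        wghts = (1-alpha)**np.arange(windowSize)
        wghts /= wghts.sum()
        # 'full' mode: entry n is sum_k a[n-k]*wghts[k] over the available samples
        out = np.convolve(a,wghts)
        # Entries before windowSize-1 use an incomplete window; mask them as NaN
        out[:windowSize-1] = np.nan
        # Drop the trailing windowSize-1 entries that run past the end of a
        return out[:a.size]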
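The v2 slicing relies on how np.convolve's default 'full' mode lines up with 'valid' mode. A small self-contained check, using made-up random data (the seed and sizes are arbitrary choices for illustration):

```python
import numpy as np

# Arbitrary example data and parameters (assumed for this check)
rng = np.random.default_rng(0)
a = rng.integers(2, 9, size=50).astype(float)
alpha, windowSize = 0.55, 20

wghts = (1 - alpha) ** np.arange(windowSize)
wghts /= wghts.sum()

full = np.convolve(a, wghts)            # default 'full' mode: a.size + windowSize - 1 values
valid = np.convolve(a, wghts, 'valid')  # complete windows only: a.size - windowSize + 1 values

# Entries windowSize-1 .. a.size-1 of the full result are the complete-window sums
same = np.allclose(full[windowSize - 1:a.size], valid)
print(same)  # True
```

So masking the first windowSize-1 entries and trimming to a.size yields the same layout as the NaN-initialized version, without allocating and filling a separate output array.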
    

    Timings -

    In [117]: alpha = 0.55
         ...: windowSize = 20
         ...: 
    
    In [118]: df = pd.DataFrame(np.random.randint(2,9,(10000)))
    
    In [119]: %timeit numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
    1000 loops, best of 3: 204 µs per loop
    
    In [120]: %timeit numpy_ewm_alpha_v2(df.values.ravel(), alpha = alpha, windowSize = windowSize)
    10000 loops, best of 3: 195 µs per loop
    
