NumPy version of “Exponential weighted moving average”, equivalent to pandas.ewm().mean()

一生所求 2020-11-27 12:30

How do I get the exponential weighted moving average in NumPy just like the following in pandas?

import pandas as pd
import pandas_datareader as pdr
from dat         


        
12 answers
  •  臣服心动
    2020-11-27 12:40

    Fastest EWMA 23x pandas

    The question strictly asks for a NumPy solution, but it seems the OP really wanted a pure NumPy solution in order to speed up runtime.

    I solved a similar problem, but turned to numba.jit instead, which massively reduces the compute time.

    In [24]: a = np.random.random(10**7)
        ...: df = pd.Series(a)
    In [25]: %timeit numpy_ewma(a, 10)               # /a/42915307/4013571
        ...: %timeit df.ewm(span=10).mean()          # pandas
        ...: %timeit numpy_ewma_vectorized_v2(a, 10) # best w/o numba: /a/42926270/4013571
        ...: %timeit _ewma(a, 10)                    # fastest accurate (below)
        ...: %timeit _ewma_infinite_hist(a, 10)      # fastest overall (below)
    4.14 s ± 116 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    991 ms ± 52.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 
    396 ms ± 8.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    181 ms ± 1.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)   
    39.6 ms ± 979 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    

    Scaling down to a smaller array, a = np.random.random(100), gives (results in the same order):

    41.6 µs ± 491 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    945 ms ± 12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    16 µs ± 93.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    1.66 µs ± 13.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    1.14 µs ± 5.57 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    

    It is also worth pointing out that my functions below match the pandas output exactly (see the examples in the docstrings), whereas a few of the other answers here use various approximations. For example,

    In [57]: print(pd.DataFrame([1,2,3]).ewm(span=2).mean().values.ravel())
        ...: print(numpy_ewma_vectorized_v2(np.array([1,2,3]), 2))
        ...: print(numpy_ewma(np.array([1,2,3]), 2))
    [1.         1.75       2.61538462]
    [1.         1.66666667 2.55555556]
    [1.         1.18181818 1.51239669]
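
As a sanity check on the numbers above, here is a minimal sketch in plain Python (exact fractions, no pandas or NumPy) of the two definitions involved: the weighted-average form and the recursive form. The helper names `ewma_adjusted` and `ewma_recursive` are made up for this illustration and are not part of any library.

```python
from fractions import Fraction


def ewma_adjusted(xs, span):
    """Weighted-average form: sum((1-a)^k * x[t-k]) / sum((1-a)^k)."""
    a = Fraction(2, span + 1)
    out = []
    num = Fraction(0)  # running weighted sum of the samples
    den = Fraction(0)  # running sum of the weights
    for x in xs:
        num = num * (1 - a) + x
        den = den * (1 - a) + 1
        out.append(num / den)
    return out


def ewma_recursive(xs, span):
    """Recursive form: y[0] = x[0], y[t] = a*x[t] + (1-a)*y[t-1]."""
    a = Fraction(2, span + 1)
    out = [Fraction(xs[0])]
    for x in xs[1:]:
        out.append(a * x + (1 - a) * out[-1])
    return out


adjusted = ewma_adjusted([1, 2, 3], 2)
recursive = ewma_recursive([1, 2, 3], 2)
print([str(v) for v in adjusted])   # ['1', '7/4', '34/13']
print([str(v) for v in recursive])  # ['1', '5/3', '23/9']
```

Since 34/13 ≈ 2.61538462 and 23/9 ≈ 2.55555556, the first printed row above agrees with the weighted-average form and the second with the recursive form, so those two rows differ by definition rather than by rounding.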
    

    The source code, as documented in my own library:

    import numpy as np
    from numba import jit
    from numba import float64
    from numba import int64
    
    
    @jit((float64[:], int64), nopython=True, nogil=True)
    def _ewma(arr_in, window):
        r"""Exponentially weighted moving average specified by a decay ``window``
        to provide better adjustments for small windows via:
    
            y[t] = (x[t] + (1-a)*x[t-1] + (1-a)^2*x[t-2] + ... + (1-a)^n*x[t-n]) /
                   (1 + (1-a) + (1-a)^2 + ... + (1-a)^n).
    
        Parameters
        ----------
        arr_in : np.ndarray, float64
            A one-dimensional numpy array
        window : int64
            The decay window, or 'span'
    
        Returns
        -------
        np.ndarray
            The EWMA vector, same length / shape as ``arr_in``
    
        Examples
        --------
        >>> import pandas as pd
        >>> a = np.arange(5, dtype=float)
        >>> exp = pd.DataFrame(a).ewm(span=10, adjust=True).mean()
        >>> np.allclose(_ewma(a, 10), exp.values.ravel())
        True
        """
        n = arr_in.shape[0]
        ewma = np.empty(n, dtype=float64)
        alpha = 2 / float(window + 1)
        w = 1.0  # running sum of the weights (denominator)
        ewma_old = arr_in[0]  # running weighted sum of the samples (numerator)
        ewma[0] = ewma_old
        for i in range(1, n):
            w += (1-alpha)**i
            ewma_old = ewma_old*(1-alpha) + arr_in[i]
            ewma[i] = ewma_old / w
        return ewma
    
    
    @jit((float64[:], int64), nopython=True, nogil=True)
    def _ewma_infinite_hist(arr_in, window):
        r"""Exponentially weighted moving average specified by a decay ``window``
        assuming infinite history via the recursive form:
    
            (2) (i)  y[0] = x[0]; and
                (ii) y[t] = a*x[t] + (1-a)*y[t-1] for t>0.
    
        This method is less accurate than ``_ewma`` but
        much faster:
    
            In [1]: import numpy as np, bars
               ...: arr = np.random.random(100000)
               ...: %timeit bars._ewma(arr, 10)
               ...: %timeit bars._ewma_infinite_hist(arr, 10)
            3.74 ms ± 60.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
            262 µs ± 1.54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
        Parameters
        ----------
        arr_in : np.ndarray, float64
            A one-dimensional numpy array
        window : int64
            The decay window, or 'span'
    
        Returns
        -------
        np.ndarray
            The EWMA vector, same length / shape as ``arr_in``
    
        Examples
        --------
        >>> import pandas as pd
        >>> a = np.arange(5, dtype=float)
        >>> exp = pd.DataFrame(a).ewm(span=10, adjust=False).mean()
        >>> np.array_equal(_ewma_infinite_hist(a, 10), exp.values.ravel())
        True
        """
        n = arr_in.shape[0]
        ewma = np.empty(n, dtype=float64)
        alpha = 2 / float(window + 1)
        ewma[0] = arr_in[0]
        for i in range(1, n):
            ewma[i] = arr_in[i] * alpha + ewma[i-1] * (1 - alpha)
        return ewma
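
To see how the two functions relate without installing numba, the same recurrences can be sketched as plain NumPy loops. This is an illustrative sketch only, not the library code above: the names `ewma_adjusted_np` and `ewma_recursive_np` are hypothetical, both assume a non-empty 1-D input, and the weight sum is updated with a recurrence instead of a power call on every step.

```python
import numpy as np


def ewma_adjusted_np(x, span):
    """Weighted-average form (same definition as the _ewma docstring)."""
    x = np.asarray(x, dtype=np.float64)
    alpha = 2.0 / (span + 1.0)
    out = np.empty_like(x)
    num = 0.0  # weighted sum of the samples seen so far
    den = 0.0  # sum of the weights seen so far
    for i in range(x.shape[0]):
        num = num * (1.0 - alpha) + x[i]
        den = den * (1.0 - alpha) + 1.0
        out[i] = num / den
    return out


def ewma_recursive_np(x, span):
    """Recursive form: y[0] = x[0], y[t] = a*x[t] + (1-a)*y[t-1]."""
    x = np.asarray(x, dtype=np.float64)
    alpha = 2.0 / (span + 1.0)
    out = np.empty_like(x)
    out[0] = x[0]
    for i in range(1, x.shape[0]):
        out[i] = alpha * x[i] + (1.0 - alpha) * out[i - 1]
    return out


rng = np.random.default_rng(0)
x = rng.random(200)
gap = np.abs(ewma_adjusted_np(x, 10) - ewma_recursive_np(x, 10))
print(gap[1], gap[-1])  # gap near the start vs. gap at the end
```

Working through the algebra, the gap at index t equals (1-a)^(t+1) times the distance between the adjusted value and x[0], so the two forms disagree mainly on the first few samples and become indistinguishable on long series. These pure-Python loops are far slower than the jitted versions and are only meant for checking results.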
    
