pd.rolling_mean becoming deprecated - alternatives for ndarrays

自闭症患者 2020-12-16 14:09

EDIT: This question was asked in 2016, and similar questions have been posted on SO in the years since the functionality was finally removed, e.g. module 'pandas' has no attribute 'rolling_mean'.

5 Answers
  • 2020-12-16 14:17

    Looks like the new way is via methods on the Rolling object returned by DataFrame.rolling() (I guess you're meant to think of it sort of like a groupby): http://pandas.pydata.org/pandas-docs/version/0.18.0/whatsnew.html

    e.g.

    x.rolling(window=2).mean()
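
    A minimal self-contained version of the same call (a sketch; the variable names are illustrative and assume x starts out as a 1-d ndarray):

    import numpy as np
    import pandas as pd
    
    x = np.random.uniform(size=10)       # plain ndarray
    s = pd.Series(x)                     # rolling lives on Series/DataFrame now
    result = s.rolling(window=2).mean()  # first window-1 entries are NaN
    print(result.values)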
    
  • 2020-12-16 14:21

    EDIT -- Unfortunately, it looks like the new way is not nearly as fast: under pandas 0.18, both the deprecated function and the new .rolling().mean() come out roughly 20x slower on a plain ndarray than pd.rolling_mean did under 0.17.1:

    New version of Pandas:

    In [1]: x = np.random.uniform(size=100)
    
    In [2]: %timeit pd.rolling_mean(x, window=2)
    1000 loops, best of 3: 240 µs per loop
    
    In [3]: %timeit pd.Series(x).rolling(window=2).mean()
    1000 loops, best of 3: 226 µs per loop
    
    In [4]: pd.__version__
    Out[4]: '0.18.0'
    

    Old version:

    In [1]: x = np.random.uniform(size=100)
    
    In [2]: %timeit pd.rolling_mean(x,window=2)
    100000 loops, best of 3: 12.4 µs per loop
    
    In [3]: pd.__version__
    Out[3]: u'0.17.1'
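
    For anyone timing this outside IPython, the same comparison can be reproduced with the standard timeit module (a sketch; absolute numbers will vary with machine and pandas version):

    import timeit
    
    setup = "import numpy as np; import pandas as pd; x = np.random.uniform(size=100)"
    # time per call in microseconds, averaged over 1000 runs
    t = timeit.timeit("pd.Series(x).rolling(window=2).mean()", setup=setup, number=1000)
    print("%.1f us per loop" % (t / 1000 * 1e6))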
    
  • 2020-12-16 14:31

    Try this (center=False is the default, but spelling it out makes the match with the old pd.rolling_mean behaviour explicit):

    x.rolling(window=2, center=False).mean()
    
  • 2020-12-16 14:32

    I suggest scipy.ndimage.filters.uniform_filter1d, as in my answer to the linked question. It is also much faster for large arrays:

    import numpy as np
    import pandas as pd  # needed for the pd.rolling_mean comparison below
    from scipy.ndimage.filters import uniform_filter1d
    N = 1000
    x = np.random.random(100000)
    
    %timeit pd.rolling_mean(x, window=N)
    __main__:257: FutureWarning: pd.rolling_mean is deprecated for ndarrays and will be removed in a future version
    The slowest run took 84.55 times longer than the fastest. This could mean that an intermediate result is being cached.
    1 loop, best of 3: 7.37 ms per loop
    
    %timeit uniform_filter1d(x, size=N)
    10000 loops, best of 3: 190 µs per loop
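
    One caveat (my own note, not part of the original benchmark): uniform_filter1d centres the window by default and reflects at the edges, while pd.rolling_mean uses a trailing window padded with NaN. If you need the trailing behaviour, shifting the window with origin should reproduce it away from the left edge -- a quick check, using the modern scipy.ndimage import path:

    import numpy as np
    import pandas as pd
    from scipy.ndimage import uniform_filter1d
    
    N = 10
    x = np.random.random(1000)
    a = pd.Series(x).rolling(window=N).mean().values      # trailing mean, NaN-padded
    b = uniform_filter1d(x, size=N, origin=(N - 1) // 2)  # shift the window so it trails
    print(np.allclose(a[N - 1:], b[N - 1:]))              # True once past the left edge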
    
  • 2020-12-16 14:39

    If your dimensions are homogeneous, you could try to implement an n-dimensional form of the Summed Area Table used for two-dimensional images:

    A summed area table is a data structure and algorithm for quickly and efficiently generating the sum of values in a rectangular subset of a grid.

    Then, in this order, you could (see the sketch after this list):

    1. Create the summed-area table ("integral image") of your array;
    2. Iterate to get the (quite cheap) sum of an n-dimensional kernel at a given position;
    3. Divide by the n-dimensional volume of the kernel.

    Unfortunately I can't say for certain whether this is efficient, but by the premise above it should be.
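
    A sketch of that recipe (my own illustrative code, not the answerer's; the name rolling_mean_nd and the "valid"-region output shape are assumptions):

    import itertools
    import numpy as np
    import pandas as pd  # only for the sanity check at the end
    
    def rolling_mean_nd(x, window):
        """Rolling mean of an n-d array via a summed-area table.
        window is a tuple with one width per axis; the output covers
        only positions where the kernel fits entirely inside x."""
        # Step 1: the integral image -- cumulative sums along every axis,
        # padded with a leading zero on each axis.
        sat = x.astype(float)
        for ax in range(x.ndim):
            sat = np.cumsum(sat, axis=ax)
        sat = np.pad(sat, [(1, 0)] * x.ndim)
        # Step 2: the sum inside each window, by inclusion-exclusion over
        # the 2**n corners of the box -- cheap regardless of window size.
        out_shape = tuple(s - w + 1 for s, w in zip(x.shape, window))
        sums = np.zeros(out_shape)
        for corner in itertools.product((0, 1), repeat=x.ndim):
            sign = (-1) ** (x.ndim - sum(corner))
            idx = tuple(slice(c * w, c * w + n)
                        for c, w, n in zip(corner, window, out_shape))
            sums += sign * sat[idx]
        # Step 3: divide by the kernel's volume.
        return sums / np.prod(window)
    
    # 1-d sanity check against pandas (skipping pandas' NaN padding):
    x = np.random.random(100)
    a = rolling_mean_nd(x, (5,))
    b = pd.Series(x).rolling(window=5).mean().values[4:]
    print(np.allclose(a, b))  # True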
